KR100791478B1

KR100791478B1 - Encoder and decoder for vertex processing, and method thereof

Info

Publication number: KR100791478B1
Application number: KR1020060066587A
Authority: KR
Inventors: 박기현; 정형기; 이광엽
Original assignee: 엠텍비젼 주식회사
Priority date: 2006-07-14
Filing date: 2006-07-14
Publication date: 2008-01-04

Abstract

An encoder and a decoder for vertex processing, and methods thereof, are provided that make vertex processing efficient by defining the operations of an arithmetic logic module efficiently and by making later extension and modification of the opcodes easy. An input unit (1010) receives an instruction that includes an opcode for vertex processing, a destination operand, and one or more source operands. A converting unit (1020) encodes the instruction into machine code. An output unit (1030) outputs the machine code. The machine code includes an opcode index field, in which an opcode type predetermined according to the kind of opcode and an opcode index are recorded; a destination operand field, in which the destination address for storing the result of the operation specified by the opcode is recorded; and one or more source operand fields, in which the source addresses holding the data for that operation are recorded.

Description

Encoder and decoder for vertex processing and method thereof

FIG. 1 is a block diagram of a vertex processing apparatus according to a preferred embodiment of the present invention.

FIG. 2 is a diagram showing the field layout of an instruction according to a preferred embodiment of the present invention.

FIG. 3 is a diagram showing the instruction decoding field regions for each group in the opcode lookup table.

FIG. 4 is a diagram showing an example of an opcode lookup table that maps opcode indices to opcodes.

FIG. 5 is a diagram showing how the opcode lookup table is set up for OpenGL ARB and Vertex Shader 1.1.

FIG. 6 is a diagram showing the field layout of an opcode in the opcode lookup table.

FIG. 7 is a diagram showing the operation performed in each operation unit of the arithmetic logic module.

FIG. 8 is a diagram showing the values of the basic operation field.

FIG. 9 is a diagram showing the flow of sequential operations through the multi-stage pipeline of the arithmetic logic module.

FIG. 10 is a block diagram of an encoding apparatus according to a preferred embodiment of the present invention.

FIG. 11 is a block diagram of a decoding apparatus according to a preferred embodiment of the present invention.

Description of reference numerals for the main parts of the drawings

100: instruction fetch module

110a: first register

120: decoding module

130: arithmetic logic module

140: write-back module

110b: second register

The present invention relates to an encoding apparatus and a decoding apparatus for vertex processing, and more particularly to an apparatus and a method for encoding and decoding an instruction format for vertex processing in three-dimensional graphics acceleration.

OpenGL ES (Open Graphics Library for Embedded Systems) is a cross-platform application program interface (API) for two-dimensional and three-dimensional graphics on embedded systems, including automobiles, appliances, and portable devices. It is a subset of OpenGL (Open Graphics Library), the three-dimensional graphics standard for PC environments, and provides a flexible yet powerful low-level interface between software applications and a hardware or software graphics engine.

OpenGL ES is a software solution that handles three-dimensional graphics operations in order to provide 3D games and a range of advanced 3D graphics features on embedded systems such as mobile devices (mobile communication terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), and the like), automobile controllers, refrigerator controllers, and factory robot controllers.

Graphics hardware that supports OpenGL ES implements the three-dimensional algorithms provided by OpenGL ES in hardware and processes three-dimensional graphics operations in real time. Conventional graphics hardware has processed three-dimensional data according to fixed algorithms.

A representative embedded system is the portable terminal, a mobile device. Portable terminals currently shipped with a 3D graphics engine (that is, graphics hardware) process graphics operations as follows.

The shape of the object to be rendered is divided into a set of triangular polygons. The three corner points of each polygon are called vertices. The graphics hardware receives, from the application program interface, data for the three vertices such as position, color, normal vector, and texture coordinates.

In vertex processing, the on-screen coordinates of each input vertex are determined by matrix operations, and the brightness of the vertex is determined according to an illumination model (for example, the Phong illumination model).

In primitive assembly, the vertices whose coordinate transformation and lighting calculation are complete are gathered to form triangles. The rasterizer then determines which pixels each triangle covers on the screen.

The pixel data determined by the rasterizer then passes through pixel processing (texture operations, color sum, and fog) according to the specified state information, which determines the final color of each pixel, and the rendered pixels are output.

In more detail, vertex processing consists of transforming the vertex coordinates from the model coordinate system to the screen coordinate system, and calculating lighting.

The vertex coordinates contained in the vertex data are first transformed from the coordinate system in which each model is defined (usually with the model's center at the origin) to the world coordinate system, the coordinate system of the virtual world in which multiple models coexist. That is, points in the model coordinate system are translated, rotated, and scaled to obtain points in the world coordinate system. The world-space points are then translated and rotated into the view coordinate system, which is centered on the camera (the view transformation), and transformed into the projection coordinate system, which corresponds to the result of perspective projection (the projection transformation). The projection transformation makes the x and y coordinates of points in the view coordinate system smaller the farther they are from the origin. Finally, the points are scaled to the size of the target screen (the viewport scale), yielding points in the screen coordinate system.
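The chain of transformations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`mat_vec`, `translation`, `simple_perspective`, `model_to_screen`), the toy projection matrix, and the viewport convention are not taken from the patent.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """4x4 matrix that moves a point by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def simple_perspective(d):
    """Toy projection (assumption): copies z/d into w so that the later
    divide shrinks x and y as the point moves away from the origin."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0 / d, 0.0]]


def model_to_screen(p_model, world, view, proj, width, height):
    p_world = mat_vec(world, p_model)   # model -> world
    p_view = mat_vec(view, p_world)     # world -> view (camera)
    p_clip = mat_vec(proj, p_view)      # view -> projection
    x = p_clip[0] / p_clip[3]           # perspective divide
    y = p_clip[1] / p_clip[3]
    # Viewport scale: map [-1, 1] to pixel coordinates, with y pointing down.
    return ((x + 1.0) * 0.5 * width, (1.0 - y) * 0.5 * height)
```

Moving the same model point from z = 2 to z = 4 pulls its screen position toward the center of a 640x480 viewport, which is the shrinking effect the projection transformation is described as having.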

Lighting is then calculated to determine the vertex color by summing three components: ambient light, which reaches the surface indirectly after being reflected by surrounding objects; diffuse light, which is scattered on reflection from the object's surface; and specular light, which is reflected from the object's surface in a particular direction (so that the position of the eye matters).
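As a rough illustration of summing the three lighting components, the following Python sketch computes a scalar intensity for one vertex in the style of the Phong model. The function name, the coefficient names `ka`, `kd`, `ks`, and the `shininess` exponent are assumptions made for the example, not values from the patent.

```python
import math


def dot3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(v):
    length = math.sqrt(dot3(v, v))
    return [c / length for c in v]


def vertex_intensity(normal, to_light, to_eye, ka, kd, ks, shininess):
    """Ambient + diffuse + specular intensity at one vertex (hypothetical helper)."""
    n, l, e = normalize(normal), normalize(to_light), normalize(to_eye)
    n_dot_l = max(dot3(n, l), 0.0)
    # Reflect the light direction about the normal: r = 2(n.l)n - l.
    r = [2.0 * dot3(n, l) * n[i] - l[i] for i in range(3)]
    # No highlight when the light is behind the surface.
    specular = max(dot3(r, e), 0.0) ** shininess if n_dot_l > 0.0 else 0.0
    return ka + kd * n_dot_l + ks * specular
```

With the light behind the surface, only the ambient term remains, which matches the description of ambient light as the indirect component.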

An instruction for the vertex processing described above consists of an opcode and one or more operands. OpenGL ARB and Vertex Shader 1.1 publish these instructions only at the macro-instruction level and leave the details to each vendor, so each vendor has its own instruction set.

In conventional designs, the latency of executing the opcode in an instruction varies from 1 to 8 when measured in operation stages. Here, the number of operation stages is the number of stages the arithmetic logic module needs for the operation during one cycle of instruction execution, from instruction fetch through write-back.

If many opcodes are executed in varying order while processing vertex data, many stalls occur, which degrades the performance and efficiency of vertex processing.

Accordingly, the present invention provides an instruction format for vertex processing that makes it possible to reduce the maximum latency between opcodes to three stages by means of an arithmetic logic module with a multi-stage pipeline in which the operation units are connected in sequence, together with an encoding apparatus, a decoding apparatus, and methods for such instructions.

The present invention also provides an instruction encoding apparatus, a decoding apparatus, and methods for vertex processing that decode opcodes with an opcode lookup table, thereby defining the operations of the arithmetic logic module efficiently and making later extension and modification of the opcodes easy.

Other objects of the present invention will be readily understood from the following description.

To achieve the above objects, one aspect of the present invention provides an encoding apparatus comprising: an input unit that receives an instruction including an opcode for vertex processing, a destination operand, and one or more source operands; a converting unit that encodes the instruction into machine code; and an output unit that outputs the machine code, wherein the machine code includes an opcode index field in which an opcode type predetermined according to the kind of opcode and an opcode index are recorded, a destination operand field in which the destination address for storing the result of the operation specified by the opcode is recorded, and one or more source operand fields in which the source addresses holding the data for the operation are recorded.

Preferably, the machine code may be 64 bits wide.

The opcode type may be distinguished by the destination address in which the operation result is stored, or by the number of source operands the opcode requires.
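To make the field structure concrete, here is a minimal Python sketch that packs an opcode type, an opcode index, a destination address, and up to three source addresses into one 64-bit word. The bit widths (2 + 6 + 8 + 3 x 16 = 64) and the field order are hypothetical, chosen only so that the fields fill 64 bits; this passage of the patent does not specify them.

```python
# Hypothetical field widths (not from the patent); they only need to total 64 bits.
TYPE_BITS, INDEX_BITS, DEST_BITS, SRC_BITS = 2, 6, 8, 16


def encode(op_type, op_index, dest, srcs):
    """Pack the fields into a 64-bit integer, most significant field first."""
    if not 1 <= len(srcs) <= 3:
        raise ValueError("an instruction takes one to three source operands")
    padded = list(srcs) + [0] * (3 - len(srcs))   # unused source fields are zero
    fields = [(op_type, TYPE_BITS), (op_index, INDEX_BITS), (dest, DEST_BITS)]
    fields += [(s, SRC_BITS) for s in padded]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit in its width")
        word = (word << width) | value
    return word
```

Zero-filling unused source fields is a design choice of the sketch; a decoder that knows the opcode type can simply ignore them.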

To achieve the above objects, another aspect of the present invention provides an encoding method comprising: receiving an instruction including an opcode for vertex processing, a destination operand, and one or more source operands; encoding the instruction into machine code; and outputting the machine code, wherein the machine code includes an opcode index field in which an opcode type predetermined according to the kind of opcode and an opcode index are recorded, a destination operand field in which the destination address for storing the result of the operation specified by the opcode is recorded, and one or more source operand fields in which the source addresses holding the data for the operation are recorded.

To achieve the above objects, yet another aspect of the present invention provides a decoding apparatus comprising: an input unit that receives machine code including an opcode index field in which an opcode type predetermined according to the kind of opcode and an opcode index are recorded, a destination operand field in which the destination address for storing the result of the operation specified by the opcode is recorded, and one or more source operand fields in which the source addresses holding the data for the operation are recorded; a converting unit that converts the machine code into a decoded information signal; and an output unit that outputs the decoded information signal.

Preferably, the machine code may be 64 bits wide.

The opcode type may be distinguished by the destination address in which the operation result is stored, or by the number of source operands the opcode requires. Here, the converting unit may decode only the source operand fields that the opcode requires, according to the opcode type.

The apparatus may further include an opcode lookup table that determines, for each opcode index, the kind of basic operation performed on each component, or an opcode lookup table that determines, for each opcode index, the stage at which to write back.
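The decoding side can be sketched in the same spirit. The Python example below unpacks a 64-bit word, keeps only as many source operands as the opcode type calls for, and consults a small lookup table keyed by opcode index. Everything specific here is an assumption for illustration: the bit widths, the convention that the type value equals the number of source operands, and the table contents (mnemonics and write-back stages are placeholders, not the patent's values).

```python
# Hypothetical field widths (not from the patent); they total 64 bits.
TYPE_BITS, INDEX_BITS, DEST_BITS, SRC_BITS = 2, 6, 8, 16

# Hypothetical lookup table: opcode index -> (mnemonic, write-back stage).
OPCODE_TABLE = {0: ("MOV", 1), 1: ("ADD", 2), 2: ("MAD", 3)}


def decode(word):
    """Unpack a 64-bit word laid out as type | index | dest | src0 | src1 | src2."""
    def take(width):
        nonlocal word
        value = word & ((1 << width) - 1)   # read the lowest `width` bits
        word >>= width
        return value

    srcs = [take(SRC_BITS) for _ in range(3)][::-1]   # reorder to src0, src1, src2
    dest = take(DEST_BITS)
    op_index = take(INDEX_BITS)
    op_type = take(TYPE_BITS)   # assumption: type == number of source operands
    mnemonic, stage = OPCODE_TABLE[op_index]
    return {"opcode": mnemonic, "writeback_stage": stage,
            "dest": dest, "srcs": srcs[:op_type]}   # unused source fields dropped
```

Because the table is data rather than logic, adding or changing an opcode in this sketch means editing one dictionary entry, which is the kind of extensibility the lookup table is said to provide.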

To achieve the above objects, yet another aspect of the present invention provides a decoding method comprising: receiving machine code including an opcode index field in which an opcode type predetermined according to the kind of opcode and an opcode index are recorded, a destination operand field in which the destination address for storing the result of the operation specified by the opcode is recorded, and one or more source operand fields in which the source addresses holding the data for the operation are recorded; converting the machine code into a decoded information signal; and outputting the decoded information signal.

Hereinafter, preferred embodiments of the encoding apparatus, the decoding apparatus, and the methods for vertex processing according to the present invention are described in detail with reference to the accompanying drawings. Detailed descriptions of related known art are omitted where they would unnecessarily obscure the gist of the invention. Ordinal terms (first, second, and so on) used in this description are merely identifiers for distinguishing identical or similar entities in sequence.

In the present invention, a cycle is the period in which instruction fetch, instruction decode, the ALU operation, and write-back are each carried out once, in order, to execute a vertex processing instruction. A stage is one step among the operations needed to execute a single instruction: instruction fetch, instruction decode, and write-back each take one stage, and the ALU operation takes one to three stages depending on the instruction. In other words, one cycle, the period in which a single instruction is executed, contains an ALU operation made up of one or more stages. The number of operation stages is the number of stages the ALU operation needs within the cycle in which an instruction is executed.
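The stage accounting implied by these definitions is simple arithmetic, sketched below in Python. The function and constant names are made up for the example; the stage counts come from the paragraph above.

```python
# One stage each for the non-ALU steps, as stated in the text.
FIXED_STAGES = {"fetch": 1, "decode": 1, "write_back": 1}


def stages_per_cycle(alu_stages):
    """Total stages in one cycle for an instruction whose ALU operation
    takes `alu_stages` stages (one to three)."""
    if alu_stages not in (1, 2, 3):
        raise ValueError("the ALU operation takes one to three stages")
    return sum(FIXED_STAGES.values()) + alu_stages
```

Under these definitions a cycle therefore spans between four and six stages.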

The opcodes for vertex processing in the vertex processing apparatus of the present invention are ABS, ADD, ARL, DP3, DP4, DPH, DST, EX2, EXP, FLR, FRC, LG2, LIT, LOG, MAD, MAX, MIN, MOV, MUL, POW, RCP, RSQ, SGE, SLT, SUB, SWZ, and XPD.

The basic syntax of an instruction is given by Equation 1 below.

opcode destination operand, source operand0, (source operand1, source operand2)

An instruction consists of an opcode, a destination operand (hereinafter dest), and source operands (hereinafter src).

The destination and/or source operands are written as the name of a register or of a variable bound to a register, and the present invention uses the index address of each register to reference an operand. The destination operand points to the destination register that stores the operation result. A source operand points to an input register, constant register, temporary register, address register, or the like, which hold the data for the operation such as vertex data, constant data, intermediate results, and address values.

Every instruction has a destination operand and, depending on the operation, one to three source operands. A 4-component vector has the form [x, y, z, w], in which each datum is a set of four real numbers. An s scalar is a value with only an x component, and an ssss scalar has the four components x, y, z, and w, all holding the same value.

Of the opcodes, the 24 basic opcodes are listed in Table 1 below. There, a denotes the address register, v a 4-component vector, s a scalar, ssss a 4-component scalar, normal a basic opcode, and macro a macro opcode.

(1) ABS (absolute value) assigns the absolute value of each of the x, y, z, and w components of the 4-component vector src0 to the corresponding component of dest.

(2) ADD (add two vectors) adds the corresponding x, y, z, and w components of the 4-component vectors src0 and src1 and assigns the sums to the corresponding components of dest.

(3) ARL (address register load) assigns the largest integer less than or equal to the value of the s scalar src0 to a0, the address register. The value assigned to a0 serves as the base address.

(4) DP3 (3-component dot product) computes the dot product of the x, y, and z components of the 4-component vectors src0 and src1 and assigns it to all four components of the ssss scalar dest.

(5) DP4 (4-component dot product) computes the dot product of the x, y, z, and w components of the 4-component vectors src0 and src1 and assigns it to all four components of the ssss scalar dest.

(6) DPH (homogeneous dot product) computes the dot product of the x, y, and z components of the 4-component vectors src0 and src1, assigns it to the x, y, and z components of the 4-component vector dest, and assigns the w component of src1 to the w component of dest. This gives the same result as setting the w component of src0 to 1.0 and applying DP4 to src0 and src1.

(7) DST (distance vector) computes a distance vector from two operands in special formats. Given the 4-component vectors src0 = [NA, d², d², NA] and src1 = [NA, 1/d, NA, 1/d], where NA marks a component not used in the calculation and d is the vector length, the x component of the 4-component vector dest is set to 1.0, the y component to the product of the y components of src0 and src1, the z component to the z component of src0, and the w component to the w component of src1, so that dest has the form [1.0, d, d², 1/d].

(8) EX2 (exponential base 2) computes 2^src0, with 2 as the base and the s scalar src0 as the exponent, and assigns it to all four components of the ssss scalar dest.

(9) EXP (exponential base 2, approximate) assigns a partially accurate value of 2^src0, with 2 as the base and the s scalar src0 as the exponent, to the 4-component vector dest. The x component of dest receives 2 raised to the integer part of src0 (the largest integer less than or equal to src0), the y component receives the fractional part of src0 (src0 minus the largest integer less than or equal to src0), the z component receives an approximation of 2^src0, and the w component receives 1.0.

(10) FLR (floor) assigns the integer part (the largest integer less than or equal to the value) of each component of the 4-component vector src0 to the corresponding component of the 4-component vector dest.

(11) FRC (fraction) assigns the fractional part (the value minus the largest integer less than or equal to it) of each component of the 4-component vector src0 to the corresponding component of the 4-component vector dest.

(12) LG2 (logarithm base 2) assigns the base-2 logarithm of the s scalar src0 to each component of the ssss scalar dest.

(13) LOG (logarithm base 2, approximate) assigns an approximation of the base-2 logarithm of the s scalar src0 to the 4-component vector dest. The x component of dest receives the integer part of the base-2 logarithm of src0, the y component receives src0 shifted right by the x component of dest, and the w component receives 1.0.

(14) MAD (multiply and add) is the only instruction in the present invention that uses all three source operands. It multiplies the corresponding components of the 4-component vectors src0 and src1, adds the corresponding component of src2, and assigns the result to each component of the 4-component vector dest.

(15) MAX (maximum) assigns the larger of the corresponding components of the 4-component vectors src0 and src1 to each component of the 4-component vector dest.

(16) MIN (minimum) assigns the smaller of the corresponding components of the 4-component vectors src0 and src1 to each component of the 4-component vector dest.

(17) MOV (move) assigns each component of the 4-component vector src0 to the corresponding component of the 4-component vector dest.

(18) MUL (multiply) assigns the product of the corresponding components of the 4-component vectors src0 and src1 to each component of the 4-component vector dest.

(19) RCP (reciprocal) assigns the reciprocal of the s scalar src0 to each component of the ssss scalar dest.

(20) RSQ (reciprocal square root) assigns the reciprocal of the square root of the s scalar src0 to each component of the ssss scalar dest.

(21) SGE (set on greater than or equal) compares the corresponding components of the 4-component vectors src0 and src1 and assigns to each component of the 4-component vector dest 1.0 if the src0 component is greater than or equal to the src1 component, and 0.0 if it is smaller.

(22) SLT (set on less than) compares the corresponding components of the 4-component vectors src0 and src1 and assigns to each component of the 4-component vector dest 1.0 if the src0 component is smaller than the src1 component, and 0.0 if it is greater than or equal.

(23) SWZ (extended swizzle) loads the components of the 4-component vector src0 together with 1.0 and 0.0, and assigns to each component of the 4-component vector dest one of those six values (a component of src0, 1.0, or 0.0), combined with an optional sign negation.

(24) XPD (cross product) computes the cross product of the x, y, and z components of the 4-component vectors src0 and src1 and assigns it to the x, y, and z components of the 4-component vector dest. The w component is undefined.
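Several of the basic opcodes listed above can be modeled directly on 4-element Python lists, which may help when checking the descriptions against each other. This is a behavioral sketch only; it says nothing about how the hardware implements the operations. Representing an ssss scalar as a list of four equal values, and using `None` for the undefined w component of XPD, are choices made for the example.

```python
import math


def ABS(a):
    return [abs(c) for c in a]


def ADD(a, b):
    return [x + y for x, y in zip(a, b)]


def MAD(a, b, c):
    return [x * y + z for x, y, z in zip(a, b, c)]


def DP3(a, b):
    s = sum(a[i] * b[i] for i in range(3))
    return [s] * 4                        # ssss scalar: same value in x, y, z, w


def DP4(a, b):
    s = sum(a[i] * b[i] for i in range(4))
    return [s] * 4


def DST(a, b):
    # a = [NA, d*d, d*d, NA], b = [NA, 1/d, NA, 1/d]  ->  [1.0, d, d*d, 1/d]
    return [1.0, a[1] * b[1], a[2], b[3]]


def FLR(a):
    return [float(math.floor(c)) for c in a]


def FRC(a):
    return [c - math.floor(c) for c in a]


def SGE(a, b):
    return [1.0 if x >= y else 0.0 for x, y in zip(a, b)]


def SLT(a, b):
    return [1.0 if x < y else 0.0 for x, y in zip(a, b)]


def XPD(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
            None]                         # w is undefined; None is a placeholder
```

For example, calling `DST` with d = 2, that is `DST([0.0, 4.0, 4.0, 0.0], [0.0, 0.5, 0.0, 0.5])`, yields a vector of the form [1.0, d, d², 1/d].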

The three macro execution codes are as follows.

(1) LIT (compute light coefficients) is an operation for accelerating the lighting effects of ambient, diffuse, and specular light at each vertex. The LIT operation is composed of a set of basic execution codes as shown in Equation 2 below.

LIT f, a, b

Clamp tmp, a.0, b

rLG2 tmp.w tmp.w

MUL tmp.w tmp.w tmp.y

rEX2 tmp.w tmp.w

mulz f, tmp1.1xz1, tmp.w

(2) POW (exponentiate) computes src0^src1 for the scalars src0 and src1 and assigns the result to every component of dest (replicated scalar, ssss). The POW operation is composed of a set of basic execution codes as shown in Equation 3 below.

POW f, a, b (f = a^b)

LG2 tmp, a

MUL tmp, tmp, b

EX2 f, tmp
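The LG2, MUL, EX2 sequence above reproduces the power function through the following identity. It is stated here for a > 0, which is an assumption added for the logarithm to be defined; the text itself does not state the domain.

```latex
\[
f \;=\; 2^{\,b \cdot \log_2 a} \;=\; \left(2^{\log_2 a}\right)^{b} \;=\; a^{b}, \qquad a > 0 .
\]
```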

(3) SUB (subtract) subtracts each component of the four-component vector src1 from the corresponding component of the four-component vector src0 and assigns the difference to the corresponding component of the four-component vector dest. The SUB operation is composed of a set of basic execution codes as shown in Equation 4 below.

SUB f, a, b (f = a - b)

ADD f, a, -b

The three macro execution codes can be reconstructed from the 24 basic execution codes, and during vertex processing the vertex processing apparatus interprets each macro execution code as the set of basic execution codes expressed in Equations 2 to 4. The execution codes actually used in the vertex processing apparatus of the present invention are therefore the 24 basic execution codes, and the set can be extended with other execution codes.
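How a macro execution code might be rewritten into basic execution codes, following Equations 3 and 4, is sketched below. The tuple representation of an instruction, the function name expand, and the use of a string prefix "-" for a negated source are all assumptions made for illustration. LIT is left out because its expansion in Equation 2 uses helper forms that this passage does not define.

```python
def expand(op, dest, *src):
    """Rewrite one instruction into a list of basic instructions.

    Instructions are modeled as tuples (opcode, dest, src...); this
    representation is hypothetical and used only for illustration.
    """
    if op == "POW":
        a, b = src
        # Equation 3: LG2, then MUL, then EX2 through a temporary
        return [("LG2", "tmp", a),
                ("MUL", "tmp", "tmp", b),
                ("EX2", dest, "tmp")]
    if op == "SUB":
        a, b = src
        # Equation 4: ADD with the second source negated
        return [("ADD", dest, a, "-" + b)]
    # Basic execution codes pass through unchanged
    return [(op, dest, *src)]
```

For example, expand("SUB", "f", "a", "b") yields the single instruction ("ADD", "f", "a", "-b"), which relies on the sign inversion applied to a source operand when it is read.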

The execution codes described above are basically classified into general execution codes, whose number of operation stages is 3 or less, and special execution codes, whose number of operation stages exceeds 3. The number of operation cycles is the latency, expressed in clock cycles, from when an execution code performs its operation on the data designated by the source operands until the result is stored in the register designated by the target operand.

The general execution codes are classified by number of operation stages into a first group (1 stage), a second group (2 stages), and a third group (3 stages). The first group includes ADD, MUL, DST, MOV, MAX, MIN, SGE, SLT, ABS, ARL, FLR, and FRC; the second group includes MAD and XPD; and the third group includes DP3, DP4, and DPH.

The special execution codes are EXP, LOG, EX2, LG2, RCP, and RSQ, whose number of operation stages is reduced to 1 by a special operation module (described later with reference to FIG. 1).

The macro execution codes consist of LIT and POW, and are interpreted as sets of general or special execution codes according to Equations 2 and 3 above.
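The classification above can be captured in a small lookup table. The sketch below is an illustration only; the names STAGE_COUNT, SPECIAL, MACROS, and classify are hypothetical. SWZ is not assigned to any group in this passage, so it is deliberately left out of the table.

```python
# Number of operation stages per execution code, as listed in the text.
STAGE_COUNT = {}
for op in "ADD MUL DST MOV MAX MIN SGE SLT ABS ARL FLR FRC".split():
    STAGE_COUNT[op] = 1                      # first group
for op in ("MAD", "XPD"):
    STAGE_COUNT[op] = 2                      # second group
for op in ("DP3", "DP4", "DPH"):
    STAGE_COUNT[op] = 3                      # third group

# Special execution codes: reduced to one stage by the special operation module.
SPECIAL = ("EXP", "LOG", "EX2", "LG2", "RCP", "RSQ")
for op in SPECIAL:
    STAGE_COUNT[op] = 1

# Macro execution codes named in this paragraph.
MACROS = ("LIT", "POW")

def classify(op):
    """Return a short description of the class an execution code belongs to."""
    if op in MACROS:
        return "macro"
    if op in SPECIAL:
        return "special"
    if op in STAGE_COUNT:
        return "general, group %d" % STAGE_COUNT[op]
    raise KeyError(op)
```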

Hereinafter, a vertex processing apparatus that has a multi-stage pipeline structure and performs vertex processing using the execution codes described above, whose number of operation stages is at most 3, will be described.

FIG. 1 is a block diagram of a vertex processing apparatus according to a preferred embodiment of the present invention.

The vertex processing apparatus includes an instruction fetch module 100, registers 110a and 110b, a decoding module 120, an arithmetic logic module 130, and a writeback module 140. A forwarding module 150 and/or a source modifier module 160 may additionally be included as needed.

The instruction fetch module 100 sequentially fetches instructions, each including an execution code and operands, for vertex processing. The fetch order in the instruction fetch module 100 determines the priority of each execution code in the writeback module 140, described later.

The registers comprise a first register 110a, which stores data for vertex processing (for example, vertex data, constant data, intermediate operation results, and address data), and a second register 110b, which stores the operation results output from the arithmetic logic module 130 (for example, output data for which vertex processing is complete, intermediate operation results, and address data).

The first register 110a includes an input register 111 that stores vertex data, a temporary register 112 that stores intermediate operation results, and a constant register 113 that stores the constant data required for vertex processing (or an address register 114 that stores the address at which constant data is stored, for referencing the constant data).

The second register 110b includes an output register 116 that stores, among the results computed by the arithmetic logic module 130, the output data for which vertex processing is complete, the temporary register 112 that stores intermediate operation results, and the address register 114 that stores address data.

Here, the temporary register 112 and the address register 114 are common to the first register 110a and the second register 110b.

Vertex data is the stream data of the vertices of the polygons that make up the object to be displayed on the screen. To render an object in three dimensions on the screen, the shape of the object is divided into a set of triangular polygons, and the vertex data corresponding to the three vertices of each polygon is processed through vertex processing to produce a three-dimensional image.

Vertex data includes attribute data for each vertex, such as position, color, normal vector, and texture coordinate. Each attribute consists of a set of four real numbers. For example, position attribute data has four real values: x, y, and z, which represent the three dimensions, and w, which represents the degree of projection. Color attribute data has four real values: r (red), g (green), and b (blue), which are the brightness values of the three primary colors, and alpha (α), which represents opacity.

In the OpenGL ARB extension 1.0 architecture, vertex data supports at least 16 attributes: eight attributes such as position, primary color, secondary color, normal vector, vertex weight, and fog coordinate, and up to eight texture-coordinate attributes such as the first, second, and third texture coordinates.

The input register 111 stores either the values of the vertex data or an address through which the vertex data can be referenced. The decoding module 120 receives vertex data by reading the value stored in the input register 111 or by referring to the stored address. The input register 111 is read-only; it cannot be modified or written.

The constant register 113 stores constant data or an address through which the constant data can be referenced. Constant data are the values used in performing vertex processing, for example values for matrix calculation, specific color values, and values for lighting calculation, and each item of constant data consists of a set of four real numbers. The constant register 113 is read-only; it cannot be modified or written.

The temporary register 112 stores the temporary variables used during vertex processing, that is, intermediate operation results. Because the vertex data and constant data used for vertex processing all consist of sets of four real numbers, the temporary register 112 is likewise organized as sets of four real numbers.

The OpenGL ARB 1.0 architecture requires at least 12 temporary registers 112. Because the execution codes of the present invention include macro execution codes, additional temporary storage is needed, so four more are provided, for a total of 16. During vertex processing, the vertex processing apparatus can write arbitrary data to and read it from the temporary register 112.

The address register 114 stores the base address that serves as the reference when constant data is read using the constant register 113. With the base address as the reference, the constant data is read by referring to its relative location, using the offset to the address at which the constant data needed during vertex processing is stored.
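The base-plus-offset reference described above can be sketched in a few lines. The function name, the list-of-tuples model of the constant register, and the sample contents are assumptions made only for illustration.

```python
def read_constant(constant_registers, base_address, offset):
    """Return the four-component constant stored at base_address + offset.

    base_address models the value held in the address register; offset is
    the relative position of the constant needed by the instruction.
    """
    return constant_registers[base_address + offset]

# Hypothetical constant register contents: entry i holds (i, i, i, i).
constants = [(float(i),) * 4 for i in range(8)]
```

With a base address of 4 and an offset of 2, read_constant(constants, 4, 2) returns the entry at index 6.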

The output register 116 stores the output data, which is the final operation result of vertex processing. The output data, that is, the output variables, then proceeds through the graphics pipeline (primitive assembly, rasterizer, pixel processing, and so on) as described above. The OpenGL ARB extension 1.0 architecture requires at least 13 output registers 116, each of which can store a set of four real numbers. The output register 116 is write-only.

In the present invention, the decoded execution code, vertex data, and constant data can be referenced directly through pointers to the data stored in memory, without separate storage such as the registers described above. This reduces the time delay incurred in allocating separate storage for each item of data and copying it, and it is possible because the decoded execution code, vertex data, and constant data are used only for reading by the arithmetic logic module 130 described later.

The temporary register 112, the address register 114, and the output register 116 are the devices to which data is written during vertex processing. Memory equal to the size of each register is allocated when the configuration of the vertex processing apparatus is initialized.

The decoding module 120 decodes the input instruction into machine language. The instruction reads the value stored in the register designated by the source operand. That is, data decoding modules 122, 124, and 126 read the data stored in whichever registers of the first register 110a the instruction requires (one or more of the input register 111, the temporary register 112, the constant register 113, and the address register 114).

When data is read from the registers, it is sometimes necessary to apply a swizzle and/or sign inversion (negate) operation and input the converted data to the arithmetic logic module 130 described later. For this purpose, the vertex processing apparatus may further include a source modifier module 160, which performs the swizzle operation or the sign inversion operation.

A swizzle operation rearranges the component values of data that has, for example, the four components x, y, z, and w. For example, the data can be converted by moving the value of the x component to the y component and the value of the y component to the z component. A sign inversion operation inverts the sign of each component value, so that a positive (+) value becomes negative (-). Swizzle and/or sign inversion operations are used where needed for efficient vertex processing.
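The two source modifications can be modeled as below. This is a minimal sketch; the pattern-string convention (the i-th letter names the source component that feeds output component i) and the function names are assumptions, not the encoding used by the instruction format.

```python
COMPONENT_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(v, pattern):
    """Rearrange components: output i takes the source component named
    by the i-th letter of pattern (for example "yzxw")."""
    return [v[COMPONENT_INDEX[c]] for c in pattern]

def negate(v):
    """Invert the sign of every component."""
    return [-c for c in v]
```

For instance, swizzle([1.0, 2.0, 3.0, 4.0], "xxyw") places the x value in the y position and the y value in the z position, as in the example in the text.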

For the execution code, the corresponding value is sent to the execution code decoding module 128 by way of the execution code lookup table 115. The execution code lookup table 115 is a lookup table used, when the execution code of an instruction is decoded into machine language, to define the operations of the arithmetic logic module 130 efficiently and to make future extension or modification of the execution codes easy. Because the number of source operands required depends on the type of execution code, the execution codes are grouped according to the number of source operands they require and the type of destination address, and only the source operand fields required by the execution codes of a group are decoded, which improves decoding efficiency. That is, the type of execution code is identified through the execution code lookup table 115, and the execution code decoding module 128 decodes only the fields of the region predetermined for the group to which that execution code belongs. The format of the execution code, the grouping method for encoding or decoding, and the execution code lookup table 115 are described in detail later with reference to FIGS. 2 to 4.

The arithmetic logic module 130 receives the execution code, data, and so on decoded by the decoding module 120, that is, converted into machine language. The arithmetic logic module 130 is divided into three operation units; each unit performs its operation in sequence, one stage at a time, and outputs the result.

The first operation unit outputs a basic operation result obtained by performing a basic operation (addition, multiplication, comparison, fraction, floor, and so on) on the input data. It also outputs a multiplication result obtained by performing the multiplications needed to compute a cross product. The output value of each component can be the basic operation result, the multiplication result, a predetermined value (0 or 1), or the input constant data. For each component, a multiplexer (MUX) selects the final output value of the first operation unit according to the execution code.

The second operation unit adds the basic operation result of the first operation unit and the output values of the first operation unit, for the dot product operations (DPH, DP3, DP4) and the cross product operation.

The third operation unit adds the x-component output and the z-component output of the second operation unit for the dot product operation.
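One way to read the three operation units for a dot product is sketched below: the first stage multiplies component-wise, the second stage adds pairs of products, and the third stage adds the x and z outputs of the second stage. Which products are paired in the second stage (x with y, z with w) is an assumption of this sketch; the text only states that the third unit adds the x and z outputs. All function names are hypothetical.

```python
def stage1(a, b):
    # First operation unit: component-wise products
    return [x * y for x, y in zip(a, b)]

def stage2(p):
    # Second operation unit: pair sums, placed in the x and z outputs
    # (the pairing x+y and z+w is assumed for illustration)
    return [p[0] + p[1], 0.0, p[2] + p[3], 0.0]

def stage3(s):
    # Third operation unit: add the x and z outputs of the second unit
    return s[0] + s[2]

def dp4(a, b):
    return stage3(stage2(stage1(a, b)))

def dp3(a, b):
    p = stage1(a, b)
    p[3] = 0.0                       # the w product does not contribute
    return stage3(stage2(p))

def dph(a, b):
    # Homogeneous dot product: the w component of a is treated as 1.0
    return stage3(stage2(stage1([a[0], a[1], a[2], 1.0], b)))
```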

The outputs of the first, second, and third operation units are written back to the writeback module 140 or passed to the next operation unit through a first stage output unit 132, a second stage output unit 134, and a third stage output unit 136, respectively.

Each of the stage output units 132, 134, and 136 outputs an operation result at the time the operation of the corresponding execution code completes. After the decoded execution code is sent to the arithmetic logic module 130, the first stage output unit 132 outputs the results of execution codes with 1 operation stage, the second stage output unit 134 outputs the results of execution codes with 2 operation stages, and the third stage output unit 136 outputs the results of execution codes with 3 operation stages. That is, the first stage output unit 132 handles ADD, MUL, DST, MOV, MAX, MIN, SGE, SLT, ABS, ARL, FLR, and FRC, together with the special execution codes EXP, LOG, EX2, LG2, RCP, and RSQ computed by the special operation module described later; the second stage output unit 134 handles MAD and XPD; and the third stage output unit 136 handles DP3, DP4, and DPH. Each stage performs its operation independently.

For example, consider the case where execution codes are fetched in the order DP3, MAD, ADD. Assume that the operation stages start at clock (CLK) 0. DP3, which has first priority and is sent to the arithmetic logic module 130 at clock 0, belongs to the third stage output unit 136, so its result is output three stages later, at clock 3. MAD, the second-priority execution code, is sent to the arithmetic logic module 130 at clock 1 and belongs to the second stage output unit 134, so its result is output two stages later, at clock 3. ADD, the third-priority execution code, is sent to the arithmetic logic module 130 at clock 2 and belongs to the first stage output unit 132, so its result is output one stage later, at clock 3. Because each stage output unit performs its operation independently, the results of execution codes that were input sequentially may be output simultaneously, as here at clock 3. In this case, the results are handled by the writeback module 140 described later.
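The timing in this example can be checked with a short calculation: an execution code issued at clock t with n operation stages completes at clock t + n. The names LATENCY and completion_clocks are hypothetical, and one instruction is assumed to be issued per clock starting at clock 0, as in the text.

```python
# Operation stage counts for the three execution codes in the example.
LATENCY = {"DP3": 3, "MAD": 2, "ADD": 1}

def completion_clocks(ops, latency):
    """Return (opcode, completion clock) for ops issued one per clock from clock 0."""
    return [(op, issue_clock + latency[op]) for issue_clock, op in enumerate(ops)]
```

completion_clocks(["DP3", "MAD", "ADD"], LATENCY) gives clock 3 for all three, matching the simultaneous output described above.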

The arithmetic logic module 130 includes a special operation module. Execution codes whose latency in conventional arithmetic, that is, whose number of operation cycles, was 4 or more are designated as special execution codes, and the special operation module computes each of them separately so that their number of operation cycles within the vertex processing apparatus is 1. The special execution codes computed separately in the special operation module are EXP, LOG, EX2, LG2, RCP, and RSQ.

The writeback module 140 stores the operation results of the execution codes output from the stage output units 132, 134, and 136 of the arithmetic logic module 130 in the second register 110b. Results are stored on a first-in, first-out (FIFO) basis: the result of the execution code that entered the arithmetic logic module 130 first, and therefore has the highest priority, is stored in its destination register first. Depending on its type, the operation result is stored in the output register 116, the temporary register 112, the address register 114, or the like, as described above.

However, because each stage output unit of the arithmetic logic module 130 performs its operation independently under the multi-stage pipeline structure, two or more operation results may be output at the same time. In this case, the writeback module 140 stores the result of the execution code with the highest priority (DP3 in the example above) in its destination register, and temporarily holds the results of the execution codes with the next priorities (MAD and ADD in the example above) internally and bypasses them so that they are processed in the following stages. In the next stage, that is, at clock 4, the bypassed results (MAD, then ADD, in the example above) take the highest priority, and the results newly output from the arithmetic logic module 130 come after them.

In addition, when the arithmetic logic module 130 outputs the results of two or more execution codes and their destination addresses are the same (that is, they are to be stored at the same address of the second register 110b), the writeback module 140 compares the priorities of the execution codes and stores the result of the execution code with the later (or lower) priority at the destination address. If the result of the execution code with the earlier (or higher) priority were stored first, it would be overwritten in the same storage area in the next stage by the result of the execution code with the later (or lower) priority. The result of the execution code with the earlier (or higher) priority would therefore be meaningless, and storing it would only add a delay of one stage to the overall vertex processing.
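The writeback behavior described above (one result stored per clock in FIFO order, held results taking priority over newly arriving ones, and only the later result kept when simultaneous results share a destination) can be modeled roughly as follows. The queue-based model, the function name, and the tuple layout are assumptions made for this sketch, not the circuit described in the patent.

```python
from collections import deque

def simulate_writeback(arrivals):
    """arrivals maps clock -> list of (fetch_order, opcode, dest).

    Returns the (clock, opcode) pairs actually written, one per clock.
    """
    if not arrivals:
        return []
    queue = deque()                  # results held inside the writeback module
    written = []
    clock = min(arrivals)
    last = max(arrivals)
    while clock <= last or queue:
        # Among results arriving together with the same destination,
        # keep only the one fetched later (it would overwrite the other).
        latest = {}
        for item in sorted(arrivals.get(clock, [])):
            latest[item[2]] = item
        # Held results stay ahead of the newly arrived ones.
        queue.extend(sorted(latest.values()))
        if queue:
            _, opcode, _ = queue.popleft()
            written.append((clock, opcode))
        clock += 1
    return written
```

With DP3, MAD, and ADD all arriving at clock 3 and a hypothetical MOV arriving at clock 4, this model writes DP3 at clock 3, MAD at clock 4, ADD at clock 5, and MOV at clock 6.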

According to another preferred embodiment of the present invention, the vertex processing apparatus further includes a forwarding module 150.

When data that the decoding module 120 needs in order to decode the execution code and operands (for example, an intermediate operation result) is held in the writeback module 140 and has not yet been stored in its register, the forwarding module 150 passes the data directly from the writeback module 140 to the decoding module 120 so that no hazard arises from the data dependency.
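The forwarding decision can be illustrated with a simple operand read. In this sketch, dictionaries stand in for the register file and for the results still held in the writeback module; these structures and the function name are assumptions for illustration.

```python
def read_operand(name, register_file, pending_writeback):
    """Return the newest value of a register.

    If a result for this register is still held in the writeback module,
    forward it directly instead of reading the stale register contents.
    """
    if name in pending_writeback:
        return pending_writeback[name]
    return register_file[name]
```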

In the present invention, the writeback module 140 and/or the forwarding module 150 may be implemented as a multiplexer that receives the outputs of the first stage output unit 132, the second stage output unit 134, and the third stage output unit 136 of the arithmetic logic module 130 and stores the output of the stage selected by a control signal in the designated second register 110b.

The field configuration of an instruction for vertex processing in the vertex processing apparatus according to the present invention, and the classification and field configuration of the execution codes for the execution code lookup table, are described in detail below with reference to FIGS. 2 to 4.

FIG. 2 shows the field configuration of an instruction according to a preferred embodiment of the present invention, FIG. 3 shows the instruction decoding field regions for each group in the execution code lookup table, and FIG. 4 shows an example of the execution code lookup table that maps execution code indexes to execution codes.

FIG. 2 shows the bit field configuration of a 64-bit instruction 200 for vertex processing.

The bit fields of the instruction 200 are defined as follows.

[0:2] - constant field 240,

[3:17], [18:32], [33:47] - source operand fields 230a, 230b, and 230c (hereinafter collectively referred to as 230),

[48:56] - target operand field 220,

[58:63] - execution code index field 210,

[3:10] - extension field 250.

[M:N] denotes the bit field from bit M to bit N, where the least significant bit of the instruction is bit 0 and the most significant bit is bit 63. Here, M and N are natural numbers, and M is greater than or equal to N.
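Extracting the bit fields listed above from a 64-bit instruction word can be sketched with shifts and masks. The field names in FIELDS are hypothetical labels for reference numerals 240, 230a to 230c, 220, and 210, and the helper takes the lower and upper bit positions in the order they are written in the list (for example 0 and 2 for the constant field). The extension field [3:10] overlaps the first listed source operand field and is omitted here.

```python
def bits(word, lo, hi):
    """Return bits lo..hi (inclusive, bit 0 = least significant) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

# Bit ranges as listed in the text; the names are illustrative labels only.
FIELDS = {
    "constant": (0, 2),        # constant field 240
    "source_a": (3, 17),       # source operand field 230a
    "source_b": (18, 32),      # source operand field 230b
    "source_c": (33, 47),      # source operand field 230c
    "target": (48, 56),        # target operand field 220
    "opcode_index": (58, 63),  # execution code index field 210
}

def decode_fields(word):
    """Split a 64-bit instruction word into its raw field values."""
    return {name: bits(word, lo, hi) for name, (lo, hi) in FIELDS.items()}
```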

The target operand field 220 includes an identifier (Destination-O) of the output target register to which the operation result of the instruction is to be output, a destination address (Destination-Address), and mask information (Wmask-W,X,Y,Z) for each of the x, y, z, and w components.

The source operand field 230 includes a source register selection identifier (Source#-Sel) for the register in which the values for the operation of the instruction are stored, a source address (Source#-Address), swizzle information (Swizzle#) for each of the x, y, z, and w components, and inversion information (N).

상수 필드(240)는 상수 레지스터의 상위 4 비트 기본 주소(Const.)를 포함한다.The constant field 240 contains the high four bit base address (Const.) Of the constant register.

확장 필드(250)는 제1 소스 오퍼런드(source0)의 각 성분에 대한 반전 여부를 나타내는 확장 반전 정보(N+eN), 제1 소스 오퍼런드(source0)의 각 성분에 대한 스위즐 여부를 나타내는 확장 스위즐 정보(eS+Swizzle0)를 포함한다. The extended field 250 includes extended inversion information (N + eN) indicating whether to invert each component of the first source operator source0, and whether to swizzle for each component of the first source operator source0. Extended swizzle information (eS + Swizzle0) indicating.

Referring to FIG. 3, the method of grouping execution codes by the index in the execution code index field 210 and the bit fields decoded for each group are shown.

In principle, every bit of the 64-bit instruction 200 is allocated to the execution code index field 210, the target operand field 220, the source operand fields 230, the constant field 240, and so on. However, not every bit is needed for every kind of execution code, and the bit fields required differ from one execution code to another. Decoding efficiency can therefore be improved by grouping the execution codes according to the bit fields they require and decoding only the bit fields required for the group in question.

The 64-bit instructions 200 for vertex processing can be divided into one or more groups according to the number of source operands and the destination address, and such a group is referred to as an execution code type. That is, execution code types may be distinguished according to the destination address at which the result of the instruction's operation is stored and/or the number of source operands required for the operation.

The execution code index field 210 is the 6-bit field [58:63]. '000000' denotes no operation (360), in which no operation is performed; '000001' to '000011' denote execution code type A-0 (310); '000100' to '001111' denote execution code type A-1 (320); '010000' to '100111' denote execution code type A-2 (330); '101000' to '101111' denote execution code type A-3 (340); and '110000' to '111111' denote execution code type S (350).
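The index ranges above amount to a simple range check on the 6-bit index. The sketch below illustrates that mapping; the function name `execution_code_type` and the string labels are assumptions chosen for illustration, while the range boundaries are the ones stated in the text.

```python
def execution_code_type(index: int) -> str:
    """Map a 6-bit execution code index to its execution code type label."""
    if not 0 <= index <= 0b111111:
        raise ValueError("index must fit in 6 bits")
    if index == 0b000000:
        return "NOP"   # no operation (360)
    if index <= 0b000011:
        return "A-0"   # 310
    if index <= 0b001111:
        return "A-1"   # 320
    if index <= 0b100111:
        return "A-2"   # 330
    if index <= 0b101111:
        return "A-3"   # 340
    return "S"         # 350, '110000' to '111111'
```

Because the ranges are contiguous and ordered, a decoder can determine the type from the index alone before looking at any other field.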

Execution code type A-0 (310) uses the execution code index field 210, part of the target operand field 220, part of the first source operand field 230a, part of the extension field 250, and the constant field 240. ARL, the instruction that loads the address register, belongs to execution code type A-0 (310).

Execution code type A-1 (320) uses the execution code index field 210, the target operand field 220, the first source operand field 230a, the extension field 250, and the constant field 240. The instructions that require one source operand, namely MOV, ABS, FRC, FLR, and SWZ, belong to execution code type A-1 (320).

Execution code type A-2 (330) uses the execution code index field 210, the target operand field 220, the first source operand field 230a, the second source operand field 230b, the extension field 250, and the constant field 240. The instructions that require two source operands, namely ADD, MUL, DST, DP3, DP4, XPD, MAX, MIN, SGE, SLT, clamp, and mulz, belong to execution code type A-2 (330).

Execution code type A-3 (340) uses the execution code index field 210, the target operand field 220, the first source operand field 230a, the second source operand field 230b, the third source operand field 230c, and the constant field 240. MAD, the instruction that requires three source operands, belongs to execution code type A-3 (340).

Execution code type S (350) uses the execution code index field 210, the target operand field 220, the first source operand field 230a, the extension field 250, and the constant field 240. The special execution codes rEX2, rLG2, EX2, LG2, RCP, and RSQ belong to execution code type S (350).

No operation (360) uses only the execution code index field 210.

By decoding only the bit fields required for the execution code type described above, the decoding module avoids decoding all 64 bits, which improves decoding efficiency.

Referring to FIG. 4, the execution codes are grouped according to the criteria described above, and a specific execution code index is assigned to each according to its execution code type. It will be apparent to those skilled in the art that execution codes may also be grouped and assigned execution code indexes by other methods.
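The per-type field usage described above can be written down as a small table that a decoder consults before touching any field. This is a sketch at field granularity only: the names `FIELDS_BY_TYPE` and `decode_selected` are illustrative assumptions, and the fact that type A-0 uses only part of some fields is not modeled.

```python
# Fields used by each execution code type, as listed in the description.
# Keys and field names are illustrative labels, not identifiers from the patent.
FIELDS_BY_TYPE = {
    "NOP": ("index",),
    "A-0": ("index", "target", "source0", "extension", "constant"),
    "A-1": ("index", "target", "source0", "extension", "constant"),
    "A-2": ("index", "target", "source0", "source1", "extension", "constant"),
    "A-3": ("index", "target", "source0", "source1", "source2", "constant"),
    "S": ("index", "target", "source0", "extension", "constant"),
}


def decode_selected(raw_fields: dict, type_label: str) -> dict:
    """Keep only the fields that the given execution code type needs."""
    return {name: raw_fields[name] for name in FIELDS_BY_TYPE[type_label]}
```

With this table, a two-operand instruction never looks at the third source operand field, and a no-operation instruction looks at nothing beyond its index.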

FIG. 5 is a diagram illustrating how the execution code lookup table is set up for OpenGL ARB and Vertex Shader 1.1.

The instructions defined in the present invention include all of the instructions defined in OpenGL ARB and DirectX Vertex Shader 1.1, so the 64-bit instructions of the present invention can be used with both OpenGL ARB and DirectX.

FIG. 6 is a diagram illustrating the field configuration of an execution code in the execution code lookup table, FIG. 7 is a diagram illustrating the operation method in each operation unit of the arithmetic logic module, and FIG. 8 is a diagram illustrating the values of the basic operation field.

The execution code lookup table stores, for the execution code index of each execution code, arithmetic logic module control data 600 as shown in FIG. 6. The arithmetic logic module control data 600 includes output stage information 610, a cross enable 620, first operation result selection information 630, and basic operation field information 640.

Referring to FIG. 7, the arithmetic logic module 130 includes a first operation unit (ALU Stage 0), a second operation unit (ALU Stage 1), and a third operation unit (ALU Stage 2).

The first operation unit (ALU Stage 0) includes basic operation units 710a, 710b, 710c, and 710d for the respective components (w, z, y, and x) of the two input vertex data A and B, cross product operation units 720a, 720b, and 720c for computing the cross product, and output value determination units 730a, 730b, 730c, and 730d that determine the output value of each component.

According to the first operation result selection information 630, the output value determination units 730a, 730b, 730c, and 730d select the output value of each component from among the basic operation unit outputs (Prime.x, Prime.y, Prime.z, Prime.w), the cross product operation unit outputs (Cross.x, Cross.y, Cross.z), the vertex data (C.x, C.y, C.z, C.w), and the predetermined values (0, 1), and output the selected value.

Whether the cross product operation unit outputs (Cross.x, Cross.y, Cross.z) are used is determined by the cross enable 620. For example, when the cross enable 620 is '0', the cross product operation unit outputs are all 0, which has the same effect as not using them; when it is '1', the computed cross product operation unit outputs are used.
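The selection performed by the output value determination units, together with the gating effect of the cross enable, can be sketched as follows. This is an illustration under stated assumptions: the cross product units are modeled as computing the standard vector cross product of A and B, and the selector strings such as `"prime.x"` or `"one"` are a made-up encoding, since the actual bit encoding of the selection information 630 is not given here.

```python
def cross_outputs(a, b, cross_enable: int):
    """Cross product of the x, y, z parts of a and b; all zeros when disabled."""
    if not cross_enable:
        return (0.0, 0.0, 0.0)
    ax, ay, az = a[:3]
    bx, by, bz = b[:3]
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


def select_outputs(selectors, prime, cross, c):
    """Pick one candidate value per output component (x, y, z, w order).

    selectors is a 4-tuple of hypothetical labels naming the chosen candidate.
    """
    candidates = {"zero": 0.0, "one": 1.0}
    for i, comp in enumerate("xyzw"):
        candidates[f"prime.{comp}"] = prime[i]
        candidates[f"c.{comp}"] = c[i]
    for i, comp in enumerate("xyz"):
        candidates[f"cross.{comp}"] = cross[i]
    return tuple(candidates[s] for s in selectors)
```

With the cross enable at 0, selecting a `cross.*` candidate simply yields 0, which matches the "same effect as not using them" behavior described above.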

The output stage information 610 determines which stage output unit, among those of the first, second, and third operation units included in the arithmetic logic module 130, finally outputs the result of the operation for the execution code in question. The output stage information 610 takes the value corresponding to the first stage output unit 132 for an execution code with one operation stage, the second stage output unit 134 for an execution code with two operation stages, and the third stage output unit 136 for an execution code with three operation stages.

The basic operation field information 640 determines the kind of basic operation performed in the basic operation units 710a, 710b, 710c, and 710d of the first operation unit. As shown in FIG. 8, the operations are divided into arithmetic, compare, setting, move, and special function according to the bit field value in the basic operation field information 640, and the corresponding basic operation is performed.

When the operation in the first operation unit (ALU Stage 0) is complete, the first operation result (O0) to be output is determined by the first operation result selection information 630. If the execution code has one operation stage, the first operation result (O0) is then output through the first stage output unit 132.

If the number of operation stages is not 1, however, the first operation result (O0) becomes an input to the second operation unit (ALU Stage 1). The inputs to the second operation unit are the basic operation unit outputs of the first operation unit (Prime.x, Prime.y, Prime.z, Prime.w) and the selected first operation result (O0.x, O0.y, O0.z, O0.w). The inputs are added component by component by the first adders 740a, 740b, 740c, and 740d, and the sum is output as the second operation result (O1). If the execution code has two operation stages, the second operation result (O1) is then output through the second stage output unit 134.

If the number of operation stages is not 2, however, the second operation result (O1) becomes the input to the third operation unit (ALU Stage 2). The third operation unit adds the x component (O1.x) and the z component (O1.z) of the second operation result in the second adder 750 and outputs the sum as the third operation result (O2) through the third stage output unit 136.
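The three-stage data flow described above can be modeled in a few lines. This is a sketch, not the hardware: `run_stages` and `mul4` are hypothetical names, vectors are in (x, y, z, w) order, and the way MAD and a four-component dot product are mapped onto the stages in the usage note is one plausible reading, since this section does not state how individual execution codes configure the selectors.

```python
def mul4(a, b):
    """Component-wise product, standing in for a multiply in the basic operation units."""
    return tuple(x * y for x, y in zip(a, b))


def run_stages(prime, o0, num_stages: int):
    """Model the stage outputs given Stage 0's Prime outputs and selected result O0."""
    if num_stages not in (1, 2, 3):
        raise ValueError("num_stages must be 1, 2, or 3")
    if num_stages == 1:
        return o0                                   # first stage output unit 132
    # First adders 740a-740d: per-component Prime + O0
    o1 = tuple(p + s for p, s in zip(prime, o0))
    if num_stages == 2:
        return o1                                   # second stage output unit 134
    # Second adder 750: O1.x + O1.z
    return o1[0] + o1[2]                            # third stage output unit 136
```

Under these assumptions, a multiply-add finishes in two stages by selecting the vertex data C as O0, so that `run_stages(mul4(a, b), c, 2)` yields A*B + C per component. A four-component dot product could finish in three stages by selecting Prime.y for O0.x and Prime.w for O0.z, so that Stage 1 forms two partial sums and Stage 2 adds them.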

FIG. 9 is a diagram illustrating the flow of sequential operations through the multi-stage pipeline structure of the arithmetic logic module.

Referring to FIG. 9, the first operation unit performs a predetermined operation during one operation stage, either a basic operation or a special operation invoked by a special execution code, and outputs the result as the first operation result (O0). The first operation result (O0) is either output through the first stage output unit 132 or becomes an input to the second operation unit.

The second operation unit performs a predetermined operation on the first operation result (O0) and the basic operation unit outputs of the first operation unit, and outputs the result as the second operation result (O1). The second operation result (O1) is either output through the second stage output unit 134 or becomes the input to the third operation unit.

The third operation unit performs a predetermined operation on the second operation result (O1) and outputs the result as the third operation result (O2). The third operation result (O2) is output through the third stage output unit 136.

FIG. 10 is a block diagram of an encoding apparatus according to a preferred embodiment of the present invention.

Referring to FIG. 10, the encoding apparatus 1000 includes an input unit 1010, a conversion unit 1020, and an output unit 1030. The input unit 1010 receives an instruction that includes an execution code for vertex processing, a target operand, and one or more source operands. The conversion unit 1020 encodes the instruction into machine language. The output unit 1030 outputs the machine language produced by the conversion unit 1020.

As described above, the machine language includes an execution code index field in which an execution code type, determined in advance according to the kind of execution code, and an execution code index are recorded; a target operand field in which a destination address for storing the result of the operation according to the execution code is recorded; and one or more source operand fields in which a source address storing data for the operation according to the execution code is recorded.
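The conversion step can be pictured as placing already-encoded field values at their bit positions in a 64-bit word. The sketch below assumes the bit ranges given for the instruction 200; the function name `pack_instruction` is hypothetical, each operand field value is treated as an opaque integer because its internal layout (selector, address, swizzle, mask) is not specified in this section, and the three source values are given in order of bit range ([3:17], [18:32], [33:47]) without assuming which range holds which operand.

```python
def pack_instruction(index: int, target: int, sources=(0, 0, 0), constant: int = 0) -> int:
    """Pack field values into a 64-bit word; raises ValueError if a value overflows its field."""
    layout = [
        ("constant", constant, 0, 2),
        ("source [3:17]", sources[0], 3, 17),
        ("source [18:32]", sources[1], 18, 32),
        ("source [33:47]", sources[2], 33, 47),
        ("target", target, 48, 56),
        ("index", index, 58, 63),
    ]
    word = 0
    for name, value, low, high in layout:
        width = high - low + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} value does not fit in {width} bits")
        word |= value << low
    return word
```

A decoder would apply the inverse shifts and masks, starting with the index field so that the execution code type is known before any other field is examined.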

FIG. 11 is a block diagram of a decoding apparatus according to a preferred embodiment of the present invention. The decoding apparatus corresponds to the decoding module 120 of the vertex processing apparatus shown in FIG. 1.

Referring to FIG. 11, the decoding apparatus 1110 includes an input unit 1110, a conversion unit 1120, and an output unit 1130. The input unit 1110 receives machine language that includes an execution code index field in which an execution code type, determined in advance according to the kind of execution code, and an execution code index are recorded; a target operand field in which a destination address for storing the result of the operation according to the execution code is recorded; and one or more source operand fields in which a source address storing data for the operation according to the execution code is recorded. The conversion unit 1120 converts the machine language into an information signal. The output unit 1130 outputs the decoded information signal.

The instructions and machine language handled by the encoding apparatus 1000 and the decoding apparatus 1110, and their encoding and decoding methods, have been described above, so a detailed description is omitted here.

As described above, the structure of the vertex processing instruction according to the present invention, and the encoding apparatus, decoding apparatus, and methods for that instruction, make it possible to reduce the maximum latency between execution codes to three stages by means of an arithmetic logic module with a multi-stage pipeline structure in which the operation units are connected in sequence.

In addition, decoding execution codes with an execution code lookup table defines the operations of the arithmetic logic module efficiently and makes later extension and modification of the execution codes easy, enabling efficient vertex processing.

Although the present invention has been described above with reference to preferred embodiments, those of ordinary skill in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

An input unit for receiving an instruction including an execution code for vertex processing, a target operand, and one or more source operands;

A conversion unit for encoding the instruction into machine language; and

An output unit for outputting the machine language,

wherein the machine language includes an execution code index field in which an execution code type, determined in advance according to the kind of the execution code, and an execution code index are recorded; a target operand field in which a destination address for storing the result of the operation according to the execution code is recorded; and one or more source operand fields in which a source address storing data for the operation according to the execution code is recorded.

The method of claim 1,

wherein the machine language comprises a 64-bit field.

The method of claim 1,

wherein the execution code type is classified according to the destination address at which the operation result is stored.

The method of claim 1,

wherein the execution code type is classified according to the number of source operands required for the operation of the execution code.

Receiving an instruction including an execution code for vertex processing, a target operand, and one or more source operands;

Encoding the instruction into machine language;

Outputting the machine language;

An input unit for receiving machine language that includes an execution code index field in which an execution code type, determined in advance according to the kind of execution code, and an execution code index are recorded; a target operand field in which a destination address for storing the result of the operation according to the execution code is recorded; and one or more source operand fields in which a source address storing data for the operation according to the execution code is recorded;

A conversion unit for converting the machine language into a decoded information signal; and

An output unit for outputting the decoded information signal.

The method of claim 6,

wherein the machine language comprises a 64-bit field.

The method of claim 6,

The method of claim 9,

wherein the conversion unit decodes only the source operand fields required for the operation of the execution code, according to the execution code type.

The method of claim 6,

further comprising an execution code lookup table that determines, for the execution code index, the kind of basic operation performed on each component.

The method of claim 6,

further comprising an execution code lookup table that determines, for the execution code index, the stage at which the result is written back.

Receiving machine language that includes an execution code index field in which an execution code type, determined in advance according to the kind of execution code, and an execution code index are recorded; a target operand field in which a destination address for storing the result of the operation according to the execution code is recorded; and one or more source operand fields in which a source address storing data for the operation according to the execution code is recorded;

Converting the machine language into a decoded information signal; and

Outputting the decoded information signal.