WO2008053597A1

WO2008053597A1 - Device for accelerating the processing of extended primitive vertex cache

Info

Publication number: WO2008053597A1
Application number: PCT/JP2007/001196
Authority: WO
Inventors: Kozakov Maxim
Original assignee: Digital Media Professionals Inc.
Priority date: 2006-11-01
Filing date: 2007-10-31
Publication date: 2008-05-08
Also published as: JPWO2008053597A1; JP2012014744A; JP5216130B2; JP4913823B2

Abstract

The object is to solve problems with the on-chip processing of extended geometric primitives, such as subdivision surface patches, NURBS patches, and triangles with adjacent triangles, which are used as input to algorithms in three-dimensional computer graphics. The problems are solved by a system for three-dimensional computer graphics that includes a primary vertex cache store (PVC), a vertex processing unit (VPU), a secondary vertex cache store (SVC), a primitive engine (PE), a fixed primitive assembly (FPA), and a vertex cache control (VCC).

Description

Specification

Equipment for accelerating processing of extended primitive vertex caches

[0001] The present invention relates to the field of three-dimensional computer graphics. More specifically, the present invention relates to a method and a system for processing, at high speed in hardware, geometric primitives having many vertices, which are called extended primitives in this specification.

Background art

[0002] Traditionally, interactive computer graphics approximates complex geometric shapes using simple geometric primitives such as points, lines, and triangles. Existing computer graphics hardware is optimized to speed up the processing of such simple primitives, especially triangle meshes. The triangle is the simplest polygon that can be used to approximate a fragment of a 3D surface linearly. Sets of triangles, lines, and points can approximate fairly complex shapes. Existing hardware designs benefit greatly from the possibility of processing the vertices of a primitive sequence independently of one another.

[0003] The need to process complex multi-vertex primitives arises in various areas of 3D computer graphics. In practice, many complex shapes are modeled with higher-order surface primitives such as NURBS patches and subdivision surfaces. In either case, a coarse control mesh and a set of rules are used to generate a shape that is smooth, finely detailed, and easily deformed by manipulating the control mesh. Usually, the patch itself is formed by a variable-size set of control mesh vertices, together with additional information that controls the tessellation process. Enabling tessellation of NURBS or subdivision surface patches in hardware can greatly benefit interactive 3D graphics by reducing storage and bandwidth requirements and by simplifying shape manipulation. Another class of algorithms requires access to a larger part of the processed shape than just a single triangle of the approximation at a time. Typically, access to a limited neighborhood of vertices around a triangle is needed to detect silhouette edges, compute the curvature at mesh vertices, and so on.

[0004] Supporting the processing of complex primitives is difficult in existing 3D graphics accelerator architectures for several reasons. Whereas a simple primitive has at most three vertices (a triangle), processing a complex primitive requires the processing algorithm to have fast access to many vertices at once. If the attribute data for the vertices of complex primitives, such as those used for subdivision surfaces and NURBS patches, is scattered across memory, accessing it incurs significant data transfer latency, and the processing algorithm may perform poorly. In addition, the programmable processing elements in the geometry pipeline of current architectures are optimized to process a single vertex at a time. Primitives are then assembled from the vertex sequence by fixed logic, which cannot handle any primitive type other than simple ones.

[0005] The core architecture of the 3D graphics hardware accelerators available for personal computers and workstations is likewise adapted to the fast processing of lists of simple primitives such as triangles, lines, and points. Figure 1 shows a portion of a typical 3D graphics hardware pipeline. The list of primitives is described by an index buffer 1100 and a vertex buffer 1200, which are usually stored in host memory 1000. The contents of the index and vertex buffers are illustrated in Figures 2A and 2B. In the case of FIG. 2A, a set of triangles 101, possibly sharing vertices, is represented by vertex data packed consecutively into vertex buffer 1200 and by index buffer 1100, whose contents reference vertices in the vertex buffer. For triangle set 101, three consecutive positions in index buffer 1100 define one triangle, and the next three positions define the next triangle in set 101. If most of the vertices in vertex buffer 1200 are reused (reused vertices are marked with a fill pattern in the figure), this representation of a triangle set can be quite compact, because vertex data usually requires much more storage than an index in the index buffer. Referring to FIG. 2B, the representation is even more compact in the case of triangle strip 102. In this case, each vertex, taken together with the previous two vertices referenced in the index buffer, forms a triangle. As a result, only one index is needed to define each further triangle after the first has been processed. The contents of the index buffer in this case describe the triangle set more efficiently in terms of the number of indices per triangle. In the same way, a set of line segments, with two vertices per segment, and a set of points, with one vertex per point, can be described using an index buffer and a vertex buffer.
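The two index buffer layouts described above can be illustrated with a minimal sketch. The function names and the sample index values are assumptions made for illustration and do not come from the specification; the alternating winding order that real hardware applies to strips is ignored here.

```python
def decode_triangle_list(indices):
    """Every three consecutive indices define one triangle (Fig. 2A style)."""
    usable = len(indices) - len(indices) % 3  # ignore a trailing partial triangle
    return [tuple(indices[i:i + 3]) for i in range(0, usable, 3)]


def decode_triangle_strip(indices):
    """Each index after the first two forms a triangle with the previous two
    indices (Fig. 2B style)."""
    return [tuple(indices[i - 2:i + 1]) for i in range(2, len(indices))]


# Hypothetical sample data: two triangles sharing an edge.
tri_list = [0, 1, 2, 2, 1, 3]   # six indices for two triangles
tri_strip = [0, 1, 2, 3]        # four indices for two triangles

list_triangles = decode_triangle_list(tri_list)
strip_triangles = decode_triangle_strip(tri_strip)
```

In this sketch the strip needs one extra index per additional triangle, whereas the list needs three, which is the compactness gain the paragraph describes.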

[0006] A vertex cache device is used to accelerate the processing of a set of primitives defined using index and vertex buffers. As shown in Figure 1, the contents of index buffer 1100 are used not only to fetch data from vertex buffer 1200 but also to detect whether a vertex with the same index has recently been processed and may still be available in the cache. The vertex cache controller 2000 fetches the contents of index buffer 1100 and analyzes them. Initially the cache is empty, so the vertex cache controller initiates the transfer of vertex buffer contents, containing the vertex data for the indices obtained from index buffer 1100, to the first vertex cache 3000. Typically, to minimize memory access latency penalties, data is fetched from the vertex buffer in relatively large contiguous memory blocks, which may contain vertex data that does not correspond to the indices currently being processed by the vertex cache controller. Such data is nevertheless retained in the first vertex cache 3000 because it may be used by indices encountered later. If the vertex data for an index is already in the first vertex cache 3000, no host memory access is performed.
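The hit-or-fetch behavior described above can be sketched as a small software model. The class name, the FIFO replacement policy, and the capacity are assumptions for illustration only; the block-wise prefetch of contiguous vertex data mentioned in the text is deliberately left out.

```python
from collections import OrderedDict


class VertexCache:
    """Hypothetical FIFO cache that uses the vertex index as its tag."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # index -> vertex data
        self.host_fetches = 0          # number of simulated host memory accesses

    def get(self, index, vertex_buffer):
        if index in self.entries:      # hit: no host memory access
            return self.entries[index]
        self.host_fetches += 1         # miss: fetch from the vertex buffer
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[index] = vertex_buffer[index]
        return self.entries[index]


# Hypothetical data: four vertices, two triangles sharing two vertices.
vertex_buffer = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
index_buffer = [0, 1, 2, 2, 1, 3]

cache = VertexCache(capacity=4)
fetched = [cache.get(i, vertex_buffer) for i in index_buffer]
```

With six index references and four distinct vertices, the model performs four host fetches; the two repeated indices are served from the cache.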

[0007] Normally, the vertex data sent to the first vertex cache 3000 needs to be transformed. For example, vertex positions may need to be converted from one coordinate system to another, and the vertex color is calculated from the normal, the position, and so on. The vertex cache controller 2000 controls the delivery of vertex data from the first vertex cache to the vertex processor 4000. The transformed vertex data is sent to the secondary vertex cache 5000. From there it reaches the fixed primitive assembly 6000, which uses the transformed vertex data to assemble a fixed set of primitives, such as triangles, lines, and points, from the vertex sequence and passes them to the rest of the processing pipeline, which comprises the fixed triangle setup 7000 and the rasterizer 8000. If an index is already present in the secondary vertex cache 5000, the transformed vertex data is delivered to the primitive assembly 6000 without any additional overhead.

[0008] Because of the benefits that hardware-accelerated processing of extended primitives can bring, especially in fields that deal with complex shapes and images, great effort has been put into designing hardware that can process such primitives on chip. Attempts have been made either to introduce entirely different hardware, including on-chip subdivision surface and NURBS surface tessellation, dedicated to the most complex processing tasks, or to use the programmable elements of existing architectures. Traditional architectures, however, are still limited to simple primitives and lack primitive generation facilities.

[0009] WO 03/081528 (Patent Document 1) discloses a dedicated hardware unit that processes a special description of the control mesh of a subdivision surface, with the connectivity information for the control mesh described in a form suitable for hardware processing. The unit offers no possibility of reusing the hardware for any other purpose or of sharing it with simple primitive processing, and complex structures have to be prepared. A similar approach is disclosed in US Patent Application Specification 2005/0012750 (Patent Document 2), where accelerated processing is achieved with NURBS patch hardware.

[0010] With the introduction of programmable 3D acceleration hardware, several attempts were made to perform complex primitive processing with it. Nevertheless, an efficient implementation is not possible on commercially available hardware, which has no facility for generating primitives and still processes only simple primitives.

[0011] One of the problems in processing extended primitives, particularly in the case of subdivision surface tessellation, is random access to vertex data in order to provide the processing algorithm with the information it needs. Random memory access is currently expensive on existing hardware and is possible only at limited locations in the processing pipeline. The most versatile random access capability of existing hardware is related to mesh sampling, which can be controlled by a vertex or fragment program in programmable hardware. To make use of random memory access on a per-mesh basis, an implementation obtains mesh connectivity information packed into the mesh. In the case of dedicated tessellation hardware, special mesh representations are used that allow the information needed by the tessellation algorithm to be obtained, at high preparation and maintenance cost. Another existing device in traditional architectures that allows random memory access is the vertex cache.

[0012] Patent Document 1: WO 03/081528 pamphlet

Patent Document 2: US Patent Application Specification 2005/0012750

Disclosure of the invention

Problems to be solved by the invention

[0013] An object of the present invention is to solve the problems of on-chip processing of extended geometric primitives, such as subdivision surface patches, NURBS patches, and triangles with adjacent triangles, that are used as input to algorithms in three-dimensional computer graphics. Such algorithms include the Catmull-Clark, Loop, and 4-3 subdivision schemes, NURBS surface tessellation, silhouette detection, and various other known computer graphics schemes whose algorithms require more vertices than the three that make up a triangle, the simplest geometric primitive.

[0014] Processing extended geometric primitives entirely on chip leads to rapid rendering of complex geometric shapes. Furthermore, it leads to obtaining more realistic images using computer graphics, and also to obtaining primitives that are directly extended using simple primitives such as triangles.

[0015] The size of primitives in current 3D computer graphics is fixed. For example, a triangle has three vertices, a straight line has two vertices, and a point has one vertex. An extended primitive, on the other hand, can be of any size, especially the extended primitives used for subdivision surface patches. To use extended primitives within reasonable limits, a maximum number of vertices must be supported on the chip.

[0016] The next problem is the difficulty of achieving random memory access, especially when processing extended primitives. The vertex data composing a primitive can be scattered across the storage unit, just as with simple primitives. The problem is that an extended primitive has several times more vertices than usual, so memory must be accessed randomly to fetch the corresponding vertex data. Random access in graphics devices causes serious performance loss if the data is not properly cached, so it is normally limited to units related to the vertex cache and to texture sampling.

[0017] If the hardware is designed to handle only simple geometric primitives such as triangles efficiently, the number of gates cannot be reduced, and the chip configuration becomes complicated. Even if the logic circuit for drawing extended primitives is provided separately from the processing circuit for simple primitives, the chip configuration is still complicated. Therefore, if simple primitives and extended primitives can be processed by a single processing system rather than by separate processing circuits, the problem of chip complexity can be solved.

[0018] Furthermore, there are problems in representing extended primitives. The processing and the API (application programming interface) should be unified for simple primitives and extended primitives so that the representation can be used widely for various types of extended primitives, and, as with simple primitives, it is desirable to reduce the required storage capacity and the data transfer rate. The following describes how the present invention solves the above problems.

Means for solving the problem

[0019] Basically, the present invention assigns an index to each vertex, stores the indices in an index buffer, expresses the vertices of each primitive using the stored indices, and then reads the vertex information associated with each index from the vertex buffer and uses it for primitive processing. Ordinary primitives such as triangles and quadrilaterals are used as before, but the present invention also uses primitives that have four or more vertices, more than usual. Such primitives, with a variable number of vertices greater than usual, are called variable-size extended primitives. In the present invention, when vertex information about a primitive is input, it is determined whether the primitive is an ordinary primitive or a variable-size extended primitive. This judgment can easily be made from the number of vertices in the primitive. An ordinary primitive may be processed in the usual way using the usual equipment. For a variable-size extended primitive, on the other hand, the indices of the central polygon are read first, followed by the indices of the polygons adjacent to the central triangle or quadrilateral. The vertex data associated with these indices is then acquired, and the processing that assembles the primitive is performed. The system of the present invention includes a vertex engine, which receives various information from a computer and transforms the vertex information, and a primitive engine, which receives the transformed vertex information from the vertex engine and assembles primitives. The primitives assembled by the primitive engine are rasterized by the rasterizer, stored as 3D computer graphics in a frame buffer, and rendered on the monitor. Vertex transformation means arithmetic processing on vertices, such as viewpoint transformation. The necessary processing is performed on the vertex information regardless of whether the primitive is a point, a line, a triangle, or a polygon with four or more sides. Primitive assembly means assembling the transformed individual vertices into a primitive. After primitive assembly, processing is performed separately for each primitive. In the present invention, primitives may be of variable size. If the primitive size is fixed at 3, as when every primitive is a triangle, the index table is accessed in steps of 3. The device therefore only needs to remember the number 3, and the size need not be written in the table. In the present invention, however, the primitive size can change from primitive to primitive, so it is hard to know where the next primitive begins. It is therefore preferable always to record how many of the following elements constitute the primitive. In the present invention, the primitive size is recorded in the index table. When there are a plurality of index tables, which is a preferred aspect of the present invention, it is preferable to record the primitive size in the index table for the position attribute.
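The difference between stepping through an index table by a remembered fixed size and reading a size recorded in the table itself can be sketched as follows. The function name, the `fixed_size` parameter, and the sample tables are hypothetical; the exact encoding used by the hardware is not given in this passage.

```python
def split_primitives(index_table, fixed_size=None):
    """Split a flat index table into per-primitive index lists.

    With fixed_size set, the table is walked in steps of that size.
    Otherwise each primitive is assumed to be preceded by its size,
    stored as an ordinary entry of the table.
    """
    primitives = []
    pos = 0
    while pos < len(index_table):
        if fixed_size is not None:
            size = fixed_size            # the device "remembers" the size
        else:
            size = index_table[pos]      # the size is recorded in the table
            pos += 1
        if pos + size > len(index_table):
            raise ValueError("truncated primitive at position %d" % pos)
        primitives.append(index_table[pos:pos + size])
        pos += size
    return primitives


# Fixed size 3: two triangles, no size entries needed.
fixed = split_primitives([0, 1, 2, 2, 1, 3], fixed_size=3)

# Variable size: a 4-vertex primitive followed by a 6-vertex primitive.
variable = split_primitives([4, 0, 1, 2, 3, 6, 2, 3, 4, 5, 6, 7])
```

Recording the size in front of each primitive costs one extra table entry per primitive, but it lets the reader locate the start of the next primitive without knowing the sizes in advance.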

In order to draw more objects, it is preferable to reduce the number of vertices that must be described. For example, consider describing two triangles by describing the vertices of each triangle primitive separately. The first triangle consists of the vertices $(v_1, v_2, v_3)$, where each $v_m$ is $(x_m, y_m, z_m)$, and the colors of its vertices are red, white, and yellow. The second triangle, adjacent to the first, shares $v_2$ and $v_3$ and can therefore be written as $(v_2, v_3, v_4)$, where $v_4$ is blue. Then, although there are actually only 4 vertices, descriptions of 6 vertices are required. In other words, although this method is simple, $3 \times n$ vertex descriptions are required to draw $n$ triangles. Because so many vertices must be described, a large amount of memory is consumed. Next, the case of a triangle strip is explained. In this drawing method, after the vertices of the first primitive have been described, the descriptions of shared vertices are reused when describing the next primitive. For example, if the primitive is a triangle, the next triangle is drawn on the condition that it uses two vertices of the previous triangle. In other words, the vertices of the previously described triangle, excluding its first vertex, become two vertices of the next triangle, and the next triangle is expressed by adding one more vertex. This method can reduce memory costs significantly. However, it is still necessary to describe $n + 2$ or more vertices to draw $n$ triangles.

In the present invention, what is stored for a triangle is not the coordinate values of $(v_1, v_2, v_3)$, where each $v_m$ is $(x_m, y_m, z_m)$, but simply the indices $(v_1, v_2, v_3)$. Using these indices, the information for each vertex is read out (for example, $v_1$ is $(x_1, y_1, z_1)$, red). In other words, the index table stores indices such that $v_1$, $v_2$, and $v_3$ form one triangle and $v_2$, $v_3$, and $v_4$ form the next. In this way, an index table is used through which vertex data can be read from the vertex buffer of the vertex engine. Each vertex needs to be described only once, so the amount of vertex description is small. On the other hand, this method requires the indices to be described. The width of the index table depends on the primitive and is 3 if the primitive is a triangle. In this method, because one vertex is shared by multiple triangles, the information about one vertex must be delivered repeatedly. There is usually a single index table, which is sufficient for most situations. In fact, the OpenGL ES interface has only one index table. However, some interfaces have independent index tables for vertex attributes such as XYZ coordinates (position attribute), colors (vertex color attribute), and texture coordinates (texture attribute). A well-known example of such an interface is Direct3D.
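The two adjacent triangles from the example above can be written out both ways to make the saving concrete. The coordinate values are invented for illustration, and indices are zero-based here, so vertices 1 to 4 of the text appear as entries 0 to 3.

```python
# Hypothetical vertex data: (position, color). Only the colors come from the text.
v1 = ((0.0, 0.0, 0.0), "red")
v2 = ((1.0, 0.0, 0.0), "white")
v3 = ((0.0, 1.0, 0.0), "yellow")
v4 = ((1.0, 1.0, 0.0), "blue")

# Per-triangle description: each triangle repeats the data of its own vertices.
unindexed = [v1, v2, v3, v2, v3, v4]          # 6 vertex descriptions

# Indexed description: each vertex is stored once, triangles refer to it by index.
vertex_buffer = [v1, v2, v3, v4]              # 4 vertex descriptions
index_table = [0, 1, 2, 1, 2, 3]              # width 3 per triangle

# Resolving the indices reproduces the per-triangle description.
resolved = [vertex_buffer[i] for i in index_table]
```

The index table still has six entries, but each entry is a small integer rather than a full position and color record, which is where the memory saving comes from.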

[0022] The gist of the present invention is to use vertex cache facilities in a suitable graphics accelerator to enable processing of extended geometric primitives. This is achieved by using a primitive engine, a device added relative to the conventional technology that assembles and processes the extended primitives proposed in the present invention, in combination with an extended index/vertex buffer representation of the polygon meshes used to represent extended primitives.

[0023] In the present invention, extended primitives are expressed using an extension of the index/vertex buffer representation of polygon meshes. Current 3D graphics libraries such as Direct3D and OpenGL provide a method for processing polygon meshes quickly using such hardware. In this representation, the vertex buffer stores the vertex attributes of each vertex of the polygon mesh to be processed, while the index buffer holds the mesh connectivity information. The index buffer describes a sequence of polygons of the same size, each given by a sequence of indices, that is, vertex numbers pointing into the vertex buffer. By reference to the vertex data, the index sequence belonging to a polygon describes the vertex sequence of the polygon and hence the sequence of polygon edges formed by connecting consecutive vertices. In the present invention, the index buffer can be used in the same way for a sequence of extended primitives of a fixed size. To represent extended primitives of various sizes, an index buffer is used in which the primitive size is stored as an index value placed before the other indices, which refer to the vertices forming the extended primitive.
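A minimal sketch of building such an index buffer, with the primitive size stored as an index value ahead of the vertex indices, might look like this. The function name and the sample patches are assumptions for illustration.

```python
def build_variable_size_index_buffer(primitives):
    """Flatten per-primitive index lists into one index buffer.

    Each primitive is written as its size followed by its vertex indices,
    so the size occupies an ordinary index slot.
    """
    index_buffer = []
    for vertex_indices in primitives:
        index_buffer.append(len(vertex_indices))   # primitive size as an index value
        index_buffer.extend(vertex_indices)        # indices into the vertex buffer
    return index_buffer


# Hypothetical extended primitives of different sizes that share vertices 1 and 2.
patches = [[0, 1, 2, 3], [1, 4, 5, 2, 6]]
index_buffer = build_variable_size_index_buffer(patches)
```

Because the size travels in the same buffer as the indices, no second data stream is needed to describe where one extended primitive ends and the next begins.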

[0024] The above-mentioned problems can be solved by extending the index buffer and the vertex buffer so that extended primitives can be processed. This representation is sufficient for fixed-size extended primitives of different sizes and for variable-size extended primitives of different sizes; for the latter, only the number of vertices has to be specified in addition to the vertex list, and it is given through indices stored only in the index buffer. Creating a particular type of primitive from the size and the vertex list is done by the primitive creation algorithm itself, so the representation does not depend on the primitive size. In this respect it is the same as the representation of simple primitives. From the point of view of the graphics library, the representation is the same as for simple primitives, so even though extended primitive sequences are added alongside simple primitive sequences, no significant change to the API is required, and the required memory can be reduced.

[0025] With this representation, memory costs can be reduced when vertices are shared among primitives, because each vertex is stored only once in the vertex buffer. As with simple primitives, referring to a vertex by its index requires much less memory space.

[0026] In the present invention, a vertex cache is used in combination with a primitive engine in order to process extended primitives quickly in hardware. The vertex cache gives low-latency access to recently processed vertices. With the index buffer and vertex buffer representation, the vertex index can be used as the cache tag. Therefore, if a vertex index matches one in the cache, the data stored in the cache is used for further processing. The present invention preferably uses the vertex cache for processing extended primitives.

[0027] Because simple primitives and extended primitives have similar representations, extended primitives also gain low-latency access to recently processed vertices, just as simple primitives do. This eliminates the performance degradation caused by the many random accesses needed to fetch the vertex data of an extended primitive. In addition, because the same vertex cache hardware is used for both simple and extended primitives, the hardware size can be reduced. Also, because the vertex cache is used, there is no upper limit on the maximum size of an extended primitive. The vertex cache is used for vertices of fixed-size extended primitives as well as for vertices of variable-size extended primitives. However, processing is efficient when a reasonable upper limit is set on the maximum size of the extended primitives to be processed, so that a primitive being processed has its vertex data in the vertex cache. This solves the problem of processing fixed-size or variable-size extended primitives larger than simple primitives.

The invention's effect

[0028] In the present invention, extended primitives are assembled and processed by the primitive engine, a module added to the conventional hardware. As input, this module receives the transformed vertex data for each vertex of the extended primitive from the vertex cache. In addition, it receives size information for variable-size extended primitives from the cache controller.

[0029] The primitive engine executes the processing algorithm for extended primitives. The algorithm interprets the vertex sequence of an extended primitive and outputs the result as a sequence of simple primitives. This solves the following problems. The first is versatility in processing extended primitives. If the primitive engine is controllable and programmable, any extended primitive can be implemented using the extended index buffer and vertex buffer representation proposed by the present invention. The arrangement also allows quick access to vertex data: because the primitive engine is connected directly to the vertex cache, the latency of acquiring vertex data for assembling and processing extended primitives is greatly reduced. Reuse of the processing pipeline and on-chip processing of extended primitives contribute to this as well. Because the result of processing an extended primitive is a sequence of simple primitives, which the rest of the processing pipeline supports directly, an extended primitive sequence is converted directly into a simple primitive sequence without the result being returned to the host computer, which solves the problem of performing the processing entirely on chip.
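To illustrate the idea of a primitive engine program that turns one extended primitive into a sequence of simple primitives, the sketch below uses fan triangulation of a convex polygon. This is a stand-in chosen for brevity, not the algorithm of the specification, and the function name and string vertex labels are hypothetical.

```python
def primitive_engine_fan(transformed_vertices, size):
    """Hypothetical primitive engine program.

    Takes the transformed vertices of one extended primitive and its size
    (as it would be supplied alongside the vertices), and emits triangles
    that all share the first vertex.
    """
    if size < 3 or size > len(transformed_vertices):
        raise ValueError("size must be between 3 and the number of vertices")
    hub = transformed_vertices[0]
    return [
        (hub, transformed_vertices[i], transformed_vertices[i + 1])
        for i in range(1, size - 1)
    ]


# A 4-vertex extended primitive becomes two triangles for the fixed pipeline.
quad = ["a", "b", "c", "d"]
triangles = primitive_engine_fan(quad, 4)
```

The output consists only of triangles, so in this model everything downstream of the primitive engine can remain the unchanged fixed-function triangle path.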

[0030] By using the extended index/vertex buffer representation, a minor modification to the vertex cache logic that handles simple primitives, and the primitive engine, fixed-size and variable-size extended primitives can be processed on chip. This makes it possible to perform processing such as subdivision surface rendering and NURBS tessellation, which is not available in current 3D graphics hardware.

Brief Description of Drawings

[0031] [Figure 1] Figure 1 shows the conventional hardware architecture of the vertex cache device.

[Fig. 2A] Fig. 2A shows the index/vertex buffer layout for a sample sequence of triangles.

[Figure 2B] Figure 2B shows the index/vertex buffer layout for a sample triangle strip sequence.

[FIG. 3] FIG. 3 shows the vertex cache architecture of the present invention.

[Figure 4A] Figure 4A shows a triangle and its neighboring fixed-size extended primitives.

[Figure 4B] Figure 4B shows a fixed-size extended primitive strip sequence around a triangle and its neighbors.

[Fig. 4C] Fig. 4C shows the index/vertex buffer layout of the triangle and its neighboring fixed-size extended primitives.

[Figure 4D] Figure 4D shows the index / vertex buffer layout of a triangle and its neighboring fixed-size extended primitive strip sequence.

[Figure 5A] Figure 5A shows the triangle and its neighboring fixed-size extended primitive fan sequence.

[Fig. 5B] Fig. 5B shows the structure of an edge-based silhouette flap.

[Fig. 5C] Fig. 5C shows the index/vertex buffer layout of the triangle and its neighboring fixed-size extended primitive fan sequence.

[Figure 6A] Figure 6A shows a variable-size extended primitive.

[Fig. 6B] Fig. 6B shows a Catmull-Clark subdivision patch with variable-size extended primitives.

[Fig. 6C] Fig. 6C shows the index/vertex buffer layout of the Catmull-Clark subdivision patch with variable-size extended primitives.

[FIG. 7] FIG. 7 shows the communication paths between the vertex cache control unit, the primitive engine, and the fixed-size primitive assembly unit introduced by the present invention.

[Fig. 8A] Fig. 8A shows the rendering results without silhouette detection and visualization.

[Fig. 8B] Fig. 8B shows the rendering results when silhouette detection and visualization are performed.

[Fig. 9A] Fig. 9A shows the rendering results of the wireframe shape without subdivision.

[Fig. 9B] Fig. 9B shows the rendering results of the wireframe shape when subdivision is performed.

[Fig. 10A] Fig. 10A shows the wireframe rendering result without subdivision; the box marks the coarse elements of the mesh.

[Fig. 10B] Fig. 10B shows the wireframe rendering result when subdivision is performed; the box marks the coarse mesh elements that are smoothed during subdivision.

[Fig. 11A] Fig. 11A shows the rendered image without subdivision.

[Fig. 11B] Fig. 11B shows the rendered image when subdivision is performed.

Explanation of symbols

101 Simple triangle sequence

102 Simple triangle strip sequence

103 Simple triangle with adjacent primitives

104 Simple triangle strip sequence with adjacent primitives

105 Simple triangle with adjacent primitives, fan sequence

106 Catmull-Clark subdivision surface patch

107 Catmull-Clark subdivision surface patch mosaic

110-115 mesh fragment triangle

210 Cache Destination Multiplexer

220 Fixed-size primitive assembly unit source multiplexer

1000 host memory

1100 Index buffer

1200 vertex buffer

2000 Vertex cache controller

2100 Vertex counter

2300 Information transfer path for exchanging vertex delivery completion signals between the vertex cache controller and the primitive engine

3000 vertex cache storage

4000 vertex processing unit

5000 Second vertex cache storage

6000 Fixed-size primitive assembly unit

7000 fixed size primitive set up unit

8000 Primitive Rasterizer

9000 primitive engine

9100 Primitive size register

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described. The present invention basically relates to the on-chip processing of complex geometric primitives (also called extended primitives) formed by a fixed or variable number of vertices.

[0034] The first aspect of the present invention relates to a method for three-dimensional computer graphics that describes sequences of simple primitives and also sequences of variable-size extended primitives, each of which uses four or more items of vertex data. The method has the following steps:

a vertex buffer specifying step that uses a vertex buffer capable of storing a group of vertices each having a plurality of attributes, in which the position of an attribute in the buffer memory is obtained by multiplying the index, which is the vertex number in the vertex sequence, by an integer and offsetting the result by a number indicating the attribute type, so that the index and the attribute type number together specify the position of that vertex attribute in the vertex buffer memory;

an index buffer specifying step that stores an index sequence corresponding to the vertex sequence of vertex position attribute values in the vertex buffer, such that a fixed-size primitive sequence can be reconstructed using the vertex sequence and a variable-size extended primitive sequence can be reconstructed using the vertex sequence together with the primitive size, the size of each variable-size extended primitive being stored as an index; and

a plurality of index buffer specifying steps for the remaining vertex attributes not obtained in the above step, in which all remaining vertex attributes are obtained in the same manner as the vertex position attribute, except that no index relating to the primitive size is used, even for variable-size extended primitives.

In this specification, an extended primitive may contain four or more vertices. For example, the number of vertices is 4, 5, 6, 7, 8, 9, or 10.
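The attribute addressing rule in the vertex buffer step, where the index is multiplied by an integer and offset by the attribute type number, can be sketched as below. The interleaved layout, the attribute numbering, and the names are assumptions made for illustration.

```python
def attribute_position(index, attribute_type, attributes_per_vertex):
    """Position of one attribute of one vertex in a flat vertex buffer.

    The index is multiplied by the number of attribute slots per vertex and
    offset by the number that identifies the attribute type.
    """
    if not 0 <= attribute_type < attributes_per_vertex:
        raise ValueError("unknown attribute type")
    return index * attributes_per_vertex + attribute_type


# Hypothetical attribute type numbers and a three-vertex interleaved buffer.
POSITION, COLOR, TEXCOORD = 0, 1, 2
vertex_buffer = ["p0", "c0", "t0",
                 "p1", "c1", "t1",
                 "p2", "c2", "t2"]

slot = attribute_position(index=2, attribute_type=COLOR, attributes_per_vertex=3)
color_of_vertex_2 = vertex_buffer[slot]
```

Here the multiplier is the number of attribute slots per vertex; a byte-addressed buffer would use the vertex stride in bytes instead, with the same multiply-and-offset structure.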

[0035] Because the above method includes the step of specifying an index buffer that stores the index sequence corresponding to the vertex sequence of the vertex position attribute in the vertex buffer, a sequence of geometric primitives of any fixed size can be reconstructed from the vertex sequences that compose them. In addition, because the primitive size can be stored as an index value ahead of the index sequence that refers to the vertex position attributes of a variable-size extended primitive, the method can describe variable-size extended primitives, each of which is reconstructed from its size and the vertex sequence that forms it. The method may further include the step of specifying multiple index buffers for all remaining vertex attributes, so that each attribute is addressed by its own index. Therefore, if separate index buffers are required, all necessary vertex attributes, such as those of neighboring points, can be specified for the vertices of extended primitives.

[0036] The steps of the method for specifying an extended primitive sequence introduced by the present invention have the following merits. Various types of extended primitives can be specified, for example, types that can be reconstructed from fixed-size or variable-size vertex sequences. The representation is compact because vertices shared between primitives in the sequence are referenced by compact indices. Because the method extends the index buffer / vertex buffer representation that 3D graphics libraries already use to specify triangle and quadrilateral meshes, extended primitive sequences can be handled in much the same way as sequences of simple primitives such as triangles, quadrilaterals, lines, and points. Only a few modifications to the library APIs are needed for applications to prepare and supply extended primitive sequences. Each step will be described in detail below.

[0037] The method extends the index buffer / vertex buffer representation of simple primitive sequences so that it can describe fixed-size or variable-size extended primitive sequences. In this specification, simple primitives refer to the basic shapes used in computer graphics, such as triangles, quadrilaterals, lines, and points. Processing such primitive sequences is usually the principal task of 3D graphics libraries such as OpenGL and Direct3D. In this specification, an extended primitive means a geometric primitive formed by a sequence of a fixed or variable number of vertices, the number being four or more.

[0038] An index buffer / vertex buffer representation of simple primitives means that a simple primitive sequence is represented as the sequence of vertices constituting the primitives. A vertex buffer means a storage area for the vertex attributes used in computer graphics. In this specification, a vertex attribute means an attribute associated with a point in four-dimensional homogeneous space that is used as a vertex of a primitive. Preferred examples of vertex attributes include positions in space, colors, texture coordinates, normal vectors, and tangent vectors. Attributes can have various dimensions, such as scalars, two-component vectors, and three-component vectors, and various value types, such as 1-byte integer, 2-byte integer, 4-byte integer, and 4-byte floating point. Each attribute instance requires a fixed amount of memory determined by its dimension and value type. The step of specifying the vertex buffer includes placing the attribute arrays in memory in such a way that all values of the same attribute type are laid out in the same manner. Therefore, the location of an attribute value in the vertex buffer can easily be recovered from its position in the array associated with its attribute type and from the placement of that array. Thus, an attribute value can be identified by an integer position, or index, in the array. If all attribute arrays have the same size and the index of every attribute of a vertex simply increases with the number of the vertex in the vertex sequence, the vertex sequence, and therefore a simple primitive sequence, can be specified without using an index buffer.

[0039] In this specification, such a situation is called a “vertex buffer-only representation”. An index buffer is an array in memory that contains a sequence of integer values identifying attribute values. If a single index sequence serves all the vertex attributes associated with a vertex sequence, specifying one index buffer is enough to fully describe the vertex sequence for all vertex attributes. Conversely, each vertex attribute type may have its own index sequence for the vertex sequence. In such cases, the number of index buffers to specify equals the number of attribute types. Combining a vertex buffer with index buffers holding all the required index arrays forms the index buffer / vertex buffer representation of a simple primitive sequence.
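The three cases distinguished above (vertex buffer-only representation, one shared index buffer, and one index buffer per attribute type) might be sketched as follows. This is only an illustrative sketch: the function name `assemble_vertex` and the dictionary-based buffers are assumptions of this example, not structures defined by the specification.

```python
def assemble_vertex(i, vertex_buffer, index_buffers=None):
    """Assemble the i-th vertex of a vertex sequence from its attribute arrays.

    vertex_buffer: dict mapping an attribute name to its array of values.
    index_buffers: None (vertex buffer-only representation), a single list
    shared by all attributes, or a dict with one index list per attribute.
    """
    vertex = {}
    for attribute, values in vertex_buffer.items():
        if index_buffers is None:
            index = i  # indices are generated sequentially
        elif isinstance(index_buffers, dict):
            index = index_buffers[attribute][i]  # separate index per attribute
        else:
            index = index_buffers[i]  # one index shared by all attributes
        vertex[attribute] = values[index]
    return vertex
```

For instance, with a per-attribute dictionary the position and the color of the same vertex may be read from different locations of their respective arrays.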

[0040] The step of specifying the vertex buffer is, as described above, a step of specifying the vertex attributes associated with each vertex, in the same way as for the index buffer and/or vertex buffer of a simple primitive sequence.

[0041] The step of specifying the index buffer is a step of specifying the indices of the vertex position attribute for the vertices that form the extended primitives. Because extended primitives may have various sizes, this step includes specifying the primitive size as an index value in the index buffer for the vertex position attribute. If variable-size primitives are used, that is, if the vertex sequences constituting the primitives differ in length, this step may include specifying the extended primitive sequence by giving the primitive size for each primitive. The present invention can be used for any extended primitive for which all necessary information is given by the size of the primitive, the vertex sequence that forms the primitive, and the specific values of that vertex sequence.

In the present invention, an example of a fixed-size extended primitive is the “Triangle with Neighborhood” (TWN) primitive, that is, a triangle together with its neighbors. Such a primitive consists of a triangle and the three triangles adjacent to its sides, and can be formed from a continuous triangle mesh.

[0043] The above is described with reference to Fig. 4 (A). Figure 4 (A) is a conceptual diagram of a triangular mesh. As shown in Fig. 4 (A), fragment 103 is drawn on the triangular mesh.

For the triangle formed by the vertices {v₁, v₂, v₃}, the TWN primitive is formed by that triangle and by the triangles {v₀, v₁, v₂}, {v₁, v₅, v₃}, and {v₂, v₃, v₄}, which are adjacent to its sides {v₂, v₁}, {v₁, v₃}, and {v₃, v₂}, respectively. In this specification, the triangle {v₁, v₂, v₃} is also called the “center” triangle. Thus, a sequence of six vertices in total is required to represent a TWN primitive. A preferred way of ordering the vertices in a TWN primitive, that is, of mapping positions in the vertex sequence to the connectivity between the vertices of the primitive, is as follows. For the triangle formed by the vertex sequence {v₀, v₁, v₂}, let v₀₁, v₁₂, and v₂₀ denote the vertices that correspond to the sides {v₀, v₁}, {v₁, v₂}, and {v₂, v₀}, where each of them belongs to the triangle adjacent to the central triangle across that side (the triangle other than {v₀, v₁, v₂} that shares the side). The vertex sequence representing the TWN primitive is then {v₀, v₁, v₀₁, v₂, v₂₀, v₁₂}. For the triangle {v₂, v₁, v₃} shown in Fig. 4 (A), the vertex sequence representing the TWN primitive is {v₂, v₁, v₀, v₃, v₄, v₅}. That is, if vⱼ denotes the vertex whose position attribute index is j, the index sequence of the vertex position attribute for the TWN primitive with the central triangle {v₂, v₁, v₃} is {2, 1, 0, 3, 4, 5}. Even if the original mesh has open edges, that is, edges with only one adjacent triangle, TWN primitives can be generated for all triangles as long as no edge in the mesh has more than two adjacent triangles. For a side that has only one adjacent triangle, the missing triangle can be supplied artificially. One way is to create a degenerate triangle by using one of the vertices of the open edge twice. Another way is to use the vertex of the central triangle opposite the open edge. In this case, the artificial triangle is not degenerate and coincides with the central triangle.
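By way of illustration only, the vertex ordering {v₀, v₁, v₀₁, v₂, v₂₀, v₁₂} could be generated from a list of mesh triangles as sketched below. The function name `twn_indices` and the brute-force search over the triangle list are assumptions of this example; open edges are handled by the second method described above (reusing the opposite vertex of the central triangle), and the mesh is assumed to have at most two triangles per edge.

```python
def twn_indices(center, triangles):
    """Return the six position indices of the TWN primitive of `center`.

    center: (v0, v1, v2) index triple of the central triangle.
    triangles: list of index triples of the whole mesh.
    """
    v0, v1, v2 = center

    def opposite(p, q):
        # Third vertex of the other triangle that shares side {p, q}.
        for tri in triangles:
            if p in tri and q in tri and set(tri) != set(center):
                return next(v for v in tri if v != p and v != q)
        # Open edge: reuse the vertex of the central triangle opposite the side.
        return next(v for v in center if v != p and v != q)

    return [v0, v1, opposite(v0, v1), v2, opposite(v2, v0), opposite(v1, v2)]


# Triangles named in the text for the mesh fragment of Fig. 4 (A).
mesh = [(0, 1, 2), (2, 1, 3), (1, 5, 3), (2, 3, 4)]
print(twn_indices((2, 1, 3), mesh))
```

For the central triangle {v₂, v₁, v₃} this reproduces the index sequence {2, 1, 0, 3, 4, 5} given above.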

[0044] A sequence of fixed-size extended primitives can be formed by concatenating the vertex sequences of the individual primitives. The index buffer for the vertex position attribute can then be expressed as the index sequence of the vertex position attributes of the vertices in the concatenated sequence. For TWN primitives, there are three preferred ways of representing primitive sequences: as a sequence of separate TWN primitives, as a TWN strip, and as a TWN fan. These are based on a sequence of separate central triangles, on a strip of central triangles, and on a fan of central triangles, respectively.

[0045] A sequence of separate TWN primitives can be built by taking each triangle in turn as the central triangle, generating its TWN primitive, and concatenating the vertex sequences of the generated TWN primitives. Fig. 4 (A) depicts a mesh fragment whose TWN primitives are formed from the sequence of central triangles {v₂, v₁, v₃} and {v₃, v₁, v₅}. The corresponding vertex sequence is {v₂, v₁, v₀, v₃, v₄, v₅, v₃, v₁, v₂, v₅, v₆, v₇}, and the corresponding index sequence of the vertex position attribute is {2, 1, 0, 3, 4, 5, 3, 1, 2, 5, 6, 7}, as shown in the figure.

[0046] A TWN strip can be built from a triangle strip. In a triangle strip, two consecutive triangles share two vertices, so the second and each subsequent TWN primitive in the strip can be defined by only two additional vertices. In the present invention, for a triangle strip formed by the vertex sequence {v₀, v₁, v₂, v₃, v₄, v₅, ...}, consider the sides {v₀v₁, v₀v₂, v₁v₃, v₂v₄, v₃v₅, ...} and the sequence {v₀₁, v₀₂, v₁₃, v₂₄, v₃₅, ...} of vertices of the triangles that adjoin the strip across those sides, each being the vertex opposite the shared side. The vertex sequence that defines the TWN strip is then {v₀, v₁, v₀₁, v₂, v₀₂, v₃, v₁₃, v₄, v₂₄, v₅, v₃₅, ...}. The first six vertices of this sequence define the first TWN primitive, and each following pair of vertices defines the next TWN primitive. In Fig. 4 (B), a mesh fragment is represented by a triangle strip 104 consisting of the two triangles {v₂, v₁, v₃} and {v₃, v₁, v₅}. This strip can be formed by the vertex sequence {v₂, v₁, v₃, v₅} of length four. In the example of Fig. 4 (B), the vertex sequence of the TWN strip that defines the two TWN primitives is {v₂, v₁, v₀, v₃, v₄, v₅, v₇, v₆}, and the corresponding index sequence of the vertex position attribute is {2, 1, 0, 3, 4, 5, 7, 6}, as shown in Fig. 4 (D).
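One plausible reading of the TWN strip layout is sketched below: the first six entries form the first primitive, and every further pair of entries completes one more primitive by reusing vertices already received. The function name `expand_twn_strip` is hypothetical, the output order (two side vertices, neighbor across that side, third vertex, then the two remaining neighbors) follows the separate TWN ordering only up to the choice of starting vertex, and the alternating winding of strip triangles is deliberately ignored in this sketch.

```python
def expand_twn_strip(strip):
    """Expand a TWN strip index sequence into separate six-index TWN primitives."""
    if len(strip) < 6 or len(strip) % 2 != 0:
        raise ValueError("expected 6 + 2k indices")

    def v(j):
        # j-th vertex of the underlying triangle strip: positions 0, 1, 3, 5, ...
        return strip[j] if j < 2 else strip[2 * j - 1]

    primitives = []
    for k in range((len(strip) - 4) // 2):
        # Neighbor across the side shared with the previous strip triangle
        # (for the first triangle it is stored explicitly at position 2).
        first_neighbor = strip[2] if k == 0 else v(k - 1)
        primitives.append(
            [v(k), v(k + 1), first_neighbor, v(k + 2), strip[2 * k + 4], v(k + 3)]
        )
    return primitives


print(expand_twn_strip([2, 1, 0, 3, 4, 5, 7, 6]))
```

Applied to the index sequence {2, 1, 0, 3, 4, 5, 7, 6} of Fig. 4 (D), the sketch yields the central triangles {2, 1, 3} and {1, 3, 5}, the latter with neighbor vertices 2, 7, and 6.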

[0047] Like TWN strips, TWN fans can be built from triangle fans. A TWN fan likewise reduces the number of index buffer entries required to represent the sequence of TWN primitives whose central triangles form a triangle fan. In the present invention, for a triangle fan formed by the vertex sequence {v₀, v₁, v₂, v₃, v₄, v₅, ...}, consider the sides {v₀v₁, v₁v₂, v₂v₃, v₃v₄, v₄v₅, ...} and the vertices {v₀₁, v₁₂, v₂₃, v₃₄, v₄₅, ...} of the triangles that adjoin the fan across those sides, each being the vertex opposite the shared side. The vertex sequence that defines the TWN fan is {v₀, v₁, v₀₁, v₂, v₁₂, v₃, v₂₃, v₄, v₃₄, v₅, v₄₅, ...}. The first six vertices define the first TWN primitive, and each following pair of vertices defines the next TWN primitive. Note that the ordering of the first six vertices differs from that of the TWN strip. Fig. 5 (A) shows a mesh fragment with a triangle fan 105 consisting of the two triangles {v₂, v₁, v₃} and {v₂, v₃, v₄}. This fan can be expressed by the vertex sequence {v₂, v₁, v₃, v₄} of length four. In the example of Fig. 5 (A), the vertex sequence of the TWN fan that defines the two TWN primitives is {v₂, v₁, v₀, v₃, v₅, v₄, v₆, v₇}, and the corresponding index sequence of the vertex position attribute is {2, 1, 0, 3, 5, 4, 6, 7}, as shown in Fig. 5 (C).
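The TWN fan layout can be read in a similar way. In the sketch below, which is an interpretation and not the specification's own procedure, every central triangle contains the fan center v₀, and each primitive after the first reuses the previous fan vertex as one of its neighbors. The function name `expand_twn_fan` is hypothetical, and the output order is the separate TWN ordering (two side vertices, neighbor across that side, third vertex, then the two remaining neighbors).

```python
def expand_twn_fan(fan):
    """Expand a TWN fan index sequence into separate six-index TWN primitives."""
    if len(fan) < 6 or len(fan) % 2 != 0:
        raise ValueError("expected 6 + 2k indices")

    def v(j):
        # j-th vertex of the underlying triangle fan: positions 0, 1, 3, 5, ...
        return fan[j] if j < 2 else fan[2 * j - 1]

    primitives = []
    for k in range((len(fan) - 4) // 2):
        # Neighbor across the side shared with the previous fan triangle
        # (for the first triangle it is stored explicitly at position 2).
        first_neighbor = fan[2] if k == 0 else v(k)
        primitives.append(
            [v(0), v(k + 1), first_neighbor, v(k + 2), v(k + 3), fan[2 * k + 4]]
        )
    return primitives


print(expand_twn_fan([2, 1, 0, 3, 5, 4, 6, 7]))
```

For the index sequence {2, 1, 0, 3, 5, 4, 6, 7}, the first primitive comes out as {2, 1, 0, 3, 4, 5}, which shows how the fifth and sixth entries of a fan are swapped relative to a separate TWN primitive.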

Preferred examples of variable-size extended primitives include the Catmull-Clark subdivision patch (CCSP) primitive. In the present invention, a CCSP is formed from polygons of a quadrilateral mesh in which one or more vertices may have a number of adjacent edges different from that of the other vertices. From the viewpoint of Catmull-Clark subdivision of a quadrilateral mesh, a vertex whose number of adjacent edges differs from four is called an irregular vertex. The number of edges adjacent to a vertex is called the valence of the vertex. In other words, a vertex whose valence is other than 4 is regarded as an irregular vertex from the viewpoint of Catmull-Clark subdivision. A CCSP primitive is formed by the vertices of a quadrilateral and all vertices of the polygons that share a vertex with that quadrilateral. In a CCSP primitive, the quadrilateral located at the center is called the center quadrilateral. Depending on the valence of the irregular vertex, the number of mesh quadrilaterals surrounding the center quadrilateral varies, and so does the number of vertices in the CCSP primitive. The number of vertices in a CCSP primitive can be expressed in terms of the valence of the irregular vertex as N = 2 * V + 8, where N is the number of vertices in the CCSP primitive and V is the valence of the irregular vertex. Here, multiplying V by a predetermined number is called scaling, 2 is called the scale, and adding a predetermined number such as 8 is called the bias. Once the size is known, a CCSP primitive can be represented by a vertex sequence together with a mapping from positions in the sequence to the topology of the CCSP primitive. In a preferred embodiment of the invention, the vertex sequence and mapping are formed as follows. In the quadrilateral mesh, let the center quadrilateral of the CCSP primitive have vertices {v₀, v₁, v₂, v₃}, joined by the quadrilateral edges (v₀v₁, v₁v₂, v₂v₃, v₃v₀). If there is an irregular vertex, it is v₀. The vertex sequence describing a CCSP primitive is formed as follows. First, the vertices of the center quadrilateral are placed in the order {v₀, v₁, v₂, v₃}. Then, for example, the fifth vertex is selected as a vertex that shares an edge with v₀ and belongs to the mesh quadrilateral adjacent to the side v₀v₁. The sixth vertex shares an edge with v₁ and belongs to the same quadrilateral as the fifth vertex. The remaining neighboring vertices are selected according to the following rule: after a neighboring vertex has been selected, the next vertex is selected so that it shares an edge with the previously selected one. As shown in Fig. 6 (A), the fragment 106 of the control mesh contains two adjacent CCSP primitives whose center quadrilaterals are {v₉, v₅, v₆, v₁₀} and {v₉, v₁₀, v₁₆, v₁₅}. Vertex v₉ is an irregular vertex with a valence of 5. The vertex sequences describing the first and second primitives are {v₉, v₅, v₆, v₁₀, v₈, v₄, v₀, v₁, v₂, v₃, v₇, v₁₁, v₁₇, v₁₆, v₁₅, v₁₄, v₁₃, v₁₂} and {v₉, v₁₀, v₁₆, v₁₅, v₅, v₆, v₇, v₁₁, v₁₇, v₂₁, v₂₀, v₁₉, v₁₈, v₁₄, v₁₃, v₁₂, v₈, v₄}, respectively. In the present invention, in the index buffer that contains the indices of the vertex position attribute, the indices of the vertices of each CCSP primitive are preceded by the size of that CCSP primitive. For the two CCSP primitives shown in Fig. 6 (A), the contents of the index buffer 1200 (or the vertex buffer 1100) are, as shown in Fig. 6 (C), {18, 9, 5, 6, 10, 8, 4, 0, 1, 2, 3, 7, 11, 17, 16, 15, 14, 13, 12, 18, 9, 10, 16, 15, 5, 6, 7, 11, 17, 21, 20, 19, 18, 14, 13, 12, 8, 4}.
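As a non-limiting sketch, a size-prefixed index buffer of this kind could be split into its primitives as follows, and the size rule N = 2 * V + 8 could be checked against the example. The function names `ccsp_size` and `split_variable_size_primitives` are assumptions of this illustration.

```python
def ccsp_size(valence):
    """Number of vertices of a CCSP primitive: scale 2, bias 8."""
    return 2 * valence + 8


def split_variable_size_primitives(position_indices):
    """Split a position index buffer in which each primitive is preceded by its size."""
    primitives = []
    cursor = 0
    while cursor < len(position_indices):
        size = position_indices[cursor]  # size entry stored ahead of the vertices
        start, end = cursor + 1, cursor + 1 + size
        if end > len(position_indices):
            raise ValueError("index buffer ends inside a primitive")
        primitives.append(position_indices[start:end])
        cursor = end  # the next entry holds the size of the next primitive
    return primitives


# Index buffer contents quoted in the text for the two primitives of Fig. 6 (A).
INDEX_BUFFER = [18, 9, 5, 6, 10, 8, 4, 0, 1, 2, 3, 7, 11, 17, 16, 15, 14, 13, 12,
                18, 9, 10, 16, 15, 5, 6, 7, 11, 17, 21, 20, 19, 18, 14, 13, 12, 8, 4]
for primitive in split_variable_size_primitives(INDEX_BUFFER):
    print(len(primitive), primitive[:4])
```

Both primitives have 18 vertices, in agreement with the valence 5 of the irregular vertex v₉.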

[0050] The step of specifying the index buffers for the remaining vertex attributes is needed when the vertex attributes are specified in such a way that the same index cannot be used to refer to all attributes of a vertex, so that different indices are required. In this case, a vertex is formed by a set of attribute indices, one for each of its attributes. If this is not the case, all vertex attributes can be specified by the single index of the vertex position attribute. Otherwise, a set of index buffers must be specified in order to specify all vertex attributes. Because the index buffer for the vertex position attribute includes the primitive sizes, it requires special handling when variable-size extended primitives are processed, as described above. The other index buffers hold no primitive size information and are therefore shorter than the index buffer of the vertex position attribute when variable-size primitives are represented. In the present invention, the vertex attribute indices of a vertex are determined as follows. For fixed-size extended primitives, the index set of the i-th vertex consists of the values at the i-th position of the index buffers of all required attributes, where N is the length of the vertex sequence and i is greater than or equal to 0 and less than N. For variable-size extended primitives, the situation is more complex. For a vertex sequence formed by concatenating the vertex sequences of a sequence of variable-size extended primitives, the index set of the i-th vertex is formed from the i-th values of the other index buffers and from the value of the vertex position index buffer at position i shifted by the number of primitive size entries that precede it, that is, by the number of primitives up to and including the one containing the vertex.

[0051] The second aspect of the present invention relates to a method for processing fixed-size or variable-size extended primitives at high speed using a vertex cache. The method comprises: a step of fetching the primitive size of a variable-size primitive from the index buffer and delivering the primitive size to a primitive engine, which is a dedicated circuit or a programmable processor for assembling and processing extended primitives; a step of fetching, transforming, and storing in the vertex cache the vertex data of the vertices of the extended primitive when that data is not already available in the vertex cache; a step of delivering the transformed vertices to the primitive engine for assembly and processing of the extended primitive; a step of assembling and processing the extended primitive in the primitive engine; and a step of delivering the fixed-size simple primitives obtained by processing the extended primitive in the primitive engine to a fixed-size primitive assembly circuit for the primitive rasterization pipeline.

The method is based on extending a vertex cache device, used for accelerating three-dimensional computer graphics, so that it can process fixed-size or variable-size extended primitives. In this specification, a vertex cache device means a system for accelerating the processing of simple primitive sequences specified by values from an index buffer or vertex buffer. The vertex cache device traverses the vertex sequence specified by the values from the index buffer / vertex buffer and, for each vertex to be processed, determines whether the vertex with the same attribute indices is already available in the storage of the vertex cache device (the vertex cache store). If the vertex is not in the vertex cache store, the device samples the vertex attribute values from the vertex buffer according to the vertex attribute indices, which are either sampled from the index buffer or, for a vertex buffer-only representation, generated sequentially. The vertex assembled from the attribute values is sent for vertex transformation, and the transformation result is additionally stored in the vertex cache store. The transformed vertices are delivered to a fixed-size primitive assembly device in order to process the simple primitive sequence represented by the index buffer / vertex buffer. Vertex processing is accelerated because, if the vertex data is already in the vertex cache, the transformed vertex can be delivered to the fixed-size primitive assembly device without sampling the vertex buffer and without transforming the vertex again. Additional devices may be provided as appropriate depending on the implementation. The term fixed primitive assembler means a system that collects and reconstructs simple primitives from a sequence of vertices.
In this specification, collecting a primitive means gathering all the information needed to reconstruct it, for example, accumulating the information about all of its vertices for further processing. For example, for a sequence of separate triangles, the three vertices of each triangle constitute the necessary information. Similarly, for separate lines, it is the two consecutive vertices that make up the line. For a triangle strip, it is the three vertices of the first triangle and, for each adjacent triangle, one further vertex. The fixed primitive assembler processes the delivered vertex sequence according to the primitive type and outputs simple primitives so that they can be sent to a rasterization pipeline that processes only simple primitives.
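The behavior of the vertex cache device described above (look up the vertex, and on a miss fetch, transform, and store it) might be modeled in software as sketched below. This is a minimal behavioral model and not the hardware design: the class name `VertexCache`, the first-in first-out replacement policy, and the `transform` callback are assumptions of this example.

```python
from collections import OrderedDict


class VertexCache:
    """Behavioral model of a vertex cache keyed by vertex index."""

    def __init__(self, capacity, vertex_buffer, transform):
        self.capacity = capacity
        self.vertex_buffer = vertex_buffer
        self.transform = transform
        self.store = OrderedDict()  # index -> transformed vertex, oldest first
        self.misses = 0

    def fetch(self, index):
        if index in self.store:
            return self.store[index]  # hit: no buffer access, no transformation
        self.misses += 1
        transformed = self.transform(self.vertex_buffer[index])
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the oldest entry (assumed FIFO)
        self.store[index] = transformed
        return transformed


cache = VertexCache(16, list(range(8)), lambda position: position * 10)
for index in [2, 1, 0, 3, 4, 5, 3, 1, 2, 5, 6, 7]:
    cache.fetch(index)
print(cache.misses)
```

For the twelve indices of the two separate TWN primitives given earlier, only the eight distinct vertices are transformed when the cache is large enough to hold them.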

[0053] The method according to the second aspect of the present invention extends the vertex cache device in several respects. First, new logic is required to fetch the size of a variable-size extended primitive. Second, the vertex cache device exchanges information with the primitive engine, a separate device for assembling and processing extended primitives that operates between the vertex cache device and the fixed primitive assembly device.

[0054] Because the method includes fetching the primitive size from the index buffer of the vertex position attribute, it can process vertex sequences of variable-size extended primitives expressed using the method according to the first aspect of the present invention. The step of fetching, transforming, and storing the vertex data of extended primitives in the vertex cache assembles vertices from vertex attribute values, transforms them, and stores them in the vertex cache store, so that a vertex that is referenced again can be retrieved quickly. This step is the same regardless of whether an extended primitive sequence or a simple primitive sequence is processed. The method further includes delivering the transformed vertices to the primitive engine, processing the extended primitives in the primitive engine, and delivering the results, in the form of a simple primitive sequence, to the rest of the processing pipeline. The first of these steps provides the primitive engine with the information needed to assemble the extended primitive and to drive its processing algorithm, and the output of the primitive engine is delivered for the same processing as simple primitives. In combination, these steps have the following merits. The step of fetching, transforming, and storing vertex data in the vertex cache is substantially the same whether simple primitive sequences or extended primitive sequences are processed. The method allows random access to the vertex buffer, so that the vertices of extended primitives can be assembled from the attributes referenced by the attribute indices fetched from the index buffer without any special mechanism. For the same reason, most of the logic used to fetch, transform, and store vertex data can be shared between simple primitive processing and extended primitive processing. Therefore, the hardware cost of the logic added to realize extended primitive processing can be reduced. Using a vertex cache reduces the latency of delivering vertex data to the extended primitive assembly and processing algorithm in the primitive engine. As a result, the performance of extended primitive processing is improved. The steps of fetching the primitive size of a variable-size primitive from the index buffer and delivering it to the primitive engine, and of delivering the transformed vertices to the primitive engine for assembling and processing the extended primitive, make it possible to deliver to the primitive engine all the information necessary to process extended primitives specified by the method of the first aspect of the present invention. The output of extended primitives processed by the primitive engine is delivered in the same manner as simple primitives, so the primitive processing pipeline needs no particular modification to handle extended primitives. Furthermore, when simple primitive sequences are processed, the transformed vertices are delivered directly from the vertex cache to the rest of the processing pipeline, bypassing the primitive engine. The method therefore adds no overhead compared with the processing of simple primitive sequences.

[0055] The step of fetching the primitive size of a variable-size primitive from the index buffer of the vertex position attribute and delivering it to the primitive engine, which is a dedicated circuit or a programmable processor for assembling and processing extended primitives, is as follows. The primitive size is retrieved from the index buffer of the vertex position attribute, where it is placed for variable-size extended primitives in the representation defined in connection with the first aspect of the present invention. The primitive size is delivered to the primitive engine because it may be needed in the processing of the extended primitive.

[0056] In the present invention, the primitive size precedes the vertex information of the primitive in the index buffer of the vertex position attribute. As a result, when the primitive size is fetched, it is delivered to the primitive engine ahead of any other information about the primitive. Knowing the primitive size makes it possible to determine the offset, within the index array of the vertex position attribute, of the beginning of the next variable-size primitive, that is, of the entry that holds the size of the next primitive. The primitive size is also needed in order to begin processing a variable-size primitive: processing can start only after the primitive engine has obtained all vertices of the primitive, the number of which is given by the primitive size. This step is required only when variable-size extended primitives are processed and can be omitted for other primitives.

[0057] The step of fetching, transforming, and storing in the vertex cache the vertex data of the vertices of an extended primitive when that data cannot be obtained from the vertex cache is as follows. As described above, this is the step in which the vertex cache device fetches vertex data, transforms it, and stores it in the vertex cache. Because the vertex cache device does not itself process extended primitives, this step is the same for the vertex cache device whether it is handling a simple primitive sequence or an extended primitive sequence of fixed size. Therefore, circuitry can be shared between simple primitive processing and extended primitive processing. This step can be omitted if the vertex information is already stored in the vertex cache. The step is performed both when the index buffer / vertex buffer representation is used for an extended primitive sequence, in which case the vertex attribute indices are sampled from the index buffer, and when the vertex buffer-only representation is used for fixed-size extended primitives, in which case the vertex attribute indices are generated in order.

[0058] When variable-size extended primitives are processed and separate index buffers are used for different vertex attributes, the step of fetching primitive vertex data needs to be modified. The reason is that the index buffer of the vertex position attribute, unlike the index buffers of all other attributes, contains the primitive size information, so that it is longer than the other index buffers by the number of primitives in the sequence. The step of fetching the vertex data of a vertex needs to form the set of indices, one for each vertex attribute, that corresponds to the vertex in question. When separate index arrays for different vertex attributes are processed for fixed-size extended primitives or simple primitive sequences, this set is formed by sampling all index buffers at the position corresponding to the position of the vertex in the sequence. When variable-size extended primitive sequences are processed, the formation is modified as follows. The index values of all attributes other than the vertex position attribute are obtained by sampling the index buffers at the position given by the position of the vertex in the sequence, whereas the index of the vertex position attribute is obtained by sampling the index buffer of the vertex position attribute at that position increased by the number of primitives processed so far, including the one being processed. This step uses the index buffers and the vertex buffer for variable-size extended primitive sequences as introduced in the first aspect of the present invention.
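Under the reading that one size entry precedes each primitive in the position index buffer, the modified index formation might be sketched as follows. The function name `attribute_indices`, the zero-based `primitive_number` parameter, and the dictionary of other index buffers are assumptions of this illustration.

```python
def attribute_indices(i, primitive_number, position_indices, other_index_buffers):
    """Form the index set of the i-th vertex of a variable-size primitive sequence.

    i: position of the vertex in the concatenated vertex sequence (from 0).
    primitive_number: zero-based number of the primitive that contains vertex i.
    position_indices: position index buffer with a size entry before each primitive.
    other_index_buffers: dict of index buffers that hold no size entries.
    """
    # Other attributes are sampled directly at the vertex position.
    indices = {name: buffer[i] for name, buffer in other_index_buffers.items()}
    # The position buffer is shifted by one size entry per primitive
    # up to and including the current one.
    indices["position"] = position_indices[i + primitive_number + 1]
    return indices
```

With the example index buffer of Fig. 6 (C), vertex 18 of the concatenated sequence is the first vertex of the second primitive, so its position index is read from entry 18 + 2 = 20 of the position index buffer.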

The step of delivering the transformed vertices to the primitive engine for assembling and processing the extended primitive supplies the input primitive information to the algorithm that the primitive engine runs to assemble and process the primitive. In the present invention, this step is accomplished by selecting the primitive engine, instead of the fixed primitive assembler, as the destination to which the transformed vertices from the vertex cache are delivered. In this specification, the term “primitive engine” means a fixed-function circuit or programmable system that implements the processing of extended primitives. The primitive engine receives the information about the primitive vertices (and, for variable-size extended primitives, the primitive size) and stores it internally, reconstructs the primitive from the accumulated information, and processes the reconstructed primitive according to an algorithm whose result is a simple primitive sequence that can be accepted by the fixed primitive assembly device.

[0060] The step of delivering the fixed-size simple primitives obtained by processing the extended primitive in the primitive engine to the fixed-size primitive assembly circuit for the primitive rasterization pipeline is as follows. The results of extended primitive processing are delivered in a format that the fixed primitive assembly device can process. In the present invention, this delivery is performed in the same way as the direct delivery of transformed vertices from the vertex cache to the fixed primitive assembly device when the vertex cache device processes a simple primitive sequence. Therefore, neither the fixed primitive assembly device nor the rasterization pipeline needs to be modified in order to process extended primitives.

[0061] In the primitive engine, the process of assembling and processing the extended primitive implements a predetermined algorithm on the primitive engine and applies it to the extended primitive. A preferred, non-limiting example of an algorithm for fixed-size extended primitives implemented by the primitive engine processes a TWN primitive sequence to detect and visualize mesh silhouettes. In this specification, a mesh silhouette means the set of triangle edges each shared by two triangles, one facing the view direction and the other facing away from it. Visualizing the mesh silhouette means visually enhancing the silhouette edges when the mesh is rendered. The outline of the algorithm for detecting and visualizing silhouettes is as follows. The vertex data for a TWN primitive is stored. The orientation of each triangle adjacent to the central triangle is determined; this step may be performed partially in parallel. Silhouette edges are then detected by comparing the triangle orientations across each of the three sides of the central triangle. A rectangular flap is generated for each detected silhouette edge and delivered for rasterization. The same processing is performed for the next primitive. Since the primitive size is fixed, there is no need to fetch the primitive size.

[0062] In the case of a TWN strip or TWN fan, the first primitive and the subsequent primitives of the strip or fan are accumulated differently. The first primitive requires six vertices before processing of the primitive can begin. The second and subsequent primitives can reuse vertices that are already stored, so each requires only two more vertices. In the case of a sequence of separate TWN primitives, vertex data is accumulated for the second and subsequent primitives in the same way as for the first.

[0063] The TWN primitive strip shown in Fig. 4 (B), which corresponds to the two central triangles 111 and 113 composed of the vertex sequences {v₂, v₁, v₃} and {v₃, v₁, v₅} respectively, is expressed, as shown in Fig. 4 (D), by the contents {2, 1, 0, 3, 4, 5, 7, 6} of the index buffer for the vertex position attribute. The vertex cache device reads the vertices corresponding to the indices in the index buffer and transmits them to the primitive engine. For example, the sixth vertex transmitted from the vertex cache device to the primitive engine corresponds to the index value 5. Once the primitive engine has obtained all vertices of the central triangle 111 and the adjacent triangles 110, 112, and 113, it can start detecting the silhouette edges of the first triangle in the TWN strip sequence. The orientation of a triangle is evaluated as a scalar triple product of three three-component vectors, each consisting of the x, y, and w components of a triangle vertex position in system coordinates.

[0064] [Equation 1]

$$\det A = \begin{vmatrix} x_0 & y_0 & w_0 \\ x_1 & y_1 & w_1 \\ x_2 & y_2 & w_2 \end{vmatrix}$$

[0065] Here, the subscripts 0, 1, and 2 denote the first, second, and third vertices of the triangle, respectively. If the sign of this determinant differs between two triangles that share an edge, that edge is a silhouette edge. Therefore, four evaluations, one for the central triangle of the TWN primitive and one for each of the three adjacent triangles, are required to determine the silhouette edges.
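
The sign test can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes that all four triangles of a TWN primitive are supplied with a consistent winding order and that neighbor `i` shares edge `i` of the central triangle; the actual vertex ordering of the TWN primitive is defined by Fig. 4 and is not reproduced here.

```python
def orientation(v0, v1, v2):
    """Determinant of the 3x3 matrix whose rows are (x, y, w) of each vertex."""
    (x0, y0, w0), (x1, y1, w1), (x2, y2, w2) = v0, v1, v2
    return (x0 * (y1 * w2 - w1 * y2)
            - y0 * (x1 * w2 - w1 * x2)
            + w0 * (x1 * y2 - y1 * x2))


def silhouette_edges(center, neighbors):
    """Return the indices of the central-triangle edges that are silhouettes.

    center:    three (x, y, w) tuples of the central triangle.
    neighbors: three triangles; neighbors[i] shares edge i with the center
               (assumed convention).
    """
    center_front = orientation(*center) > 0
    return [i for i, tri in enumerate(neighbors)
            if (orientation(*tri) > 0) != center_front]
```

Four determinant evaluations are performed per primitive here: one for the central triangle and one for each neighbor, matching the count given above.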

[0066] To visualize a detected silhouette edge, extra geometry forming a rectangular flap is generated. The flap extends outward from the object in a direction perpendicular to the observer's viewing direction, that is, to the direction of the eye axis in viewpoint coordinates. The outward direction is given by another vertex attribute, the normal vector at each vertex of the mesh, which is available once the vertices have been delivered from the vertex cache unit to the primitive engine and which is converted to a normal direction. For example, let the first and second vertices of the silhouette edge be V₀ = {x₀, y₀, z₀, w₀} and V₁ = {x₁, y₁, z₁, w₁} in the viewpoint coordinate system, with associated normal vectors n₀ and n₁, respectively. The generated flap is the rectangle whose vertices have the following coordinate values in the viewpoint coordinate system: {x₀, y₀, z₀, w₀}, {x₁, y₁, z₁, w₁}, {x₁ + offset_x * n_1x * w₁, y₁ + offset_y * n_1y, z₁, w₁}, and {x₀ + offset_x * n_0x, y₀ + offset_y * n_0y, z₀, w₀}. Flaps obtained in this way have a constant width in screen space regardless of the distance to the observer. FIG. 5 (B) shows the structure of such a flap when the coefficients offset_x and offset_y have the same value and n_0z and n_1z are 0.
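
A minimal sketch of the flap construction follows. The function names and the `scale_by_w` switch are hypothetical. Scaling the offset by w is an assumption: the extracted formula is garbled and shows a w factor in only one term, but a width that stays constant in screen space after the perspective divide suggests the factor applies to every offset term.

```python
def make_flap(v0, n0, v1, n1, offset_x, offset_y, scale_by_w=True):
    """Return the four flap corners for the silhouette edge (v0, v1).

    v0, v1: (x, y, z, w) vertex positions in the viewpoint coordinate system.
    n0, n1: (nx, ny, nz) normals associated with v0 and v1.
    """
    def push_out(v, n):
        x, y, z, w = v
        s = w if scale_by_w else 1.0  # assumed w scaling, see lead-in
        return (x + offset_x * n[0] * s, y + offset_y * n[1] * s, z, w)

    return [v0, v1, push_out(v1, n1), push_out(v0, n0)]


def flap_triangles(flap):
    """Split the flap rectangle into the two triangles sent for rasterization."""
    a, b, c, d = flap
    return [(a, b, c), (a, c, d)]
```

The z and w components are left unchanged, so only the x and y positions move along the projected normal.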

[0067] For the next primitive, the vertex cache unit delivers two vertices with index values 7 and 6, which define two more triangles, 115 and 114. With this information, the silhouette edges of the second triangle in the strip can be calculated. Note that four of the previously obtained vertices are reused. Furthermore, the orientations already calculated for two triangles are used again: one is the triangle in question and the other is an adjacent triangle. In the case shown in Fig. 4 (B), these two triangles are 113 and 111, respectively. This reuse greatly reduces the computational cost of the silhouette detection algorithm. To prevent overlapping flaps from being generated on a silhouette edge, flaps must be generated only for TWN primitives whose triangle has a particular orientation, for example facing the viewer. Other primitives can be ignored.

[0068] Finally, for each processed TWN primitive, the primitive engine can output simple fixed-size primitives. In the specific case of the silhouette detection and visualization algorithm, the output consists of the unchanged central triangle and the two triangles that form the flap of each silhouette edge. For flaps, some attributes can be changed to implement the algorithm. For example, the color is replaced with black so that the silhouette appears black. The output triangles are delivered to the fixed primitive assembler and processed as if they had been delivered directly from the vertex cache unit.

[0069] A non-limiting example of an algorithm with which the primitive engine processes variable-size extended primitives is one that implements the Catmull-Clark subdivision scheme. Other subdivision schemes, such as the Loop subdivision scheme and the 4-3 subdivision scheme, can be realized similarly. The Catmull-Clark (CC) subdivision scheme consists of multiple steps that generate a smooth, fine mesh from a base mesh, each step subdividing the mesh obtained in the previous step. In other words, the rules are applied recursively to the base mesh to refine it. The rules generate a new vertex for each face of the mesh obtained in the previous step, generate a new vertex for each edge of that mesh, and reposition the existing vertices of that mesh. In each of these three cases, the vertex position in the next step is a linear combination of vertices adjacent to the edge or face in question. More specifically, a face point is located at the average of the original vertices of the face. An edge point is the average of the midpoint of the original edge and the average of the two new adjacent face points. The vertices from the previous step are repositioned according to the equation S' = (Q + 2R + S(n-3)) / n, where S' is the new position of the vertex, Q is the average of the new face points surrounding the vertex, R is the average of the midpoints of the edges sharing the vertex, S is the vertex position in the previous step, and n is the number of edges sharing the vertex, that is, the valence of the vertex.
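
The three rules can be written down directly. The following is a small sketch with hypothetical function names, operating on plain coordinate tuples; it evaluates the rules for given neighbors and does not build or traverse a mesh.

```python
def average(points):
    """Component-wise average of a list of equal-length coordinate tuples."""
    count = len(points)
    return tuple(sum(component) / count for component in zip(*points))


def face_point(face_vertices):
    """Face rule: average of the original vertices of the face."""
    return average(face_vertices)


def edge_point(e0, e1, face_point_a, face_point_b):
    """Edge rule: average of the edge midpoint and of the mean of the two
    adjacent new face points."""
    return average([average([e0, e1]), average([face_point_a, face_point_b])])


def vertex_point(s, face_points, edge_midpoints):
    """Vertex rule: S' = (Q + 2R + S(n - 3)) / n, with n the number of edges
    sharing the vertex."""
    n = len(edge_midpoints)
    q = average(face_points)
    r = average(edge_midpoints)
    return tuple((qc + 2 * rc + sc * (n - 3)) / n
                 for qc, rc, sc in zip(q, r, s))
```

For a vertex with n = 4 on a flat regular grid, Q and R coincide with S and the rule leaves the vertex where it is, which is a quick sanity check of the weights.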

[0070] The number of vertices required to calculate a vertex position in the next subdivision step is fixed: 6 for an edge point and 4 for a face point. In contrast, repositioning a vertex from the previous step requires information on all vertices belonging to the 1-neighborhood of that vertex. The 1-neighborhood of a vertex is formed by all vertices that share a mesh face with it. In the case shown in FIG. 6 (A), FIG. 6 (B), and FIG. 6 (C), the 1-neighborhood of vertex v₉ in the base mesh 106 is formed by the vertices v₁₃, v₁₂, v₈, v₄, v₅, v₆, v₁₀, v₁₆, v₁₅, and v₁₄, and the valence of v₉ is 5. Since the valence of a vertex can be arbitrary, information about the number of vertices in the neighborhood is needed when a subdivision rule is executed. In practice, however, the maximum valence can be limited, so the subdivision rules need only be implemented for valences up to that limit.

[0071] The CC scheme can be applied to a whole mesh, but it can also be applied patch by patch, using the following rules. A subdivision surface patch can be formed from a face of the base mesh as the sequence of vertices belonging to the 1-neighborhood of that face. In other words, since the subdivision rules are confined to the 1-neighborhood, the patch is the union of the 1-neighborhoods of the face's vertices, which includes the vertices of the face itself. It is known that after two CC subdivision steps all faces of the subdivided mesh are quadrilaterals and no face has more than one irregular vertex. The base mesh can be designed to have this property directly, or so that a single CC subdivision step suffices to achieve it. In either case, subdivision can be restricted, without loss of generality of the CC scheme, to patches formed from quadrilaterals with no more than one irregular vertex per face. The CCSP primitive described in the first aspect of the present invention satisfies these requirements, and CC subdivision can be realized with the group of CCSP primitives corresponding to the quadrilaterals of the base mesh.

[0072] With reference to Fig. 6, the method of forming CCSP primitives from the index and vertex buffers and the algorithm for CC subdivision in the primitive engine are described. Figure 6 (A) shows a base control mesh 106 containing a sequence of two adjacent CCSP primitives, formed by the central quadrilaterals {v₉, v₅, v₆, v₁₀} and {v₉, v₁₀, v₁₆, v₁₅}, which share an irregular vertex v₉ of valence 5. The index buffer that contains the index sequence of the vertex position attribute is as follows.

{18, 9, 5, 6, 10, 8, 4, 0, 1, 2, 3, 7, 11, 17, 16, 15, 14, 13, 12, 18, 9, 10, 16, 15, 5, 6, 7, 11, 17, 21, 20, 19, 18, 14, 13, 12, 8, 4}

The CCSP sequence is processed as follows. The primitive size is fetched by the vertex cache device, which retrieves the primitive size value 18 from the first position in the vertex position attribute index sequence and delivers it to the primitive engine. The vertex cache unit then delivers 18 vertices to the primitive engine according to the vertex position attribute indices. When the last vertex of the primitive, corresponding to vertex v₁₂, has been delivered to the primitive engine, processing of the CCSP primitive begins. Since the primitive size and the vertex sequence are available, the algorithm implemented in the primitive engine can reconstruct the CCSP primitive according to the description method introduced by the first aspect of the present invention. After the primitive has been reconstructed, all information needed for subdivision is available, so the reconstructed CCSP primitive can be subdivided by a Catmull-Clark subdivision surface algorithm (Jeffrey Bolz and Peter Schroder, in Proceedings of the Web3D 2002 Symposium (WEB3D-02), pages 11-18, New York, February 24-28 2002, ACM Press) or by another subdivision algorithm. The result of the subdivision scheme used is a set of quadrilaterals corresponding to a fine tessellation of the central quadrilateral of the processed CCSP primitive, to be rendered as a fragment of the tessellated mesh (107). Each resulting quadrilateral is further split into two triangles for rasterization and delivered to the fixed primitive assembler. These steps are repeated for the next CCSP primitive.
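
The size-prefixed layout of the position index buffer can be illustrated by splitting the buffer quoted above into its primitives. The function name `split_primitives` is hypothetical; the sketch only separates sizes from indices and does not reconstruct the CCSP topology.

```python
POSITION_IB = [18, 9, 5, 6, 10, 8, 4, 0, 1, 2, 3, 7, 11, 17, 16, 15, 14, 13, 12,
               18, 9, 10, 16, 15, 5, 6, 7, 11, 17, 21, 20, 19, 18, 14, 13, 12, 8, 4]


def split_primitives(position_ib):
    """Split a size-prefixed position index buffer into per-primitive lists."""
    primitives = []
    cursor = 0
    while cursor < len(position_ib):
        size = position_ib[cursor]           # first entry: primitive size
        start = cursor + 1
        primitives.append(position_ib[start:start + size])
        cursor = start + size                # next entry is the next size
    return primitives


primitives = split_primitives(POSITION_IB)
```

Both primitives have 18 vertices, and the first one ends with index 12, the vertex whose delivery triggers processing of the first CCSP primitive.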

[0073] A third aspect of the present invention relates to a system for realizing the processing according to the first aspect of the present invention, in which a sequence of fixed-size or variable-size primitives is processed using the method according to the second aspect of the present invention.

[0074] As shown in FIG. 3, the system according to the third aspect of the present invention modifies the vertex processing stages of the prior art so that extended primitives can be processed. Vertex data is fetched from the vertex buffer 1200 according to the indices stored in the index buffer 1100 and is converted by the vertex processing unit 4000. The converted vertex data is delivered to the primitive engine 9000, where the extended primitive is assembled and processed. The simple primitives generated by the primitive engine 9000 from the extended primitive are delivered to the rest of the processing pipeline, beginning with the fixed primitive assembly circuit 6000. This path is needed only when extended primitives are processed. When simple primitives are processed, the converted vertices from the second step may be delivered directly to the fixed primitive assembly circuit 6000. For the fixed primitive assembly circuit 6000, a known circuit can be used, for example as shown in FIG.

[0075] As shown in FIG. 3, a specific example of the system of the present invention includes the following modules: a vertex cache control unit (VCC) 2000, a first vertex cache storage unit (PVC) 3000, a second vertex cache storage unit (SVC) 5000, one or more vertex processing units (VPU) 4000, a primitive engine (PE) 9000, and a fixed-size primitive assembly circuit (FPA) 6000. The rest of the pipeline comprises a fixed-size primitive setup unit 7000 and a rasterizer 8000, which process sequences of simple fixed-size primitives such as triangles.

[0076] FIG. 3 does not show other units that exchange information with the above system and process it. They are omitted for brevity and to concentrate on what is characteristic of the present invention. Examples of the omitted units are the host CPU, the host memory, and the triangle rasterization pipeline. Information exchange with them is explained only to the extent necessary for the present invention. Among the modules described above, the primitive engine and the logic circuit of the vertex cache control unit for processing variable-size extended primitives are modified as appropriate, while known designs can be used for the others. In other words, the modules other than those just mentioned can be adopted as they are used in current 3D computer graphics for processing simple fixed-size primitives such as points, lines, and triangles.

[0077] The vertex cache control unit (VCC) 2000 performs the following processing. It analyzes the contents of the index buffer. It fetches the contents of the vertex buffer into the PVC 3000 according to the vertex indices fetched from the index buffer 1100 and the states of the PVC 3000 and the SVC 5000. It controls the transmission of vertex data from the PVC 3000 to the VPU(s) 4000 for per-vertex processing. It controls the accumulation in the SVC 5000 of the transformed vertex data output by the VPU(s) 4000. It sends the contents of the SVC 5000 to the PE 9000 or to the FPA 6000 according to the type of primitive; specifically, it determines the primitive type and, if the primitive is a variable-size extended primitive, delivers the contents of the SVC 5000 to the PE 9000. When variable-size extended primitives are processed, it delivers the primitive size to the PE 9000 and, after the primitive size and the vertex data of all vertices of the primitive have been delivered, informs the PE 9000 that processing of the variable-size extended primitive may begin. As noted above, in the preferred embodiment of the present invention the primitive size is delivered only to the PE 9000. The analysis of the index buffer is also specific to the present invention and concerns the extraction of the primitive size and the processing of variable-size extended primitives. For the other processing steps, known steps performed for simple primitives can be adopted as appropriate.

[0078] The first vertex cache storage unit (PVC) caches large, contiguous blocks of vertex buffer data transferred from the host memory. The PVC holds untransformed vertex data that has not yet been processed by the VPU(s) for per-vertex processing. Filling the PVC with untransformed vertex data in relatively large, contiguous memory blocks reduces the memory access latency problem. In addition, the PVC and the VPU(s) are located physically or electrically close to each other, which reduces the transfer latency when untransformed vertex data held in the PVC must be processed.

[0079] The vertex processing unit(s) (VPU(s)) may be a fixed-function module or a programmable module that can execute vertex processing as defined by 3D graphics APIs, for example OpenGL and/or Direct3D. The VPU(s) receive various attributes of untransformed vertices, such as position, color, and texture coordinates, as input and generate various attributes of transformed vertices, such as position, color, texture coordinates, and view vector. The number and dimensions of the inputs and outputs, that is, the formats of the input and output vertex information, may differ. The VPU(s) receive their input from the PVC and pass their output to the SVC.

[0080] The primitive engine (PE) is either a fixed-function module for processing extended primitives or a programmable module. Implementing the PE as a fixed-function module is effective for applications that need high performance with limited functionality. Implementing the PE as a programmable module is preferable because it leaves the choice of algorithm for processing extended primitives open. In the extended primitive processing mode, the PE receives the transformed vertices output from the SVC, assembles the primitive, processes it, and outputs the result as a sequence of simple primitives that the FPA can accept. The PE is a module unique to the present invention. When implemented as a programmable module, however, it needs only the same functions as the programmable VPU(s), apart from an added function for controlling the transfer of the extended primitive processing results to the FPA as a simple primitive sequence, and it can share much of its functional logic with the VPU(s). Like the FPA, the PE receives its input from the SVC and the VCC. The main difference between the PE and the other units is that, when variable-size extended primitives are processed, it must receive the extended primitive size and a notification signal indicating that the vertex data of all vertices of the extended primitive has been delivered. The VCC conveys the primitive size to the PE over the same data channel as the vertex data, as described below.

[0081] Finally, the fixed-size primitive assembly circuit (FPA) is a fixed-function module that assembles simple primitives, such as points, lines, and triangles, from the vertex sequence delivered from the SVC. The FPA implements the set of primitives available in modern 3D graphics APIs such as OpenGL and Direct3D, for example points, lines, line loops, triangles, triangle strips, and triangle fans. When extended primitives are processed, the FPA receives its input from the PE. Because the PE uses the same protocol for transferring transformed vertex information to the FPA as the SVC and the VCC use for vertex data transfer, the source of the information can be completely transparent to this module.

[0082] The vertex cache control unit controls all primitive processing on the chip. Using the PVC and the SVC, the vertex cache control unit can accelerate indexed vertex buffer processing, in which the vertex data for a primitive is referenced indirectly through the information stored in the index buffer. Using the PVC, it can also accelerate processing that references only the vertex buffer, in which the vertex data for a primitive is formed by a run of vertex data in the vertex buffer. The VCC operates in two modes depending on the primitive: a mode for processing fixed-size primitives and a mode for processing variable-size extended primitives. For fixed-size extended primitives, the VCC performs the same processing as for simple primitives.

[0083] First, the set of vertex attribute indices corresponding to the currently processed vertex is calculated so that the untransformed vertex data for that vertex can be loaded and sent to the VPU(s). If the SVC holds no transformed vertex for the sampled set of vertex attribute indices and the PVC holds no untransformed vertex data for it, the VCC uploads a chunk of vertex buffer contents from the host memory to the PVC. The chunk contains the vertex data for the queried index set and is stored in free space in the PVC or overwrites unused data previously stored there. The chunk may also contain vertex data for other indices, and as long as it remains in the PVC it may be used again. When the vertex data becomes available in the PVC, the VCC begins delivering it to the VPU(s), either by the round-robin method or by another method. If the transformed vertex data does not exist in the SVC but the untransformed vertex data exists in the PVC, the VCC begins transferring the vertex data from the PVC to the VPU(s) without accessing the host memory. This reduces the access time to the host memory. In this way, the PVC improves the speed of processing untransformed vertex data.

[0084] Different vertex attributes may be stored at scattered locations in the host memory, with different strides and attribute sizes. The VCC assembles the input data for the VPU from such scattered vertex attributes. In a specific implementation, the input to the VPU consists of four-component floating-point vectors, and the input data for one vertex can span several such vectors. The VCC assembles the input data from the scattered vertex attributes while performing any necessary type conversion. For example, 2-byte integer values are converted to floating-point numbers, and attributes are packed into four-component vectors according to the configuration; for example, two sets of two-component texture coordinate attributes are converted into one four-component input vector. The assembled input data is transferred to a VPU when the VPU is ready to operate.
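
The two conversions given as examples can be sketched as follows. The function names are hypothetical, and the little-endian byte order and the absence of any normalization of the integer values are assumptions made only for this illustration.

```python
import struct


def int16_attr_to_floats(raw, count):
    """Convert `count` 2-byte signed integers (little-endian assumed) to floats."""
    return [float(value) for value in struct.unpack("<%dh" % count, raw)]


def pack_two_texcoords(uv0, uv1):
    """Combine two 2-component attributes into one 4-component input vector."""
    return (uv0[0], uv0[1], uv1[0], uv1[1])
```

Packing two texture coordinate sets into one vector halves the number of input vectors the VPU has to receive for those attributes before it can start processing the vertex.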

[0085] When processing of vertex data has finished and one or more VPUs are available, the input for the next vertex to be transformed is delivered to a VPU as four-component floating-point vectors. After receiving a certain number of vectors, the VPU starts processing the vertex. That number depends on the number of attributes per vertex and their packing, and is determined via the VPU. When the VPU finishes the transformation, it outputs four-component floating-point vectors to the SVC. When several VPUs operate in parallel, the VCC manages their access to the SVC so that the order of the vertex sequence is preserved. The format of the transformed vertices, that is, the number of attributes and their packing into output vectors, may differ from that of the input.

[0086] The storage element of the SVC is also a four-component floating-point vector. Therefore, each transformed vertex occupies a run of entries in the SVC. In an actual embodiment, the SVC is implemented simply as a FIFO queue: if the FIFO overflows when the transformed data of the most recently processed vertex is stored, the oldest data is discarded. Of course, this does not preclude the SVC from using other replacement policies such as LRU (least recently used). The essence of the present invention does not depend on such details.
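
The FIFO behavior can be illustrated with a small software model. The class name, the per-vertex granularity (one cache entry per vertex rather than per four-component vector), and the `transform` callback standing in for the VPU are all simplifying assumptions.

```python
from collections import OrderedDict


class FifoVertexCache:
    """Software model of a FIFO-replaced cache of transformed vertices."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # vertex index -> transformed vertex
        self.hits = 0
        self.misses = 0

    def fetch(self, index, transform):
        """Return the transformed vertex, running `transform` only on a miss."""
        if index in self.entries:
            self.hits += 1            # FIFO: a hit does not refresh the entry
            return self.entries[index]
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # discard the oldest entry
        vertex = transform(index)
        self.entries[index] = vertex
        return vertex
```

An LRU policy would differ only in moving an entry to the newest position on a hit; with FIFO, an entry's age is fixed at insertion time.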

[0087] When a simple primitive sequence is processed, the transformed vertex data is delivered from the SVC to the FPA in the order of the vertex sequence. The FPA starts processing a vertex when it has received a predetermined number of four-component vectors from the SVC. This number equals the number of vectors output by the VPU per transformed vertex. The FPA assembles triangles from the transformed vertex sequence and delivers them to the rasterization pipeline for rasterization.

[0088] Since extended primitives are processed in the PE, the transformed vertex data is first sent to the PE. The PE also has two main operation modes: processing of fixed-size extended primitives and processing of variable-size extended primitives. In the fixed-size mode, the PE, like the FPA, starts processing a vertex when it has received a predetermined number of four-component vectors from the SVC. The result of the extended primitive processing is delivered from the PE to the FPA as a sequence of simple primitives. This sequence is output in a format that can be read as the vertex sequence of simple primitives, expressed as four-component vectors. This means that most of the hardware used for processing simple primitives, such as the VCC, PVC, VPUs, SVC, and FPA, can be reused for extended primitives simply by adding the PE as the mechanism for processing them. Furthermore, because the vertex caches are used for vertex attributes stored in the host memory to the same extent as for simple primitives, extended primitives can be processed rapidly on the chip.

[0089] In the mode for processing fixed-size extended primitives, the PE executes its processing algorithm each time it has received the data of one vertex, that is, after receiving a specific number of four-component vectors from the SVC. The algorithm manages the accumulation of the received vertex data and detects the moment at which a fixed-size extended primitive can be reconstructed from the vertex sequence and its processing can begin.

[0090] One preferred operating mode for processing fixed-size extended primitives is the mode for processing TWN primitive sequences, which is carried out according to the processing of fixed-size extended primitives described above. A TWN primitive sequence has the same spatial coherence as the sequence of its central triangles. However, since the size of the primitive is twice that of a simple triangle, the size of the second vertex cache must be increased in order to reach the same cache hit level as when the sequence of central triangles is processed.

[0091] The TWN primitive sequence is processed in the same way as a simple primitive sequence with respect to vertex data fetching, transfer, and caching. The difference is that the destination of the transformed vertex data is not the FPA but the PE. The transformed vertex data is delivered from the SVC to the PE one vertex at a time in the order of the input vertices.

[0092] To process variable-size extended primitives, the VCC logic circuit is modified in the present invention to provide the following means: means for correctly analyzing the contents of the index buffer and separating the primitive size from the vertex position attribute indices of the vertices forming the primitive; means for delivering the primitive size to the PE; and means for counting the vertices of the variable-size extended primitive that have not yet been delivered to the PE and notifying the PE when delivery of the vertex data is complete so that primitive processing can start. This modification is unique to the present invention. The remaining functions of the VCC can be used unchanged from those for processing fixed-size primitives.
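
The three means listed above can be modeled in software as follows. This is a sketch under stated assumptions rather than the logic circuit itself: the class `PrimitiveEngineModel`, the function `vcc_deliver`, and the `fetch` callback standing in for the path through the PVC, VPU, and SVC are all hypothetical names.

```python
class PrimitiveEngineModel:
    """Toy stand-in for the PE: collects vertices until told to start."""

    def __init__(self):
        self.size_register = None
        self.vertices = []
        self.processed = []

    def set_size(self, size):
        self.size_register = size
        self.vertices = []

    def receive_vertex(self, vertex):
        self.vertices.append(vertex)

    def start(self):
        # The notification arrives only after all vertices were delivered.
        assert len(self.vertices) == self.size_register
        self.processed.append(list(self.vertices))


def vcc_deliver(position_ib, pe, fetch=lambda index: index):
    """Separate sizes from indices, deliver both, and signal completion."""
    cursor = 0
    while cursor < len(position_ib):
        counter = position_ib[cursor]   # means 1: separate the primitive size
        pe.set_size(counter)            # means 2: deliver the size to the PE
        cursor += 1
        while counter > 0:              # means 3: count undelivered vertices
            pe.receive_vertex(fetch(position_ib[cursor]))
            cursor += 1
            counter -= 1
        pe.start()                      # counter reached 0: notify the PE
```

The counter doubles as the detector for the start of the next primitive: the index read immediately after it reaches zero is interpreted as the next size.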

[0093] In a specific implementation, in the extended primitive processing mode the VCC logic circuit takes the first index of each primitive in the index buffer for the vertex position attribute as the size of that primitive. Referring to Fig. 7, the VCC 2000 initializes its internal vertex counter 2100 with this value and decrements the counter by one for each subsequently processed index in order to detect the start of the next primitive. So that the size is available to the PE for primitive processing, it is delivered to the primitive size register 9100 in the PE over the same path by which the vertex attributes of the vertices forming the extended primitive are delivered from the SVC. The primitive size register 9100 is accessible to the primitive processing algorithm in the PE, which can thus reconstruct the primitive from the vertex sequence.

Once the primitive size has been delivered from the VCC to the PE, the VCC can start sending transformed vertex data from the SVC to the PE. The processing for vertex data fetching, transfer, and delivery to the SVC is the same as for fixed-size primitives. The order of vertex delivery is determined by the index sequence stored in the index buffer after the first index, which contains the size. The algorithm processing the primitive in the PE accumulates the vertex information, and the value of the VCC vertex counter 2100 is decremented after each vertex's attribute set has been delivered to the PE. When the counter reaches 0, all vertex data has been delivered to the PE, and a notification signal to that effect is sent from the VCC to the PE via connection 2300. On receiving the signal via connection 2300, the PE begins its internal primitive processing. One preferred example of system operation is the processing of a CCSP primitive sequence to generate a subdivision surface from a control mesh described as a base CCSP primitive sequence. This processing is carried out according to the processing of variable-size extended primitives. The spatial coherence of a CCSP primitive sequence is extremely high. In fact, for adjacent patches without irregular vertices, CCSP primitives can share as many as 12 of their 16 vertices, so only four additional vertices need to be processed and delivered to the SVC for the next CCSP primitive. In this way, using a vertex cache for processing extended primitives greatly reduces the data access cost and enables on-chip processing of subdivision patches and their neighborhoods.
In the example shown in Figure 6, there are two consecutive patches that share 14 of the 22 vertices, and the SVC hit rate is about 64%, which is higher for longer sequences. It is considered to be. If a vertex cache or index buffer / vertex / buffer is not used to perform processing related to such large primitives, a very large memory cost will be required for vertex processing.
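
The hit rates quoted above follow directly from the number of shared vertices. The sketch below reproduces the arithmetic under simplifying assumptions: the cache is modeled as a set that still holds every vertex of the previous primitive, and the vertex numbering and the helper name `hit_rate_for_next_primitive` are hypothetical.

```python
from typing import Iterable, List, Set


def hit_rate_for_next_primitive(previous: Iterable[int], current: List[int]) -> float:
    """Fraction of the current primitive's vertices already cached.

    Assumption: the cache still holds all vertices of the previous primitive.
    """
    cached: Set[int] = set(previous)
    hits = sum(1 for v in current if v in cached)
    return hits / len(current)


# Two 22-vertex patches sharing 14 vertices (indices 8..21), as in the example.
patch_a = list(range(0, 22))
patch_b = list(range(8, 30))
rate = hit_rate_for_next_primitive(patch_a, patch_b)

# Two regular 16-vertex CCSP primitives sharing 12 vertices (indices 4..15).
ccsp_a = list(range(0, 16))
ccsp_b = list(range(4, 20))
regular_rate = hit_rate_for_next_primitive(ccsp_a, ccsp_b)

print(f"{rate:.1%}")          # 63.6%
print(f"{regular_rate:.1%}")  # 75.0%
```

For the regular case, only the four vertices that miss the cache need to be fetched and transformed for the next primitive.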

[0095] When the signal has been received via connection 2300 and a CCSP primitive subdivision patch is being processed as described above, all the information necessary for the processing has been delivered to the PE. The PE can then begin generating, patch by patch, the subdivided mesh contained in the central quadrilateral of the CCSP primitive, according to a Catmull-Clark subdivision surface algorithm (Jeffrey Bolz and Peter Schröder, in Proceedings of the Web3D 2002 Symposium (WEB3D-02), pages 11-18, New York, February 24-28, 2002, ACM Press) or another subdivision algorithm.

As with other extended primitives, the result of processing a subdivision patch is passed to the FPA as triangles, triangle strips, or the like, depending on the input information. Another aspect of the present invention is the realization of the above-described system in an integrated circuit that can process fixed-size or variable-size extended primitives on-chip. The present invention also provides graphics cards and video game devices that include such an integrated circuit.

[0096] The present invention also relates to a system that uses an integrated circuit as described above to realize high-speed processing of subdivision surfaces generated by the Catmull-Clark subdivision scheme. The apparatus for processing extended primitives of the present invention can be used in interactive 3D computer graphics for real-time rendering. Such 3D computer graphics involve implementing algorithms that access multiple vertices of a polygon mesh at the same time, and rendering in real time complex geometric shapes represented by meshes, subdivision surfaces, and NURBS patches.

[0097] For example, such a device can be used in hardware accelerator cards for 3D computer graphics that visualize 3D images, in personal digital assistants, in video game devices, in car navigation systems, and the like.

Example 1

[0098] Various 3D computer graphics images were obtained by implementing the device of the present invention on an emulator known as PIC C of Digital Media Professional Co., Ltd. FIGS. 8A, 8B, 9A, 9B, 10A, 10B, 11A, and 11B show image results obtained using the apparatus of the present invention. FIGS. 8A and 8B compare rendering without and with silhouette visualization: in FIG. 8A the silhouette is not visualized, and in FIG. 8B it is. For example, in animation applications where contour lines need to be emphasized when animating a manga character, silhouette visualization as shown in FIG. 8B can be used.

[0099] FIGS. 9A and 9B show renderings of a simple disk-like shape before and after subdivision according to the present invention. FIG. 9A uses the original coarse mesh, and FIG. 9B uses the mesh refined by adding polygons in the subdivision process. FIGS. 10A and 10B show subdivision applied to a more complicated shape than that of FIGS. 9A and 9B. FIGS. 11A and 11B show shaded renderings of FIGS. 10A and 10B, respectively, without the wireframe. Because the present invention is accelerated by the vertex cache, the subdivision algorithm can be implemented on-chip. The present invention can therefore generate and visualize in real time complex shapes obtained by subdividing simple meshes, which is an important capability in video games.

Industrial applicability

The extended primitive processing apparatus of the present invention can be used in the field of real-time 3D computer graphics.

Claims

The scope of the claims

[1] A method for 3D computer graphics for describing simple primitives and for describing sequences of variable-size extended primitives, each variable-size extended primitive using four or more vertex data,

a step of identifying a vertex buffer that can store a vertex sequence including a plurality of attributes, in which the position of an attribute in buffer memory is obtained by multiplying an index, which is the vertex number in the vertex sequence, by an integer and offsetting the result by a number indicating the attribute type, so that the position of a vertex attribute in the vertex buffer in memory is identified using the index and the number indicating the attribute type;

a step of identifying an index buffer that is associated with the vertex sequence in the vertex buffer, stores the value sequence for the vertex position attribute, and stores an index related to the primitive size, which is the size of the variable-size extended primitive, so that a fixed-size primitive sequence can be reconstructed using the vertex sequence and the variable-size extended primitive sequence can be reconstructed using the index related to the primitive size; and

a step of identifying multiple index buffers, in which all vertex attributes are obtained by obtaining the remaining vertex attributes not obtained in the preceding step, even for variable-size extended primitives, in the same manner as the vertex position attribute, except that the primitive size index is not used,

the method comprising the above steps.

[2] The method according to claim 1, wherein, instead of the step of identifying multiple index buffers, all attributes of a vertex share the same index into their attribute sequences, and the index buffer for the vertex position attribute sequence is also used to reference the other vertex attributes.

[3] A method in which the fixed-size extended primitive sequence is formed from the vertex sequence in the vertex buffer without using the index buffer to describe the fixed-size extended primitives.

[4] A method for processing fixed-size or variable-size extended primitive sequences in 3D computer graphics using a vertex cache,

fetching the primitive size of a variable-size extended primitive from the index buffer for the vertex position attribute and delivering it to a primitive engine, which is a unit that assembles and processes extended primitives in fixed or programmable logic;

when the vertex data for the vertices of the extended primitive is not present in the vertex cache, fetching that vertex data, transforming it, and storing it in the vertex cache;

delivering the transformed vertex data to the primitive engine for assembling and processing the extended primitive;

assembling and processing the extended primitives in the primitive engine; and

assembling fixed-size simple primitives resulting from the processing of extended primitives in the primitive engine and delivering them to a fixed-size primitive integrated circuit for transport to a primitive rasterization pipeline,

the method including the above steps.

[5] Fetching the variable-size extended primitive is performed by

accessing the index buffer that describes the vertex position attribute sequence for the vertices of the extended primitive, and retrieving the first index value for that primitive together with the next N_i indices, which reference the vertices forming the primitive, where N_i denotes the value retrieved for the i-th primitive, and the size of the next primitive is stored N_i + 1 positions after the size position of the previous primitive in the index buffer,

The method of claim 4.

[6] The fetching of the untransformed vertex data in the form of a sequence of vertex attributes is performed,

in the case of fixed-size extended primitives, either in the order in which the vertices are stored in the vertex buffer or as indicated by the index buffer, and,

in the case of fixed-size extended primitives or variable-size extended primitives for which the first index associated with each primitive, in the index buffer containing the vertex position attribute indices, does not refer to the contents of the vertex buffer, from the vertex buffer in the order selected by one or more index buffers,

The method of claim 4.

[7] A system used in 3D computer graphics,

a primary vertex cache store (PVC) that provides access to the contents of recently used portions of the vertex buffer, for obtaining vertex data from the vertex buffer;

one or more vertex processing units (VPU) for transforming vertices using the vertex data obtained from the primary vertex cache store (PVC);

a second vertex cache store (SVC) for accessing transformed vertex data recently processed by the vertex processing unit (VPU);

a primitive engine (PE) for collecting the information necessary for processing simple primitives and extended primitives composed of simple primitive sequences;

a fixed primitive assembly (FPA) for assembling simple primitives from a sequence of fixed primitive types and outputting the simple primitives to a raster pipeline for rasterization; and

a vertex cache controller (VCC) that

accesses and delivers the contents of the index buffer,

when processing variable-size extended primitives, transmits information on the primitive size to the primitive engine (PE) and controls the fetching of vertex buffer portions into the primary vertex cache store (PVC),

controls the assembly of vertex information from the attribute values stored in the primary vertex cache store (PVC) and, when a transformed vertex is missing from the primary vertex cache store (PVC), controls the transfer of the assembled vertices to the vertex processing unit (VPU) for transformation, and

controls the delivery of transformed vertex information from the second vertex cache store (SVC) to the primitive engine (PE) in order to assemble and process simple primitives and extended primitives,

the system comprising the above components.

[8] The vertex data can be delivered directly to the fixed-size primitive integrated circuit without the step of delivering the transformed vertex data to the primitive engine, the fixed-size primitive integrated circuit implementing the processing of fixed-size simple primitives, and the primitive engine implementing the processing of fixed-size extended primitives and variable-size extended primitives,

The system according to claim 7.

[9] When variable-size extended primitives are processed, the path used to deliver the primitive size to the primitive engine is also used to deliver to the primitive engine the transformed vertex data of the vertices forming the primitive,

The system according to claim 7.

[10] The vertex processing unit (VPU) is designed to operate in a fixed-size primitive processing mode or a variable-size primitive processing mode, the fixed-size primitive processing mode being used for processing fixed-size simple primitives and fixed-size extended primitives, and the variable-size primitive processing mode being used for processing variable-size extended primitives,

The system according to claim 7.

[11] An integrated circuit including the system according to claim 7,

in which a plurality of components on the integrated circuit operate in real time and can process, in real time, extended primitive sequences of fixed size or extended primitive sequences of variable size.

[12] A graphics card comprising the integrated circuit according to claim 11.

[13] A video game machine comprising the integrated circuit according to claim 11.

[14] A real-time surface subdivision rendering system comprising the integrated circuit according to claim 11.