CN101216932B

CN101216932B - Graphics processing apparatus, unit, and method for performing triangle setup and attribute setup

Info

Publication number: CN101216932B
Application number: CN2008100018156A
Authority: CN
Inventors: 焦阳; 洪洲; 尹莉; 许云杰
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2008-01-03
Filing date: 2008-01-03
Publication date: 2010-08-18
Anticipated expiration: 2028-01-03
Also published as: CN101216932A

Abstract

The invention discloses a graphics processing apparatus, a graphics processing unit, and a method for performing triangle setup and attribute setup. Each embodiment includes at least one execution unit configured for multithreaded operation. The execution unit executes at least one thread for triangle setup and attribute setup operations, as well as threads for pixel shader, geometry shader, and vertex shader operations. By removing at least some dedicated hardware components, the apparatus, unit, and method reduce the number of gates in the system and yield a more efficient graphics pipeline that offers flexibility and extensibility for bug fixes, new features, and algorithm adjustments.

Description

Graphics processing apparatus, unit, and method for performing triangle setup and attribute setup

Technical field

The present disclosure relates to computer graphics systems and, more particularly, to systems and methods for the triangle setup and attribute setup stages of a graphics pipeline.

Background

As is well known, three-dimensional ("3-D") computer graphics is the science and technology of generating, or rendering, two-dimensional ("2-D") images of 3-D objects for display on a display device or monitor, such as a cathode-ray tube (CRT) or a liquid crystal display (LCD). An object may be a simple geometric primitive, such as a point, a line segment, a triangle, or a polygon. More complex objects can be rendered on a display device by representing them as a series of connected planar polygons, for example, a series of connected planar triangles. All geometric primitives can ultimately be described in terms of one vertex or a set of vertices, for example, coordinates (X, Y, Z) that define a point such as the endpoint of a line segment or a corner of a polygon.

To produce a data set that can be displayed on a computer monitor or other display device as a 2-D projection of a 3-D primitive, the vertices of the primitive are processed through a series of operations, or processing stages, in a graphics rendering pipeline. A graphics pipeline is simply a series of processing units, or stages, in which the output of one stage serves as the input of the next. For example, in the context of a graphics processor, these stages include per-vertex operations, primitive assembly, pixel operations, texture assembly, rasterization, and fragment operations.

In a typical graphics display system, an image database (for example, a command list) may store descriptions of the objects in a scene. Objects are described by a number of small polygons that cover their surfaces, in the same way that tiles cover a wall or other surface. Each polygon is described as a list of vertex coordinates (X, Y, Z in "model" coordinates), some specification of material surface properties (that is, color, texture, shininess, and so on), and the normal vector to the surface at each vertex. For three-dimensional objects with complex curved surfaces, the polygons are generally required to be triangles or quadrilaterals, and a quadrilateral can always be decomposed into a pair of triangles.

A transformation engine transforms the object coordinates according to a viewing angle selected by the user through user input. In addition, the user may specify the field of view, the size of the image to be produced, and the back of the viewing volume, so as to include or exclude background as desired.

Once the viewing area has been selected, clipping logic eliminates polygons (that is, triangles) that lie entirely outside the viewing area and "clips" polygons that lie partly inside and partly outside it. The clipped polygons correspond to the portions of the original polygons inside the viewing area, with new edges corresponding to the boundaries of the viewing area. The polygon vertices are then passed to the next stage with coordinates corresponding to the viewing screen (X, Y) and an associated depth value (Z) for each vertex. In a typical system, a lighting model is applied next according to the light sources, and the polygons and their color values are then passed to the renderer.

For each polygon, the renderer determines which pixel locations the polygon covers and attempts to write the associated color and depth value (Z value) into the frame buffer. The renderer compares the depth value (Z) of the polygon being processed with the depth value of the pixel that may already be stored in the frame buffer. If the depth value of the new polygon's pixel is smaller, indicating that it lies in front of the polygon already written to the frame buffer, its values replace those in the frame buffer, because the new polygon occludes the previously processed polygon. This process is repeated until all polygons have been rendered. A video controller then displays the contents of the frame buffer on the display one scan line at a time, in raster order.
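The depth comparison described above can be sketched as follows. This is a minimal Python illustration, assuming that a smaller Z means closer to the viewer and that the depth buffer is cleared to infinity; the function names `make_buffers` and `depth_test_write` are hypothetical and do not come from the patent.

```python
import math

# Assumption: smaller z = closer to the viewer; depth is cleared to +infinity.
def make_buffers(width, height, clear_color=(0, 0, 0)):
    """Create a color buffer and a depth buffer cleared to 'infinitely far'."""
    depth = [[math.inf] * width for _ in range(height)]
    color = [[clear_color] * width for _ in range(height)]
    return color, depth

def depth_test_write(color, depth, x, y, z, rgb):
    """Write rgb at (x, y) only if z is closer (smaller) than the stored depth."""
    if z < depth[y][x]:
        depth[y][x] = z      # the new fragment occludes what was stored
        color[y][x] = rgb
        return True
    return False             # fragment is hidden; buffers are unchanged

color, depth = make_buffers(4, 4)
depth_test_write(color, depth, 1, 1, 0.7, (255, 0, 0))  # red, farther
depth_test_write(color, depth, 1, 1, 0.3, (0, 0, 255))  # blue, closer: replaces red
depth_test_write(color, depth, 1, 1, 0.5, (0, 255, 0))  # green, behind blue: rejected
print(color[1][1])  # (0, 0, 255)
```

The order in which polygons arrive does not affect the final image, only how many writes are wasted on surfaces that are later occluded.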

The default approach to real-time rendering typically treats each pixel as either inside or outside a polygon. As a result, the edges that define a polygon look jagged in a static display and appear to crawl in an animated display. The underlying problem that produces this effect is called aliasing, and the techniques used to reduce or eliminate it are called anti-aliasing techniques.

Screen-based anti-aliasing methods do not need to know which objects are being rendered, because they use only the samples output by the pipeline. One typical anti-aliasing method uses a technique called multisample anti-aliasing (MSAA), which takes more than one sample per pixel in a single pass. The number of samples, or sub-pixels, required per pixel is called the sampling rate, and, in principle, the amount of associated memory grows as the sampling rate increases.
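A minimal sketch of multisample coverage, assuming four sample positions per pixel and a triangle tested with edge functions. The sample offsets in the hypothetical `SAMPLES_4X` table and the 8 bytes per sample (for example, 4 bytes of color plus 4 bytes of depth) are illustrative assumptions, not values from the patent. The last function shows why memory grows linearly with the sampling rate.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed edge function: positive on one side of edge a->b, negative on the other."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def inside(tri, px, py):
    (x0, y0), (x1, y1), (x2, y2) = tri
    e0 = edge(x0, y0, x1, y1, px, py)
    e1 = edge(x1, y1, x2, y2, px, py)
    e2 = edge(x2, y2, x0, y0, px, py)
    # Inside if all three edge values agree in sign (accepts either winding).
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)

# Hypothetical 4x sample offsets inside a unit pixel (assumption, not from the patent).
SAMPLES_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

def coverage(tri, px, py, offsets=SAMPLES_4X):
    """Fraction of pixel (px, py)'s samples that the triangle covers."""
    covered = sum(inside(tri, px + dx, py + dy) for dx, dy in offsets)
    return covered / len(offsets)

def sample_buffer_bytes(width, height, samples_per_pixel, bytes_per_sample=8):
    """Storage for a multisample buffer grows linearly with the sampling rate."""
    return width * height * samples_per_pixel * bytes_per_sample

print(coverage(((0, 0), (1, 0), (0, 1)), 0, 0))  # 0.5: an edge crosses the pixel
```

A partially covered edge pixel then receives a color blended in proportion to its coverage, which softens the jagged edges described above.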

Although the foregoing briefly summarizes the operation of various processing components, those skilled in the art will appreciate that processing graphics data is computationally demanding. Improvements in processing, design, and manufacturing efficiency are therefore desirable wherever possible. Fixed-function stages of the graphics pipeline, such as triangle setup and attribute setup, are necessary for processing geometric primitives and pixels in the pipeline. In known graphics processing units, these fixed-function stages are implemented in fixed-function hardware components or dedicated hardware. Separate triangle setup and attribute setup units generally require a considerable number of gates, wires, and hardware cost. Moreover, changing the triangle setup and attribute setup stages of the pipeline requires changing these costly hardware components. Accordingly, there is a heretofore unaddressed need to overcome these deficiencies of the prior art.

Summary of the invention

The invention relates to systems and methods for implementing the triangle setup and attribute setup stages of a graphics pipeline. Briefly described, the architecture of one embodiment of the system can be implemented as follows. The system includes at least one execution unit configured for multithreaded operation, wherein the execution unit executes at least one thread for triangle setup operations and attribute setup operations. The execution unit is programmable to perform at least one of vertex shader operations, pixel shader operations, and geometry shader operations. The execution unit suspends the at least one thread created for the triangle setup and attribute setup operations. The execution unit outputs data from the programmable triangle setup operations of the at least one thread to at least one hardware component external to the execution unit. When the at least one hardware component receives the processed data, the execution unit resumes the suspended thread. Finally, the execution unit stores the results of executing the thread in a buffer in the at least one execution unit, for use by threads subsequently created by the execution unit.
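The suspend, export, and resume sequence can be modeled with a Python generator, where `yield` stands for the point at which the execution unit suspends the setup thread and hands data to an external fixed-function block. This is an illustrative sketch only: the 8-pixel tile size, the bounding-box payload, and the names `setup_thread`, `tile_generator`, and `run_thread` are hypothetical, and a real execution unit does not run Python.

```python
TILE = 8  # hypothetical tile size in pixels

def setup_thread(tri_id, verts, eu_buffer):
    """Setup thread: compute a bounding box, suspend while external hardware
    turns it into tiles, then resume and store the result locally."""
    xs = [x for x, _ in verts]
    ys = [y for _, y in verts]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    tiles = yield bbox          # suspend: export data, wait for the result
    # Resumed: keep the result in the EU-local buffer for later threads.
    eu_buffer[tri_id] = {"bbox": bbox, "tiles": tiles}

def tile_generator(bbox):
    """Stand-in for a fixed-function block outside the execution unit."""
    x0, y0, x1, y1 = bbox
    return [(tx, ty)
            for ty in range(int(y0) // TILE, int(y1) // TILE + 1)
            for tx in range(int(x0) // TILE, int(x1) // TILE + 1)]

def run_thread(thread, hardware):
    """Run a thread until it suspends, service its request, then resume it."""
    request = next(thread)
    try:
        thread.send(hardware(request))
    except StopIteration:
        return                  # thread finished after being resumed
    raise RuntimeError("thread suspended more than once")

eu_buffer = {}
run_thread(setup_thread(7, [(1, 1), (12, 2), (3, 9)], eu_buffer), tile_generator)
print(eu_buffer[7]["tiles"])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

While one thread is suspended, the execution unit would in practice run other threads; the sketch serializes this to keep the control flow visible.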

The present invention further provides a graphics processing unit comprising: at least one execution unit configured for multithreaded operation, wherein the at least one execution unit executes at least one thread for triangle setup operations and attribute setup operations and is configured to perform programmable shader operations; and an execution unit pool control system for scheduling the at least one thread and managing the at least one execution unit; wherein the execution unit pool control system concurrently initiates the at least one thread for the triangle setup and attribute setup operations and the programmable shader operations.

One embodiment of the method of the present invention includes receiving vertex data corresponding to a geometric primitive. The embodiment further includes creating a thread in an execution unit configured for multithreaded operation, wherein the execution unit performs programmable shader operations. The embodiment further includes performing triangle setup operations on the vertex data in the executing thread. Finally, the embodiment includes performing attribute setup operations in the thread to produce pixel attributes associated with the vertex data, and terminating the thread.

The graphics processing apparatus, unit, and method for performing triangle setup and attribute setup of the present invention can remove at least some hardware components, thereby reducing the number of gates in the system and yielding a more efficient graphics pipeline that is flexible and extensible with respect to bug fixes, the addition of new features, and algorithm adjustments.

Brief description of the drawings

Fig. 1 is a functional flow diagram of certain components of a graphics pipeline in a computer graphics system.

Fig. 2 is a block diagram illustrating fixed-function and programmable components of a graphics system.

Fig. 3 is a functional block diagram illustrating a graphics processing unit and certain of its internal components.

Fig. 4 is a block diagram illustrating certain fixed-function and programmable components of a graphics system.

Fig. 5 is a functional block diagram illustrating a graphics processing unit and certain of its internal components.

Fig. 6 is a flowchart of a method according to an embodiment of the present disclosure.

Detailed description of the embodiments

Reference will now be made in detail to various embodiments of the present invention, as illustrated in the drawings. Although several embodiments are described in connection with these drawings, there is no intent to limit the present disclosure to the embodiment or embodiments disclosed herein. On the contrary, the scope of the invention covers all alternatives, modifications, and equivalents.

As summarized above, the invention relates to systems and methods for integrating triangle setup and attribute setup operations into a programmable execution unit. Before discussing the implementation details of the various embodiments, reference is first made to Fig. 1, a block diagram illustrating certain components of a graphics pipeline 100 that may be utilized by or in embodiments of the invention. The principal components shown in Fig. 1 are a vertex shader 110, a geometry shader 120, a triangle setup unit 130, a span and tile generator 140, an attribute setup unit 150, a pixel shader 160, and a frame buffer 170. Those skilled in the art will know and understand the general functions and operation of these components, so they need not be described in detail here. Briefly, however, a graphics primitive may be defined by position data (for example, X, Y, Z, and W coordinates) together with lighting and texture information. All of this information may be passed to the vertex shader 110. As is known, the vertex shader 110 may apply various transformations to the graphics data received from the command list. In this regard, the data may be transformed from world coordinates into model-view coordinates, then into projection coordinates, and finally into screen coordinates. The functional processing performed by the vertex shader 110 is known to those skilled in the art and need not be described further here. The vertex shader 110 outputs geometric primitives to the geometry shader 120.

The geometry data and other graphics data produced by the geometry shader 120 are passed to the triangle setup unit 130, which performs triangle setup operations. The specific functions and implementation details of the triangle setup unit 130 may vary from embodiment to embodiment. In general, vertex information for triangle primitives may be passed to the triangle setup unit 130, which may perform operations on the various primitives defined by the graphics data passed to it. Among other operations, certain geometric transformations may be performed in the triangle setup unit 130.

For a given vertex, geometry data such as x, y, z, and w may be provided, where x, y, and z are geometric coordinates and w is the homogeneous coordinate. As is known in the art, various transformations may be performed, for example from model space to world space, to eye space, to projection space, to homogeneous space, to normalized device coordinates (NDC), and finally to screen space (performed by the viewport transformation). It should be appreciated that, for ease of description and clarity, the description herein omits certain components of the graphics pipeline that are known to those skilled in the art. As a non-limiting example, certain stages of the rendering pipeline have been omitted for clarity, but those of ordinary skill in the art will understand that a graphics pipeline may include other stages.
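The last two transformations in that chain, the perspective divide into NDC and the viewport transformation into screen space, can be sketched as below. The sketch assumes OpenGL-style NDC in which x, y, and z all range over [-1, 1], a depth range of [0, 1], and a screen whose row 0 is at the top; other graphics APIs use different conventions, and the function names are hypothetical.

```python
# Assumption: OpenGL-style NDC ([-1, 1] on all axes), depth range [0, 1], y down on screen.
def perspective_divide(x, y, z, w):
    """Homogeneous (clip) coordinates -> normalized device coordinates."""
    if w == 0:
        raise ValueError("w must be non-zero")
    return x / w, y / w, z / w

def viewport(xn, yn, zn, width, height, near=0.0, far=1.0):
    """NDC in [-1, 1] -> screen pixels; y is flipped so row 0 is the top."""
    sx = (xn + 1.0) * 0.5 * width
    sy = (1.0 - yn) * 0.5 * height
    sz = near + (zn + 1.0) * 0.5 * (far - near)
    return sx, sy, sz

print(viewport(*perspective_divide(2.0, -2.0, 1.0, 2.0), 640, 480))  # (640.0, 480.0, 0.75)
```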

Reference is now made to Fig. 2, a block diagram illustrating certain components, or stages, of a graphics pipeline 200. The first component is a command stream processor 252, which essentially receives or reads vertices from memory 250; these vertices are used to form geometric primitives and to create work items for the pipeline. In this regard, the command stream processor 252 reads data from memory and, from that data, generates the triangles, lines, points, or other primitives to be introduced into the pipeline. Once assembled, this geometry information is passed to the vertex shader 254. The vertex shader 254 is drawn with rounded edges; in the present disclosure, rounded edges denote stages of the graphics pipeline that are implemented by executing instructions in a programmable execution unit or an execution unit pool (as depicted in Fig. 3). As is known, the vertex shader 254 processes vertices by performing operations such as transformation, skinning, and lighting. The vertex shader 254 then passes its data to the geometry shader 256. The geometry shader 256 receives the vertices of a complete primitive as input and may output multiple vertices that form a single topology (such as a triangle strip, a line strip, or a point list). The geometry shader 256 may also execute various algorithms, such as tessellation and shadow volume generation.
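As an illustration of the strip topologies mentioned above, the following sketch expands a triangle strip into a triangle list. It assumes the common convention of swapping the first two vertices of every odd-numbered triangle so that all triangles keep the same winding order; the function name `strip_to_list` is hypothetical.

```python
def strip_to_list(strip):
    """Expand a triangle strip (vertex indices) into a list of triangles."""
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if i % 2 == 1:
            a, b = b, a  # common convention (assumption): flip odd triangles to keep winding
        triangles.append((a, b, c))
    return triangles

print(strip_to_list([0, 1, 2, 3, 4]))  # [(0, 1, 2), (2, 1, 3), (2, 3, 4)]
```

A strip of n vertices describes n - 2 triangles, which is why strips are a compact output format for a geometry shader.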

The geometry shader 256 outputs information to the triangle setup unit 257, which, as is known, performs operations such as trivial triangle rejection, determinant calculation, culling, pre-attribute setup KLMN, edge function calculation, and guard-band clipping. Those of ordinary skill in the art will understand the operations required of a triangle setup unit, so they need not be described in further detail. The triangle setup unit 257 outputs information to the span and tile generator 258. This stage of the graphics pipeline is known in the art and need not be discussed further. In summary, however, the span and tile generator 258 may reject a triangle that will not be rendered to the screen. It should be appreciated that other elements of the rendering pipeline, such as a Z test or other fixed-function elements of the graphics pipeline, may also operate at this point. For example, a Z test may be performed to determine whether the depth of a triangle warrants rejecting it, and thus whether it will be rendered to the screen. These elements are not discussed further here, however, because they should be understood by those of ordinary skill in the art.
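Several of the triangle setup operations named above (determinant calculation, culling, trivial rejection, bounding box, and edge functions) can be sketched in a few lines. This is a minimal sketch, not the patent's algorithm: it assumes screen-space vertices, treats a positive determinant as front-facing, and represents each edge as E(x, y) = A*x + B*y + C. Pre-attribute setup KLMN and guard-band clipping are not modeled, and `SetupResult` and `triangle_setup` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class SetupResult:
    det: float    # twice the signed area of the triangle
    edges: list   # (A, B, C) per edge, E(x, y) = A*x + B*y + C
    bbox: tuple   # (xmin, ymin, xmax, ymax), clamped to the screen

def triangle_setup(v0, v1, v2, width, height, cull_back=True):
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    # Determinant calculation (twice the signed area).
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        return None                  # degenerate triangle: nothing to draw
    if cull_back and det < 0:
        return None                  # culling (assumption: negative det = back-facing)
    # Bounding box, with trivial rejection when it lies entirely off screen.
    xmin, xmax = min(x0, x1, x2), max(x0, x1, x2)
    ymin, ymax = min(y0, y1, y2), max(y0, y1, y2)
    if xmax < 0 or ymax < 0 or xmin >= width or ymin >= height:
        return None
    bbox = (max(xmin, 0), max(ymin, 0), min(xmax, width - 1), min(ymax, height - 1))
    # Edge functions: when det > 0, all three are positive inside the triangle.
    edges = []
    for (ax, ay), (bx, by) in ((v0, v1), (v1, v2), (v2, v0)):
        edges.append((ay - by, bx - ax, ax * by - ay * bx))
    return SetupResult(det, edges, bbox)

r = triangle_setup((0, 0), (4, 0), (0, 4), 640, 480)
print(r.det, r.bbox)  # 16 (0, 0, 4, 4)
```

A useful check on the edge coefficients is that the three edge functions sum to the determinant at every point.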

If the triangle processed by the triangle setup unit 257 is not rejected by the span and tile generator 258 or by other stages of the graphics pipeline, the attribute setup unit 259 performs attribute setup operations. The attribute setup unit 259 generates a list of interpolation parameters for the attributes that are known and that must be determined in subsequent stages of the pipeline. In addition, as is known, the attribute setup unit 259 processes the various attributes associated with the geometric primitives being processed by the graphics pipeline.

Each pixel covered by a primitive output by the attribute setup unit 259 must be processed by the pixel shader 260. As is well known, the pixel shader 260 performs interpolation and other operations to determine the color of the pixels output to the frame buffer 262. The operation of the various components illustrated in Fig. 2 is known to those skilled in the art and need not be described further here. Accordingly, the internal implementation and operation of these units are not described herein.
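One common way to express the interpolation parameters produced by attribute setup is a plane equation per attribute, a(x, y) = A*x + B*y + C, which the pixel stage then evaluates at each pixel. The sketch below assumes plain screen-space linear interpolation; production hardware usually interpolates perspective-correctly (for example, a/w and 1/w), which this example omits. The names `attribute_plane` and `interpolate` are hypothetical.

```python
def attribute_plane(v0, v1, v2):
    """Each vertex is (x, y, a). Return (A, B, C) such that a(x, y) = A*x + B*y + C
    passes through all three vertices (assumption: screen-space linear)."""
    (x0, y0, a0), (x1, y1, a1), (x2, y2, a2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle")
    A = ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / det
    B = ((a2 - a0) * (x1 - x0) - (a1 - a0) * (x2 - x0)) / det
    C = a0 - A * x0 - B * y0
    return A, B, C

def interpolate(plane, x, y):
    """Evaluate the attribute at a pixel position, as a pixel stage would."""
    A, B, C = plane
    return A * x + B * y + C

plane = attribute_plane((0, 0, 1.0), (4, 0, 5.0), (0, 4, 9.0))
print(plane, interpolate(plane, 1.5, 0.5))  # (1.0, 2.0, 1.0) 3.5
```

Computing A, B, and C once per triangle lets every covered pixel be shaded with two multiplies and two adds per attribute.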

Reference is now made to Fig. 3, which depicts a graphics processing unit (GPU) 300 of one embodiment. This graphics system is capable of creating programmable shaders, such as geometry shaders, pixel shaders, vertex shaders, or other known shaders. The shaders are created by programs and may be executed by at least one of a plurality of programmable execution unit pools 306 (hereinafter, execution unit pool 306). It should be appreciated that the execution unit pool 306 may include processing cores capable of multithreaded operation. Accordingly, more than one thread of a particular type of shader may be launched and distributed across the execution unit pool 306. For example, the execution unit pool 306 may launch and execute a thread for the geometry shader 310 on one set of data while simultaneously launching another thread for the vertex shader 308 on another set of data. For an example of the structure and operation of an execution unit pool, refer to co-pending U.S. application Ser. No. 11/406,543, filed April 19, 2006.

To summarize that structure, each execution unit in the execution unit pool 306 can process multiple instructions in a single clock cycle. Each execution unit can therefore process multiple threads simultaneously. For example, as mentioned above, an execution unit may simultaneously process a thread for geometry shader operations and a thread for pixel shader operations. A scheduler receives tasks from multiple shader stages for performing shader-related computations and assigns them to execution units with available capacity. The threads in each execution unit of the pool 306 are individually scheduled to perform shader-related computations, so that a given thread may, over time, be scheduled to execute shader operations for different shader stages. Furthermore, within a given execution unit, some threads may be assigned to tasks of one shader while other threads are simultaneously assigned to tasks of other shader units. In this way, load can be balanced among the execution units in the system to optimize throughput. Similarly, load can be balanced among the threads in the execution unit pool to maximize system throughput. Because prior-art graphics systems use dedicated shader hardware, they cannot apply robust, dynamic thread management like that of the above structure, and therefore cannot achieve the flexibility and extensibility of a graphics system with this structure.
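The load balancing described above can be illustrated with a simple greedy policy that sends each new shader task to the execution unit with the smallest accumulated load. The policy, the cost values, and the function name `dispatch` are assumptions for illustration; the patent does not specify the scheduling algorithm.

```python
import heapq

def dispatch(tasks, num_eus):
    """Assign (stage, cost) tasks to execution units, always picking the EU
    with the smallest accumulated load (greedy policy, an assumption)."""
    heap = [(0, eu) for eu in range(num_eus)]   # (current load, EU id)
    heapq.heapify(heap)
    assignment = {eu: [] for eu in range(num_eus)}
    for stage, cost in tasks:
        load, eu = heapq.heappop(heap)
        assignment[eu].append(stage)
        heapq.heappush(heap, (load + cost, eu))
    return assignment

print(dispatch([("vs", 3), ("gs", 2), ("ps", 4), ("vs", 1)], 2))
# {0: ['vs', 'vs'], 1: ['gs', 'ps']}
```

In this run, execution unit 1 ends up holding both a geometry shader thread and a pixel shader thread, mirroring the mixed per-unit assignment described in the text.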

The execution unit pool control and cache subsystem 304 contains a level-2 (L2) cache used by the execution unit pool 306, as well as a system (not shown) for scheduling the execution unit pool 306. In this graphics processing unit, communication between the execution unit pool 306 and external components takes place through the execution unit pool control and cache subsystem 304; however, as is known, other lines and/or communication links may also be established directly to the execution unit pool to facilitate execution of the graphics pipeline. Specifically, the triangle setup unit 314, the attribute setup unit 316, and the span and tile generator 318 are fixed-function hardware logic components that communicate with the execution unit pool 306 via the execution unit pool control and cache subsystem 304.

As mentioned above with reference to Fig. 2, certain components of the graphics pipeline have been omitted from the drawings for clarity. Likewise, Fig. 3 omits certain components of the graphics processing unit 300 for clarity; those of ordinary skill in the art will nevertheless understand that other components are needed. The operations used for triangle setup, attribute setup, and span/tile generation are known to those of ordinary skill in the art and need not be discussed further. As one example, the triangle setup unit 314 performs operations such as trivial triangle rejection, determinant calculation, bounding box calculation, culling, pre-attribute setup KLMN, edge function generation, clipping, and guard-band clipping. Similarly, the attribute setup unit 316 performs operations such as preparing and processing the attributes that correspond to pixels in the pixel shader and in pixel shader operations.

Reference is now made to Fig. 4, which depicts a graphics pipeline 400 of one embodiment of the invention. The graphics pipeline 400 of Fig. 4 differs in several innovative respects from prior-art graphics pipelines. Data generally flows down the pipeline from the command stream processor 452. As mentioned above, the vertex shader 454 has rounded edges, indicating that it is a stage of the graphics pipeline implemented by executing instructions in a programmable execution unit or execution unit pool. Likewise, the geometry shader 456 is also a programmable stage of the graphics pipeline and is therefore implemented by executing instructions in a programmable execution unit or execution unit pool.

As mentioned above, the triangle setup stage 457 of the graphics pipeline is generally a fixed-function stage, meaning that the stage is not user-programmable. The triangle setup stage 457 accepts data, performs predetermined operations on it, and outputs the results. Previous implementations of the triangle setup stage generally included a separate hardware component, distinct from the programmable execution units used for the programmable stages of the graphics pipeline 400 (such as the geometry shader 456 or the vertex shader 454). According to embodiments of the invention, the triangle setup stage 457 may be implemented in a programmable execution unit or execution unit pool, even though the triangle setup stage 457 is not normally a user-programmable stage of the graphics pipeline. As mentioned above, triangle setup operations may include trivial triangle rejection, determinant calculation, bounding box calculation, culling, pre-attribute setup KLMN, edge function generation, clipping, and guard-band clipping.

Similarly, according to this embodiment, the attribute setup stage 459 may also be implemented in a programmable execution unit, even though it is not normally a user-programmable stage of the graphics pipeline 400. Attribute setup operations may include preparing and processing the attributes that correspond to pixels in the pixel shader and in pixel shader operations. According to the present disclosure, the operations of the triangle setup stage 457 and the attribute setup stage 459 may be implemented in software rather than in fixed-function hardware components. In other words, software interacting with the execution unit pool can issue a set of instructions that operate on a data set to accomplish the triangle setup or attribute setup operations.

As shown in Fig. 4, the span and tile generator 458 is a fixed-function hardware component rather than a stage of the graphics pipeline implemented in a programmable execution unit. Those of ordinary skill in the art will understand, however, that the span and tile generator, or other stages of the graphics pipeline (including, but not limited to, fixed-function stages of the rendering pipeline that are not shown), may also be implemented by executing software instructions in a programmable execution unit.

Reference is now made to Fig. 5, which depicts a graphics processing unit 500 of one embodiment of the invention. As mentioned above, certain components of the graphics processing unit 500 have been omitted for clarity; those of ordinary skill in the art will understand that other hardware and logic components not depicted may be present in the graphics processing unit 500. The graphics processing unit 500 includes a plurality of programmable execution unit pools 506 (hereinafter, execution unit pool 506) and an execution unit pool control and cache subsystem 504. The execution unit pool control and cache subsystem 504 may control thread management for the processing cores of the execution unit pool 506, as well as communication between the pool and other components in the graphics processing unit 500. A cache subsystem of one or more caches used by the execution unit pool may also reside in the execution unit pool control and cache subsystem 504. For example, the cache subsystem may be used to store data for use by subsequent threads performing triangle setup operations, or for typical memory transfers by the vertex shader thread 508. Alternatively, each execution unit in the execution unit pool 506 may include an execution unit buffer for storing data for use by subsequent threads executing in the same execution unit.

As mentioned above, the user-programmable stages of the graphics pipeline (such as the geometry shader 510, the vertex shader 508, or the pixel shader 512) may execute in the execution unit pool 506. Because the execution unit pool 506 generally consists of processing cores capable of multithreaded operation, the execution unit pool control and cache subsystem 504 is generally responsible for scheduling threads in the execution unit pool 506. When the execution unit pool control and cache subsystem 504 receives a request to execute a programmable shader, it instructs an execution unit in the execution unit pool 506 to create a new thread for executing the shader. The execution unit pool control and cache subsystem 504 may manage the load within the execution unit pool 506 and shift resources from one type of shader to another in order to manage the throughput of the graphics pipeline efficiently. These thread management techniques are known and need not be discussed further here. As an example, however, if the pixel shader 512 is a bottleneck (with respect to the throughput of the GPU 500), the execution unit pool control and cache subsystem 504 may allocate more execution unit resources to the pixel shader 512 to improve throughput.
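Shifting execution unit resources toward a bottleneck stage, as in the pixel shader example above, can be sketched as a single rebalancing step that moves one execution unit from the least-pressured stage to the most-pressured one. Measuring pressure as queued work per allocated execution unit is an illustrative assumption, not a mechanism stated in the patent, and `rebalance` is a hypothetical name.

```python
import math

def rebalance(alloc, backlog):
    """Move one execution unit from the least-pressured stage to the most-pressured
    stage. alloc: stage -> EUs assigned; backlog: stage -> queued work items.
    Assumption: pressure = queued work per assigned EU."""
    def pressure(stage):
        if alloc[stage] == 0:
            return math.inf if backlog[stage] else 0.0
        return backlog[stage] / alloc[stage]

    hot = max(alloc, key=pressure)
    donors = [s for s in alloc if s != hot and alloc[s] > 1]  # each donor keeps >= 1 EU
    if not donors:
        return dict(alloc)
    cold = min(donors, key=pressure)
    if pressure(cold) >= pressure(hot):
        return dict(alloc)          # already balanced
    new_alloc = dict(alloc)
    new_alloc[cold] -= 1
    new_alloc[hot] += 1
    return new_alloc

print(rebalance({"vs": 4, "gs": 2, "ps": 2}, {"vs": 8, "gs": 2, "ps": 40}))
# {'vs': 4, 'gs': 1, 'ps': 3}
```

Calling the function repeatedly moves capacity toward the pixel shader one unit at a time, which avoids oscillation when backlogs change quickly.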

According to one embodiment of the invention, when execution of the graphics pipeline requires triangle setup 520 or attribute setup 522 operations, additional threads can be created to perform them. In contrast to the graphics processing unit of Fig. 3, in which the triangle setup unit and the attribute setup unit are separate hardware components within the GPU, the triangle setup 520 and attribute setup 522 stages of this embodiment can be implemented in software executing in the EU pool 506. In other words, in addition to the threads that execute programmable shader operations as mentioned above, the EU pool 506 can perform triangle setup and attribute setup operations by creating, in an execution unit, threads capable of performing those operations.

The software instructions that perform the triangle setup and attribute setup operations can be stored in, and sourced from, the execution units themselves or the EU pool control and cache subsystem 504; alternatively, they can come from a software driver or from other locations that a person skilled in the art would recognize.

To perform the triangle setup 520 and attribute setup 522 operations, a thread can be created in the EU pool 506. These operations then execute within the thread rather than in a hardware component separate from the EU pool 506. Because the EU pool 506 supports multithreaded operation, a thread performing triangle setup 520 and attribute setup 522 operations can be created while additional threads, running other shader operations or even further triangle and attribute setup operations, execute concurrently.
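The point that setup work is simply another thread on a multithreaded execution unit can be illustrated with cooperative Python generators standing in for hardware threads. In this sketch the program names, the one-step-per-turn round-robin policy, and the `run_eu` helper are all hypothetical; each yielded string merely marks one unit of work.

```python
from collections import deque
from typing import Callable, Deque, Dict, Generator, Iterable, List, Tuple

Thread = Generator[str, None, None]


def work_thread(tag: str, steps: int) -> Thread:
    """Stand-in for a thread: each yield marks one unit of work done."""
    for i in range(steps):
        yield f"{tag}:{i}"


# Hypothetical program table: triangle and attribute setup sit beside the
# programmable shaders instead of running in dedicated hardware.
PROGRAMS: Dict[str, Callable[[int], Thread]] = {
    "vertex_shader": lambda n: work_thread("VS", n),
    "triangle_setup": lambda n: work_thread("TS", n),
    "attribute_setup": lambda n: work_thread("AS", n),
    "pixel_shader": lambda n: work_thread("PS", n),
}


def run_eu(requests: Iterable[Tuple[str, int]]) -> List[str]:
    """Round-robin the requested threads on one multithreaded EU."""
    ready: Deque[Thread] = deque(PROGRAMS[kind](n) for kind, n in requests)
    trace: List[str] = []
    while ready:
        thread = ready.popleft()
        try:
            trace.append(next(thread))
            ready.append(thread)  # still has work: requeue at the back
        except StopIteration:
            pass  # thread finished; its slot is freed
    return trace
```

For instance, `run_eu([("vertex_shader", 2), ("triangle_setup", 2)])` returns `["VS:0", "TS:0", "VS:1", "TS:1"]`: the setup thread interleaves with the shader thread on the same unit.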

In the graphics processing unit 500 of this embodiment, the span and tile generator 518 can be implemented as a hardware component external to the EU pool 506. As is known, after the triangle setup 520 operations are complete, at least some of their resulting data (including the edge functions, the computed determinant, the bounding box, and the Z deltas) can be output to the span and tile generator 518 and to other possible stages of the graphics pipeline that are not shown (such as Z testing). After the triangle setup 520 operations finish and while the span and tile generator 518 performs its operations, the thread executing the triangle setup 520 operations can be suspended. If, once the span and tile generator 518 or other graphics pipeline operations have completed, the geometric primitive being processed by the graphics pipeline has been rejected, the thread can simply be terminated.
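The triangle setup outputs listed above can be computed with standard formulas. The Python sketch below illustrates this under stated assumptions rather than reproducing the patented implementation: it uses screen-space vertices `(x, y, z)`, treats counter-clockwise winding as positive area, reads "Z deltas" as the per-pixel depth gradients dz/dx and dz/dy, and returns `None` for a zero-area triangle. The names `triangle_setup` and `SetupResult` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vertex = Tuple[float, float, float]          # (x, y, z) in screen space
Edge = Tuple[float, float, float]            # E(x, y) = A*x + B*y + C


@dataclass
class SetupResult:
    edges: List[Edge]
    determinant: float                       # twice the signed area
    bbox: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
    dzdx: float
    dzdy: float


def triangle_setup(v0: Vertex, v1: Vertex, v2: Vertex) -> Optional[SetupResult]:
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = v0, v1, v2
    dx1, dy1, dz1 = x1 - x0, y1 - y0, z1 - z0
    dx2, dy2, dz2 = x2 - x0, y2 - y0, z2 - z0
    det = dx1 * dy2 - dx2 * dy1
    if det == 0:
        return None                          # degenerate: nothing to rasterize
    # One edge function per directed edge; for counter-clockwise triangles
    # (det > 0) interior points give E >= 0 for all three edges.
    edges = []
    for (xa, ya, _), (xb, yb, _) in ((v0, v1), (v1, v2), (v2, v0)):
        edges.append((ya - yb, xb - xa, xa * yb - xb * ya))
    bbox = (min(x0, x1, x2), min(y0, y1, y2), max(x0, x1, x2), max(y0, y1, y2))
    # Solve z = z0 + dzdx*(x - x0) + dzdy*(y - y0) through all three vertices.
    dzdx = (dz1 * dy2 - dz2 * dy1) / det
    dzdy = (dx1 * dz2 - dx2 * dz1) / det
    return SetupResult(edges, det, bbox, dzdx, dzdy)
```

The bounding box and edge functions are what a downstream tile generator needs, while the depth gradients are what a Z test needs to compute per-pixel depth without revisiting the vertices.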

In other words, if a geometric primitive will not be rendered to the screen, for example because it is occluded by other primitives, there may be no need to continue processing it in the graphics pipeline. If the geometric primitive is not rejected in this part of the graphics pipeline, the thread can continue executing by performing the attribute setup 522 operations. As is known, the attribute setup 522 operations in the graphics pipeline can include processing, before the user-programmable pixel shader 512 threads execute, a plurality of attributes corresponding to a plurality of pixels, where each of the pixels includes a portion of those attributes. After the attribute setup 522 operations in the thread are complete, the resulting data can be stored in a level-two cache in the EU pool control and cache subsystem 504 for use by subsequent threads, including pixel shader threads. Alternatively, the data produced by the thread can be stored in a buffer within each execution unit and made available to the next thread created in that execution unit (if that thread needs the data). For example, after the thread performing the triangle setup 520 and attribute setup 522 operations terminates, a pixel shader 512 thread corresponding to the pixel attributes processed by the attribute setup 522 stage can be created in the execution unit, with the pixel attributes and other data required by the pixel shader thread already resident in the buffer from the previous thread. Other embodiments can include dedicated logic within the execution units to accelerate certain triangle setup or attribute setup operations. For example, dedicated logic circuitry can be incorporated into an execution unit to perform triangle setup stage operations such as trivial triangle rejection.
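A common way to prepare attributes for per-pixel use is to fit a plane equation `a(x, y) = c + dadx*x + dady*y` to each attribute over the triangle, so that a pixel shader thread can evaluate its own portion of the attributes at its pixel position. The sketch below assumes that approach; the function names `attribute_setup` and `evaluate` and the dictionary layout are hypothetical, and perspective correction is ignored.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Plane = Tuple[float, float, float]           # (c, da/dx, da/dy)


def attribute_setup(xy: Tuple[Point, Point, Point],
                    attrs: Dict[str, Tuple[float, float, float]]) -> Dict[str, Plane]:
    """Fit a(x, y) = c + dadx*x + dady*y to each per-vertex attribute."""
    (x0, y0), (x1, y1), (x2, y2) = xy
    dx1, dy1 = x1 - x0, y1 - y0
    dx2, dy2 = x2 - x0, y2 - y0
    det = dx1 * dy2 - dx2 * dy1
    if det == 0:
        raise ValueError("degenerate triangle")
    planes: Dict[str, Plane] = {}
    for name, (a0, a1, a2) in attrs.items():
        da1, da2 = a1 - a0, a2 - a0
        dadx = (da1 * dy2 - da2 * dy1) / det
        dady = (dx1 * da2 - dx2 * da1) / det
        # Fold the vertex-0 origin into the constant term.
        planes[name] = (a0 - dadx * x0 - dady * y0, dadx, dady)
    return planes


def evaluate(plane: Plane, x: float, y: float) -> float:
    """What a pixel shader thread would do with one attribute plane."""
    c, dadx, dady = plane
    return c + dadx * x + dady * y
```

Computing three coefficients once per triangle lets every pixel thread recover its attribute values with two multiply-adds, instead of re-deriving barycentric weights from the vertices.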

Embodiments of the invention offer advantages over graphics processing units that implement the triangle setup and attribute setup stages as separate hardware components. In particular, implementing the triangle setup 520 and attribute setup 522 stages of the graphics pipeline in software instructions executed in the EU pool, rather than as a triangle setup unit and/or attribute setup unit that are hardware components separate from the EU pool, can reduce the gate count of the graphics processing unit 500. As is known, graphics application programming interfaces require the EU pool 506 to allow the GPU to execute the various programmable stages of the graphics pipeline, such as the geometry shader, the vertex shader, or the pixel shader. Implementing the triangle setup and attribute setup stages in the EU pool 506 already present in the GPU removes at least those hardware components and thereby reduces the number of gates in the system. It should be appreciated that reducing the gate count of a graphics processing unit according to embodiments of the invention can reduce the cost of designing and/or manufacturing the GPU. In addition, system cost can be reduced by eliminating the GPU hardware lines otherwise needed to transfer data to and/or from a triangle setup unit or attribute setup unit implemented as a separate hardware component. This is particularly useful in low-end graphics processing units or computer systems, where cost is an important consideration in the design and manufacture of hardware components.

In addition, embodiments of the invention can yield a more efficient graphics pipeline, because triangle setup 520 and attribute setup 522 execute in the EU pool 506, which supports multithreaded operation. It should be appreciated that efficient execution of the graphics pipeline can be achieved through thread control and scheduling by the EU pool. For example, if triangle setup operations are causing a bottleneck in the graphics pipeline, more resources from the EU pool can be allocated to triangle setup operations to relieve the bottleneck or mitigate the resulting loss of performance. Alternatively, if another stage of the graphics pipeline (such as the pixel shader) is the cause of the bottleneck in the GPU, more resources from the EU pool can be allocated to pixel shader threads to increase system throughput. Furthermore, implementing the attribute setup and triangle setup operations in threads of the EU pool 506 produces a system that depends less on any single bottleneck point. By managing the load in the EU pool 506 with thread management and scheduling protocols known in the art, the graphics pipeline can be made more efficient.

Another advantage of embodiments of the invention is the flexibility and extensibility that result from eliminating separate hardware components for the triangle setup and attribute setup operations. For example, embodiments of the invention can change the triangle setup 520 or attribute setup 522 stage of the graphics processing unit simply by changing the software instructions that perform those operations in the execution units. In contrast, triangle setup and attribute setup hardware components separate from the EU pool may require new hardware components to change the triangle setup or attribute setup stage of the graphics pipeline. This flexibility can be useful for fixing bugs, adding new features, or adjusting the algorithms used to implement the triangle setup 520 or attribute setup 522 stage.

Referring now to Fig. 6, a flowchart of a method embodiment 600 of the invention is depicted. In step 602, vertex data representing a geometric primitive are received for processing by the triangle setup and attribute setup stages of the graphics pipeline. The vertex data of a geometric primitive being processed by the graphics pipeline are typically output from the geometry shader for processing in the triangle setup stage. In step 604, a thread is created in an execution unit via software instructions to perform the triangle setup operations (step 606). As noted above, triangle setup operations in the graphics pipeline can include, but are not limited to: trivial triangle rejection, determinant calculation, bounding box calculation, culling, pre-attribute-setup KLMN calculation, edge function generation, clipping, and guard band clipping.
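Several of the listed triangle setup operations reduce to cheap early-out tests. The sketch below combines trivial rejection, back-face culling, and a guard-band check that decides whether full clipping is needed. The viewport and guard-band rectangles, the convention that counter-clockwise triangles are front-facing, and the returned status strings are assumptions for illustration; KLMN pre-attribute setup and the clipping itself are not shown.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]     # (xmin, ymin, xmax, ymax)


def classify_triangle(verts: Sequence[Tuple[float, float]],
                      viewport: Rect, guard_band: Rect,
                      cull_back: bool = True) -> str:
    xs = [x for x, _ in verts]
    ys = [y for _, y in verts]
    vxmin, vymin, vxmax, vymax = viewport
    # Trivial rejection: every vertex lies beyond the same viewport edge.
    if max(xs) < vxmin or min(xs) > vxmax or max(ys) < vymin or min(ys) > vymax:
        return "trivially_rejected"
    (x0, y0), (x1, y1), (x2, y2) = verts
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        return "culled_degenerate"
    if cull_back and det < 0:                # assumed: clockwise = back-facing
        return "culled_back_facing"
    gxmin, gymin, gxmax, gymax = guard_band
    # Inside the guard band the rasterizer can absorb the overhang itself;
    # only triangles crossing the guard band need geometric clipping.
    if min(xs) < gxmin or min(ys) < gymin or max(xs) > gxmax or max(ys) > gymax:
        return "needs_clipping"
    return "accepted"
```

Ordering the tests from cheapest to most expensive means most discarded triangles leave the setup thread after a few comparisons.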

In step 608, after the triangle setup operations are complete, the bounding box is output to the span and tile generator, and the Z deltas are output to the Z test stages (ZL1, ZL2) of the graphics pipeline. Other elements of the graphics pipeline that consume the output of the triangle setup stage are not discussed here but are known to those skilled in the art; for example, the triangle setup stage can output data to other elements of the rendering pipeline for processing. Once the triangle setup operations are complete and at least the above outputs have been produced, the thread is suspended until data are returned to the execution unit. For example, if the thread outputs data to the span and tile generator, the Z test, or another stage of the rendering pipeline, the thread must wait for that stage to finish its operations before continuing with the attribute setup operations. In step 610, the thread is suspended.
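On the receiving side, a span and tile generator can walk the bounding box in fixed-size tiles and keep only the tiles the triangle may touch. The sketch assumes 8x8-pixel tiles and counter-clockwise edge functions given as `(A, B, C)` with `E(x, y) = A*x + B*y + C >= 0` inside the triangle; a tile is dropped when all four of its corners fail the same edge. The tile size and the function name `generate_tiles` are hypothetical, and the test is deliberately conservative, so tiles that touch the triangle only at a corner are kept.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[float, float, float]


def generate_tiles(bbox: Tuple[float, float, float, float],
                   edges: Sequence[Edge], tile: int = 8) -> List[Tuple[int, int]]:
    xmin, ymin, xmax, ymax = bbox
    tiles = []
    for ty in range(math.floor(ymin) // tile, math.floor(ymax) // tile + 1):
        for tx in range(math.floor(xmin) // tile, math.floor(xmax) // tile + 1):
            x0, y0 = tx * tile, ty * tile
            corners = ((x0, y0), (x0 + tile, y0), (x0, y0 + tile), (x0 + tile, y0 + tile))
            # Keep the tile unless some edge has all four corners outside it.
            if all(any(a * x + b * y + c >= 0 for x, y in corners) for a, b, c in edges):
                tiles.append((tx, ty))
    return tiles
```

For the right triangle with vertices (0, 0), (16, 0), (0, 16), the call keeps six of the nine candidate tiles in its 16x16 bounding box and discards the three beyond the hypotenuse.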

In step 612, if the triangle or geometric primitive is not rejected by the span and tile generator or the Z test, the thread is resumed (step 614), and in step 616 the attribute setup operations are performed in the thread to produce pixel attributes associated with the vertex data. A triangle or geometric primitive can be rejected, for example, if another element of the graphics pipeline (such as the Z test) determines that the triangle need not be output to the frame buffer in a later stage of the graphics pipeline; in that case, the attribute setup operations are unnecessary. After the attribute setup operations have been performed, the data from the thread are stored in step 618. As mentioned above with reference to the embodiment of Fig. 6, the data from the thread can be stored in a buffer within the execution unit for use by subsequent threads created by that execution unit. Alternatively, the data can be stored in a cache subsystem accessible to other execution units, for use by threads created in those execution units. The subsequent thread is at least one of: a pixel shader thread, a vertex shader thread, and a thread capable of performing the triangle setup and attribute setup operations. In step 620, the thread is terminated, after which the execution unit can be allocated to threads dedicated to other stages of the graphics pipeline.
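The thread lifecycle of steps 604 through 620 maps naturally onto a Python generator, where `yield` plays the role of suspending the thread and `send()` plays the role of resuming it with the data returned by the external hardware. The sketch is hypothetical throughout: triangle setup is reduced to a bounding box, the reply format from the span and tile generator and Z test is invented for the example, and "attribute setup" merely records the surviving tiles and per-vertex colors in the EU buffer.

```python
from typing import Any, Dict, Generator, List, Optional, Tuple

Vertex = Tuple[float, float, float]          # (x, y, color) for this sketch
Reply = Optional[Dict[str, Any]]


def setup_thread(vertices: List[Vertex],
                 eu_buffer: Dict[str, Any]) -> Generator[Dict[str, Any], Reply, str]:
    # Steps 604-606: triangle setup (here only the bounding box).
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    # Steps 608-610: output the results, then suspend until hardware replies.
    reply = yield {"bbox": bbox}
    # Step 612: rejected by the span/tile generator or Z test -> terminate.
    if reply is None or reply.get("rejected", False):
        return "terminated_rejected"
    # Steps 614-616: resumed; attribute setup (placeholder computation).
    attributes = {"tiles": reply["tiles"], "colors": [v[2] for v in vertices]}
    # Step 618: store results for a subsequent thread on this EU.
    eu_buffer["attributes"] = attributes
    # Step 620: terminate; the EU slot can now go to another stage.
    return "terminated_done"


def run_setup(vertices: List[Vertex], hardware_reply: Reply,
              eu_buffer: Dict[str, Any]) -> str:
    thread = setup_thread(vertices, eu_buffer)
    next(thread)                 # run until the thread suspends at its yield
    try:
        thread.send(hardware_reply)  # resume with the processed data
    except StopIteration as done:
        return done.value
    raise RuntimeError("setup thread did not terminate")
```

A rejected primitive leaves the EU buffer untouched, mirroring the early termination described for step 612, while an accepted one leaves its attributes behind for the next thread on the same unit.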

Embodiments of the invention can be implemented in hardware, software, firmware, or a combination thereof. In some embodiments, the triangle setup and attribute setup stages can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in an alternative embodiment, the triangle setup and attribute setup stages can be implemented with any one or a combination of the following known technologies: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits (ASICs) having appropriately combined logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

As will be understood by those reasonably skilled in the art of the present invention, any process description or block in the flowchart should be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps in the process. Alternative implementations are included within the scope of the preferred embodiments of the invention, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.

The above describes only preferred embodiments of the invention and is not intended to limit its scope. Anyone familiar with this technology may make further improvements and variations on this basis without departing from the spirit and scope of the invention; the scope of protection of the invention is therefore defined by the claims of this application.

Brief description of the reference symbols in the drawings:

110: vertex shader

120: geometry shader

130: triangle setup unit

140: span and tile generator

150: attribute setup unit

160: pixel shader

170: frame buffer

200: graphics pipeline

250: memory

252: command stream processor

254: vertex shader

256: geometry shader

257: triangle setup unit

258: span and tile generator

259: attribute setup unit

260: pixel shader

262: frame buffer

300: graphics processing unit (GPU)

304: execution unit pool control and cache subsystem

306: programmable execution unit pool

310: geometry shader

312: pixel shader

314: triangle setup unit

316: attribute setup unit

318: span and tile generator

400: graphics pipeline

450: memory

452: command stream processor

454: vertex shader

456: geometry shader

457: triangle setup

458: span and tile generator

460: pixel shader

462: frame buffer

500: graphics processing unit

504: execution unit pool control and cache subsystem

506: programmable execution unit pool

508: vertex shader

510: geometry shader

512: pixel shader

518: span and tile generator

520: triangle setup

522: attribute setup

Claims

1. A graphics processing apparatus, characterized by comprising:

At least one execution unit configured for multithreaded operation, wherein the at least one execution unit executes at least one thread for programmable triangle setup operations and programmable attribute setup operations; wherein

The at least one execution unit is programmable to perform at least one of vertex shader operations, pixel shader operations, and geometry shader operations;

The at least one execution unit suspends the at least one thread for the programmable triangle setup operations and the programmable attribute setup operations;

The at least one execution unit outputs data from the programmable triangle setup operations of the at least one thread to at least one hardware component outside the at least one execution unit;

When the processed data are received from the at least one hardware component, the at least one execution unit resumes the suspended at least one thread; and

The at least one execution unit stores a result of executing the at least one thread in a buffer within the at least one execution unit, for use by a subsequent thread created by the at least one execution unit.

2. A graphics processing unit, characterized by comprising:

At least one execution unit configured for multithreaded operation, wherein the at least one execution unit executes at least one thread for triangle setup operations and attribute setup operations, and the at least one execution unit performs programmable shader operations; and

An execution unit pool control system for scheduling the at least one thread and managing the at least one execution unit;

Wherein the execution unit pool control system concurrently initiates the at least one thread for the triangle setup operations and the attribute setup operations and the programmable shader operations.

3. The graphics processing unit according to claim 2, characterized in that the attribute setup operations comprise processing a plurality of attributes corresponding to a plurality of pixels, wherein each of the plurality of pixels comprises a portion of the plurality of attributes.

4. The graphics processing unit according to claim 2, characterized in that the at least one execution unit is programmable to perform the triangle setup operations and the attribute setup operations.

5. The graphics processing unit according to claim 2, characterized in that:

The at least one execution unit suspends the at least one thread for the triangle setup operations and the attribute setup operations;

The at least one execution unit outputs data from the triangle setup operations of the at least one thread to at least one hardware component outside the at least one execution unit; and

When the processed data are received from the at least one hardware component, the at least one execution unit resumes the suspended at least one thread.

6. The graphics processing unit according to claim 2, characterized in that the at least one execution unit further comprises:

A buffer for storing results of the at least one thread that performs the triangle setup operations and the attribute setup operations.

7. A method of performing triangle setup and attribute setup in a graphics processing system, characterized by comprising the following steps:

Receiving vertex data, the vertex data corresponding to a geometric primitive;

Creating a thread in an execution unit configured for multithreaded operation, the execution unit performing programmable shader operations;

Performing triangle setup operations on the vertex data in the thread;

Performing attribute setup operations in the thread to produce pixel attributes associated with the vertex data; and

Terminating the thread.

8. The method of performing triangle setup and attribute setup in a graphics processing system according to claim 7, characterized by further comprising the following steps:

Suspending the thread;

Outputting data from the triangle setup operations to a span and tile generator;

Receiving the processed data from the span and tile generator;

Performing the attribute setup operations to produce the pixel attributes from the processed data; and

Resuming the thread.

9. The method of performing triangle setup and attribute setup in a graphics processing system according to claim 7, characterized in that the execution unit performs pixel shader, geometry shader, and vertex shader operations.

10. the method for carrying out triangular arrangement and attribute configuration in graphic system according to claim 7 is characterized in that, more comprises:

Produce another thread by another performance element that is used for multithreading operation, described another thread is in order to carry out described triangular arrangement operation with the described thread parallel ground of described performance element;

Wherein said another thread and described thread are carried out simultaneously.