CN115760543A - Thread processing method, device, equipment and storage medium for ordered view of rasterizer - Google Patents

Publication number
CN115760543A
Authority
CN
China
Prior art keywords
thread
thread bundle
bundle
primitive
assembled
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211403828.2A
Other languages
Chinese (zh)
Other versions
CN115760543B (en)
Inventor
李磊
武凤霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Granfei Intelligent Technology Co.,Ltd.
Original Assignee
Glenfly Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glenfly Tech Co Ltd filed Critical Glenfly Tech Co Ltd
Priority to CN202211403828.2A priority Critical patent/CN115760543B/en
Publication of CN115760543A publication Critical patent/CN115760543A/en
Application granted granted Critical
Publication of CN115760543B publication Critical patent/CN115760543B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Generation (AREA)

Abstract

The application relates to a thread processing method, apparatus, device and storage medium for a rasterizer ordered view. The thread processing method comprises: receiving an input primitive; rasterizing the input primitive; performing overlap detection between the rasterized input primitive and the first thread bundle being assembled; if they do not overlap, adding the input primitive to the first thread bundle being assembled to wait for starting; and if they overlap, starting the first thread bundle. During hardware thread assembly, the invention ensures that no thread bundle contains overlapping pixels, fixes the order in which different threads access the same address in the unordered access view, gives the unordered access view the attribute of ordered access, improves the rendering effect, and expands the usage scenarios in graphics drawing.

Description

Thread processing method, device, equipment and storage medium for rasterizer ordered view
Technical Field
The present application relates to the field of graphics processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing a thread of an ordered view of a rasterizer.
Background
The Unordered Access View (UAV) is a view of an unordered access resource. Multiple threads can read and write through a UAV simultaneously, in no particular order, without causing memory conflicts. A UAV supports reading from and writing to textures or other resources in any order. Typically, UAVs are used during rendering to write updated results to textures, which are subsequently accessed via texture read instructions bound to Shader Resource Views (SRVs).
However, in some application scenarios the UAV cannot render graphics well. Objects such as wire mesh, smoke, fire, vegetation and colored glass rely on transparency to obtain the desired effect. When rendering, for example, vegetation drawn in front of a glass building, multiple textures containing transparency must be combined, but a standard graphics pipeline may fail to overlay multiple transparent textures correctly in space, which results in a poor rendering effect.
Accordingly, those skilled in the art are working to give the UAV an ordered-access attribute and thereby improve the rendering effect of the unordered access view.
Disclosure of Invention
Therefore, to solve the above technical problem, it is necessary to provide a thread processing method, apparatus, device and storage medium for a rasterizer ordered view, which address the problem in the conventional technology that a standard graphics pipeline may fail to overlay multiple textures containing transparency correctly in space, resulting in a poor rendering effect.
In a first aspect, the present application provides a thread processing method for an ordered view of a rasterizer, including:
receiving an input primitive;
rasterizing the input primitive;
performing overlap detection between the rasterized input primitive and the first thread bundle being assembled:
if they do not overlap, adding the input primitive to the first thread bundle being assembled to wait for starting;
and if they overlap, starting the first thread bundle.
In one embodiment, before the first thread bundle is started, the method further comprises:
performing a hit test between the assembled first thread bundle and the running thread bundle:
if the result is a hit, performing the next hit test on the first thread bundle;
and if the result is a miss, starting the first thread bundle.
In one embodiment, the step of performing overlap detection between the rasterized input primitive and the first thread bundle being assembled comprises:
after receiving the input primitive, retrieving the primitive information of each primitive in the first thread bundle;
detecting whether overlapping pixels exist between the primitive information of the input primitive and the primitive information of each primitive in the first thread bundle;
wherein the primitive information is generated when rasterizing the input primitive.
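The overlap check described above can be sketched as follows. This is a minimal illustration, not the hardware logic of the application: the `PrimitiveInfo` record, the encoding of coverage as one bitmask per pixel block, and the function names are assumptions introduced here. The source only states that primitive information contains the coordinates and coverage information of a primitive.

```python
from dataclasses import dataclass, field


@dataclass
class PrimitiveInfo:
    """Hypothetical record produced when a primitive is rasterized."""
    prim_id: int
    # Assumed encoding: pixel-block coordinate (bx, by) -> bitmask of covered pixels.
    coverage: dict = field(default_factory=dict)


def primitives_overlap(a, b):
    """Return True if the two primitives cover at least one common pixel."""
    for block, mask_a in a.coverage.items():
        # Two primitives may touch the same block without sharing a pixel,
        # so the masks are ANDed instead of comparing block coordinates only.
        if mask_a & b.coverage.get(block, 0):
            return True
    return False


def overlaps_bundle(incoming, bundle):
    """Check the incoming primitive against every primitive already in the bundle."""
    return any(primitives_overlap(incoming, p) for p in bundle)


t0 = PrimitiveInfo(0, {(0, 0): 0b0011})
t1 = PrimitiveInfo(1, {(0, 0): 0b1100})  # same block as t0, disjoint pixels
t2 = PrimitiveInfo(2, {(0, 0): 0b0110})  # shares one pixel with t0 and one with t1
```

With these values, `t1` can join a bundle that holds `t0`, whereas `t2` overlaps a bundle holding `t0` and `t1`.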
In one embodiment, before the first thread bundle is started, the method further comprises:
determining whether the assembled first thread bundle meets a preset condition and, if so, generating thread information and performing the hit test between the first thread bundle and the running thread bundle according to the thread information; the thread information comprises a thread identifier and the primitive information of each primitive in the thread bundle; the primitive information comprises the coordinates and coverage information of the primitive;
the preset condition includes:
the count of pixel blocks of the primitives contained in the first thread bundle being assembled is greater than or equal to the maximum number that one thread bundle can contain; and/or the first thread bundle being assembled contains the last primitive in the drawing command.
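The preset condition can be expressed as a small predicate. The sketch below is illustrative only: the capacity value `MAX_BLOCKS_PER_BUNDLE` and the function name are hypothetical, since the source does not state how many pixel blocks one thread bundle can contain.

```python
# Hypothetical capacity; the source does not give a concrete number.
MAX_BLOCKS_PER_BUNDLE = 16


def bundle_is_complete(block_counts, contains_last_primitive,
                       max_blocks=MAX_BLOCKS_PER_BUNDLE):
    """Return True when assembly of the bundle should be closed.

    block_counts: number of pixel blocks contributed by each primitive
                  already placed in the bundle.
    contains_last_primitive: True if the bundle holds the last primitive
                             of the drawing command.
    """
    # Either condition alone is sufficient (the "and/or" in the text).
    return sum(block_counts) >= max_blocks or contains_last_primitive
```

For example, `bundle_is_complete([4, 4, 8], False)` is true because the bundle is full, and `bundle_is_complete([1], True)` is true because the drawing command has ended even though the bundle is nearly empty.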
In one embodiment, the step of hit testing the assembled first thread bundle and the running thread bundle comprises:
after the first thread bundle is assembled, calling thread information of the running thread bundle;
detecting whether there are overlapping pixels between primitive information in the first thread bundle and primitive information in a running thread bundle.
In one embodiment, the step of initiating the first thread bundle comprises:
generating an unordered access view start signal in response to a start signal, wherein the start signal is generated after the first thread bundle is assembled or when a hit test result is a miss;
executing the first thread bundle in response to the unordered access view start signal;
after the execution is finished, generating an unordered access view end signal;
releasing the primitives contained in the first thread bundle in response to the unordered access view end signal.
In one embodiment, before the first thread bundle is executed, the method further comprises:
in response to the start signal, adding the first thread bundle to a queue to be started;
in response to the unordered access view start signal, executing the first thread bundle after removing it from the queue to be started.
In one embodiment, after the first thread bundle is executed, the method further comprises:
in response to the unordered access view end signal, adding the first thread bundle to a waiting-release thread queue and generating a release signal;
and in response to the release signal, releasing the primitives contained in the first thread bundle after removing the first thread bundle from the waiting-release thread queue.
In a second aspect, the present application further provides a thread processing apparatus for an ordered view of a rasterizer, including:
an execution module for receiving an input primitive;
a rasterization module for rasterizing the input primitive;
an overlap detection module for performing overlap detection between the rasterized input primitive and the first thread bundle being assembled, and outputting a detection result;
a thread assembly module for adding the input primitive to the first thread bundle being assembled to wait for starting when the detection result is non-overlapping, and for outputting a first start signal when the detection result is overlapping;
wherein the execution module is further configured to start the first thread bundle according to the first start signal.
In one embodiment, the thread processing apparatus further includes:
a sorting module for performing a hit test between the assembled first thread bundle and the running thread bundle and outputting a test result; for performing the next hit test on the first thread bundle when the test result is a hit; and for outputting a second start signal when the test result is a miss;
wherein the execution module is further configured to start the first thread bundle according to the second start signal.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
receiving an input primitive;
rasterizing the input primitive;
performing overlap detection between the rasterized input primitive and the first thread bundle being assembled:
if they do not overlap, adding the input primitive to the first thread bundle being assembled to wait for starting;
and if they overlap, starting the first thread bundle.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
receiving an input primitive;
rasterizing the input primitive;
performing overlap detection between the rasterized input primitive and the first thread bundle being assembled:
if they do not overlap, adding the input primitive to the first thread bundle being assembled to wait for starting;
and if they overlap, starting the first thread bundle.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:
receiving an input primitive;
rasterizing the input primitive;
performing overlap detection between the rasterized input primitive and the first thread bundle being assembled:
if they do not overlap, adding the input primitive to the first thread bundle being assembled to wait for starting;
and if they overlap, starting the first thread bundle.
The thread processing method, apparatus, device and storage medium for a rasterizer ordered view described above have at least the following advantages:
The invention reuses the conventional hardware pipeline of the unordered access view. During hardware thread assembly, overlap detection is performed between the rasterized input primitive and the thread bundle being assembled, which ensures that no thread bundle contains overlapping pixels and fixes the order in which different threads access the same address in the unordered access view. This gives the unordered access view the attribute of ordered access, improves the rendering effect, and extends the scenarios in which the unordered access view can be used in graphics drawing.
Drawings
The accompanying drawings, which form a part of this application, are included to provide a further understanding of the invention.
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram that illustrates a method for thread processing for an ordered view of a rasterizer in accordance with one embodiment;
FIG. 2 is a flowchart illustrating a thread processing method for an ordered view of a rasterizer in accordance with an alternate embodiment;
FIG. 3 is a flowchart illustrating the step of initiating a first thread bundle in one embodiment;
FIG. 4 is a block diagram of a thread processing apparatus for rasterizer ordered views in one embodiment;
FIG. 5 is a block diagram showing the construction of a thread processing apparatus for an ordered view of a rasterizer in another embodiment;
FIG. 6 is a block diagram showing the construction of a thread processing apparatus for an ordered view of a rasterizer in another embodiment;
FIG. 7 is a block diagram of an overlap detection module in one embodiment;
FIG. 8 is a block diagram that illustrates the structure of a sort module in one embodiment;
FIG. 9 is a block diagram of an activation and deactivation module in one embodiment;
FIG. 10 is a schematic diagram of two triangles to be drawn in one embodiment;
FIG. 11 is a diagram illustrating a result of rendering two overlapping triangles in one embodiment;
FIG. 12 is a diagram illustrating an alternative rendering of two overlapping triangles in one embodiment;
FIG. 13 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
The following embodiments of the present invention are provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
Some exemplary embodiments of the invention have been described for illustrative purposes, and it is to be understood that the invention may be practiced otherwise than as specifically described.
Referring to FIG. 1, in one possible embodiment, the present application provides a thread processing method for a rasterizer ordered view, including:
Step S101, receiving an input primitive.
Step S102, rasterizing the input primitive.
Step S103, performing overlap detection between the rasterized input primitive and the first thread bundle (warp) being assembled.
Step S104, if they do not overlap, adding the input primitive to the first thread bundle being assembled to wait for starting.
Step S105, if they overlap, starting the first thread bundle.
Specifically, a graphics pipeline is a series of processes running on graphics hardware, divided into stages such as an input stage, a rasterization stage and a rendering stage. Graphics data is pushed into the pipeline and processed stage by stage, finally producing a 2D image that represents a 3D scene. To make graphics processing easier, the frame to be displayed is usually divided into a number of similar basic components, i.e., primitives. Primitives are usually simple polygons, and most models are represented by triangular faces, because the triangle is the most basic polygon and any polygon can be decomposed into triangles. For convenience of description, the primitives in the following embodiments are illustrated as triangles: the graphics data to be processed is divided into triangles in advance, and each vertex of each triangle has a set of associated data, such as position, color, texture and other attribute data, that is used in subsequent processing stages.
Continuing, an input triangle is received and rasterized: each point in the input triangle is traversed, and the geometric information output by the geometry stage for the input triangle is converted into a series of pixels on the screen to obtain primitive information, which includes the coordinates and coverage information of the primitive.
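As a rough illustration of how rasterization can yield the pixel coordinates covered by a triangle, the sketch below tests each pixel centre inside the triangle's bounding box against the three edge functions. The source does not specify a rasterization algorithm, so the edge-function approach, the pixel-centre sampling rule, and all names here are assumptions made for illustration.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Signed area term: its sign tells on which side of edge a->b the point p lies."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(v0, v1, v2):
    """Return the set of integer pixel coordinates whose centres lie in the triangle.

    Simplification: pixels exactly on an edge are counted as covered, so two
    triangles sharing an edge may both claim them. Real hardware applies a
    fill rule to avoid this; that rule is omitted here.
    """
    xs, ys = (v0[0], v1[0], v2[0]), (v0[1], v1[1], v2[1])
    covered = set()
    for y in range(math.floor(min(ys)), math.ceil(max(ys))):
        for x in range(math.floor(min(xs)), math.ceil(max(xs))):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            inside_ccw = w0 >= 0 and w1 >= 0 and w2 >= 0
            inside_cw = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside_ccw or inside_cw:  # accept either winding order
                covered.add((x, y))
    return covered


pixels = rasterize((0, 0), (4, 0), (0, 4))
```

The resulting pixel set plays the role of the coverage information that the later overlap detection compares between primitives.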
Overlap detection between the rasterized input triangle and the first thread bundle being assembled is performed according to the triangle information. The overlap detection comprises: after the input triangle is received, retrieving the triangle information of each triangle in the first thread bundle; and detecting whether overlapping pixels exist between the triangle information of the input triangle and the triangle information of each triangle in the first thread bundle. If overlapping pixels exist, the input triangle overlaps the first thread bundle being assembled; in this case, once assembly of the first thread bundle is complete, the first thread bundle is started directly, and the input triangle goes on to overlap detection against the next thread bundle being assembled. If no overlapping pixels exist, the input triangle does not overlap the first thread bundle being assembled; in this case the input triangle is added to the first thread bundle, and the first thread bundle containing the input triangle is started after assembly is complete.
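Steps S101 to S105 can be sketched as a simple assembly loop. This is a simplified software model under stated assumptions, not the hardware described in the application: each primitive arrives already rasterized as a set of covered pixel coordinates, the capacity limit of a thread bundle is ignored, and the function name `assemble_bundles` is hypothetical.

```python
def assemble_bundles(primitives):
    """Group primitives into thread bundles that contain no overlapping pixels.

    primitives: iterable of (prim_id, set_of_pixel_coordinates) in submission order.
    Returns the bundles (lists of prim_id) in the order they would be started.
    """
    started = []      # bundles handed over for starting, in order
    current = []      # the bundle being assembled
    covered = set()   # union of the pixels covered by `current`
    for prim_id, pixels in primitives:
        if covered & pixels:
            # Overlap: close and start the bundle being assembled, then
            # place the incoming primitive in the next (empty) bundle.
            started.append(current)
            current, covered = [], set()
        current.append(prim_id)
        covered |= pixels
    if current:
        started.append(current)  # the last bundle of the drawing command
    return started


stream = [
    (0, {(0, 0), (1, 0)}),
    (1, {(5, 5)}),
    (2, {(1, 0), (2, 0)}),  # shares pixel (1, 0) with primitive 0
    (3, {(9, 9)}),
]
bundles = assemble_bundles(stream)  # [[0, 1], [2, 3]]
```

Because primitive 2 overlaps primitive 0, it cannot share a bundle with it; the first bundle is closed and primitives 2 and 3 form the second one, so the two writes to pixel (1, 0) land in different bundles.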
The above thread processing method for a rasterizer ordered view reuses the conventional hardware pipeline of the unordered access view. During hardware thread assembly, overlap detection is performed between the rasterized input primitive and the thread bundle being assembled, which ensures that no thread bundle contains overlapping pixels and fixes the order in which different threads access the same address in the unordered access view. This gives the unordered access view the attribute of ordered access, improves the rendering effect, and expands its usage scenarios in graphics drawing.
Referring to FIG. 2, in a possible embodiment, the thread processing method for a rasterizer ordered view of the embodiment of the present application further includes:
Step S201, receiving an input primitive.
Step S202, rasterizing the input primitive.
Step S203, performing overlap detection between the rasterized input primitive and the first thread bundle (warp) being assembled.
Step S204, if they do not overlap, adding the input primitive to the first thread bundle being assembled to wait for starting.
Step S205, if they overlap, performing a hit test between the assembled first thread bundle and the running thread bundle.
Step S206, if the result is a hit, performing the next hit (hit/miss) test on the first thread bundle.
Step S207, if the result is a miss, starting the first thread bundle.
Specifically, before the first thread bundle is started, a hit test is performed between the assembled first thread bundle and the running thread bundle. The hit test comprises: after the first thread bundle is assembled, retrieving the thread information of the running thread bundle, wherein the thread information is generated after assembly is complete and comprises a thread identifier and the triangle information of each triangle in the thread bundle; and detecting whether overlapping pixels exist between the triangle information in the first thread bundle and the triangle information in the running thread bundle. If overlapping pixels exist (a hit), the execution order of the thread bundles would be inconsistent with the submission order of the input triangles, and the next hit test is performed on the first thread bundle; if no overlapping pixels exist (a miss), the execution order of the thread bundles is consistent with the submission order of the input triangles, and the first thread bundle can be started directly.
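The hit test can be modelled with a small tick-based simulation. Everything concrete below is an assumption made for illustration: bundles are represented by the set of pixels they cover plus a fixed execution time in ticks, a hit is retried on the following tick, and only the oldest waiting bundle is tested. The source states only that a hit leads to the next hit test and a miss leads to starting the bundle.

```python
from collections import deque


def simulate(bundles):
    """Return {bundle_id: tick at which the bundle is started}.

    bundles: list of (bundle_id, covered_pixels, duration_in_ticks),
             given in assembly (submission) order.
    """
    waiting = deque(bundles)
    running = {}        # bundle_id -> (covered_pixels, finish_tick)
    start_tick = {}
    tick = 0
    while waiting:
        # Retire bundles whose execution has finished by this tick.
        running = {b: (px, end) for b, (px, end) in running.items() if end > tick}
        bundle_id, pixels, duration = waiting[0]
        hit = any(pixels & px for px, _ in running.values())
        if hit:
            tick += 1   # hit: keep waiting and test again on the next tick
            continue
        waiting.popleft()  # miss: the bundle can be started now
        running[bundle_id] = (pixels, tick + duration)
        start_tick[bundle_id] = tick
    return start_tick


result = simulate([
    ("A", {(0, 0)}, 3),
    ("B", {(5, 5)}, 2),
    ("C", {(0, 0)}, 1),  # touches the same pixel as A
])
```

In this run, bundles A and B start together at tick 0 because they share no pixels, while C is held back until A finishes at tick 3, so the two accesses to pixel (0, 0) happen in submission order.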
Further, before the first thread bundle is started, the method also includes: after the first thread bundle is assembled, determining whether it meets a preset condition and, if so, generating thread information and performing the hit test between the first thread bundle and the running thread bundle according to the triangle information in the thread information. The preset condition includes: the count of pixel blocks of the triangles contained in the first thread bundle being assembled is greater than or equal to the maximum number that one thread bundle can contain; and/or the first thread bundle being assembled contains the last triangle in the drawing command.
In a possible embodiment, the thread processing method for a rasterizer ordered view of the embodiment of the present application further includes: parsing the thread information, selecting an interpolation method based on the interpolation mode of the pixel shader's input attributes, and generating an interpolation result; the interpolation result is used in the subsequent rendering stage, where the execution unit shades the triangle according to the interpolation result.
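A minimal sketch of mode-dependent attribute interpolation is given below. The source does not list the interpolation modes or methods, so the two modes shown (`"flat"` and `"linear"`), the use of barycentric weights, the choice of the first vertex for flat shading, and all function names are assumptions made purely for illustration.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric(v0, v1, v2, p):
    """Barycentric weights of point p with respect to triangle (v0, v1, v2)."""
    area = edge(v0, v1, v2)
    return (edge(v1, v2, p) / area, edge(v2, v0, p) / area, edge(v0, v1, p) / area)


def interpolate(mode, vertex_values, weights):
    """Pick the interpolation method from the attribute's declared mode."""
    if mode == "flat":
        # Assumed convention: take the value of the first vertex unchanged.
        return vertex_values[0]
    if mode == "linear":
        return sum(w * v for w, v in zip(weights, vertex_values))
    raise ValueError(f"unsupported interpolation mode: {mode}")


weights = barycentric((0, 0), (4, 0), (0, 4), (1, 1))   # (0.5, 0.25, 0.25)
shade = interpolate("linear", [10.0, 20.0, 30.0], weights)  # 17.5
```

Perspective-correct interpolation, which real pipelines commonly use for linear attributes, is omitted here to keep the example short.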
The above thread processing method for a rasterizer ordered view reuses the conventional hardware pipeline of the unordered access view. During hardware thread assembly, overlap detection is performed between the rasterized input primitive and the thread bundle being assembled, which ensures that no thread bundle contains overlapping pixels and fixes the order in which different threads access the same address in the unordered access view. Furthermore, a hit test is performed between the thread bundle waiting to be started and the running thread bundles, so that the execution order of the thread bundles is consistent with the submission order of the input triangles and the invocation order of the shader is guaranteed. This gives the unordered access view the attribute of ordered access, improves the rendering effect, and expands its usage scenarios in graphics drawing.
Referring to FIG. 3, in a possible embodiment, the step of starting the first thread bundle in the thread processing method for a rasterizer ordered view of the embodiment of the present application includes:
Step S301, in response to the start signal, generating an Unordered Access View (UAV) start signal.
Step S302, in response to the UAV start signal, executing the first thread bundle.
Step S303, after the execution is finished, generating a UAV end signal.
Step S304, in response to the UAV end signal, releasing the triangles included in the first thread bundle.
Specifically, the start signal is generated after the first thread bundle is assembled or when the hit test result is a miss, and a UAV start signal is generated in response to the start signal. In response to the UAV start signal, the first thread bundle is executed, and a UAV end signal is generated after the drawing software has completed the storage operations associated with the instructions in the first thread bundle. In response to the UAV end signal, the triangles included in the first thread bundle are released so that they are available for subsequent invocations.
Further, before the first thread bundle is executed, the method also includes:
in response to the start signal, adding the first thread bundle to a queue to be started; in response to the UAV start signal, executing the first thread bundle after it has been removed from the queue to be started.
Further, after the first thread bundle is executed, the method also includes:
in response to the UAV end signal, adding the first thread bundle to a waiting-release thread queue and generating a release signal; in response to the release signal, releasing the triangles included in the first thread bundle after the first thread bundle has been removed from the waiting-release thread queue.
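The signal sequence of steps S301 to S304, together with the two queues, can be modelled as follows. This is a software sketch under simplifying assumptions: the signals are handled synchronously one after another, the class `LaunchUnit` and its method names are hypothetical, and execution is delegated to a caller-supplied function.

```python
from collections import deque


class LaunchUnit:
    """Toy model: start -> UAV start -> execute -> UAV end -> release."""

    def __init__(self, execute):
        self.execute = execute            # callback standing in for the execution unit
        self.to_be_started = deque()      # queue to be started
        self.waiting_release = deque()    # waiting-release thread queue
        self.held = {}                    # bundle_id -> primitives not yet released
        self.events = []                  # trace of the signals, for inspection

    def start(self, bundle_id, primitives):
        """Start signal: enqueue the bundle, then raise the UAV start signal."""
        self.held[bundle_id] = list(primitives)
        self.to_be_started.append(bundle_id)
        self.events.append(("start", bundle_id))
        self._on_uav_start()

    def _on_uav_start(self):
        bundle_id = self.to_be_started.popleft()   # dequeue before executing
        self.events.append(("uav_start", bundle_id))
        self.execute(bundle_id, self.held[bundle_id])
        self._on_uav_end(bundle_id)

    def _on_uav_end(self, bundle_id):
        self.events.append(("uav_end", bundle_id))
        self.waiting_release.append(bundle_id)
        self._on_release()

    def _on_release(self):
        bundle_id = self.waiting_release.popleft() # dequeue before releasing
        del self.held[bundle_id]                   # primitives become reusable
        self.events.append(("release", bundle_id))


executed = []
unit = LaunchUnit(lambda bundle_id, prims: executed.append((bundle_id, tuple(prims))))
unit.start(7, [1, 2])
```

After `unit.start(7, [1, 2])`, the trace in `unit.events` lists the four signals in order and both queues are empty again. In hardware the signals would be asynchronous and several bundles could sit in each queue at once; the synchronous calls here only show the ordering of the steps.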
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In the above thread processing method for a rasterizer ordered view, the thread bundle is removed from the queue to be started before it is executed, and, after execution, the triangles contained in the thread bundle are released once the thread bundle has been removed from the waiting-release thread queue. As a result, when pixel shader invocations overlap, a Rasterizer Ordered View (ROV) write that has been performed can be read by subsequent invocations and does not affect the reads of previous invocations; an ROV read that has been performed reflects the writes of previous invocations and must not reflect the writes of subsequent invocations.
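Why the order of overlapping invocations matters can be seen in a small worked example with two semi-transparent fragments landing on the same pixel. The example assumes conventional straight-alpha "source over" blending, which the source does not specify; the function name and the color values are illustrative.

```python
def blend_over(dst, src_rgb, alpha):
    """Read the destination pixel, blend the source over it, return the new value."""
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src_rgb, dst))


background = (0.0, 0.0, 0.0)
red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

# Submission order: red fragment first, then green.
red_then_green = blend_over(blend_over(background, red, 0.5), green, 0.5)
# Same fragments executed in the opposite order.
green_then_red = blend_over(blend_over(background, green, 0.5), red, 0.5)
```

Here `red_then_green` is `(0.25, 0.5, 0.0)` and `green_then_red` is `(0.5, 0.25, 0.0)`. Each blend is a read-modify-write on the same address, so the final color depends on which invocation runs first, which is why the read and write visibility rules above are tied to the submission order.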
Based on the same inventive concept, the embodiments of the application also provide a thread processing apparatus for a rasterizer ordered view, which implements the thread processing method described above. The apparatus solves the problem in a manner similar to the method, so for the specific limitations of the one or more apparatus embodiments provided below, reference may be made to the limitations of the thread processing method above, which are not repeated here.
Referring to FIG. 4, in one possible embodiment, a thread processing apparatus for a rasterizer ordered view is provided, including: an execution module, a rasterization module, a thread assembly module and an overlap detection module.
The execution module is used for executing the instructions in the shader, receiving external input values and outputting the final calculation results; the execution module comprises a front end and a back end, wherein the front end is responsible for vertex shading and the back end is responsible for pixel shading.
The execution module is further configured to receive an input primitive, which in this embodiment is an input triangle. It will be appreciated that each vertex of the input triangle has a set of associated data, for example position, color, texture and other attribute data, that is used in subsequent processing stages.
The rasterization module is used for rasterizing the input triangle.
Specifically, for a received input triangle, each point in the input triangle is traversed, and the geometric information output by the geometry stage for the input triangle is converted into a series of pixels on the screen to obtain triangle information, which is transmitted to the thread assembly module; the triangle information comprises the coordinates and coverage information of the triangle.
The thread assembly module is used for transmitting the triangle information output by the rasterization module to the overlap detection module so that the overlap detection module can perform overlap detection.
The overlap detection module is used for performing overlap detection between the rasterized input triangle and the first thread bundle being assembled according to the triangle information, and outputting a detection result. The overlap detection comprises: after the input triangle is received, retrieving the triangle information of each triangle in the first thread bundle; and detecting whether overlapping pixels exist between the triangle information of the input triangle and the triangle information of each triangle in the first thread bundle.
The thread assembly module is also used for adding the input triangle to the first thread bundle being assembled to wait for starting when the detection result is non-overlapping, and for outputting a first start signal after the first thread bundle is assembled when the detection result is overlapping.
The execution module is further configured to start the first thread bundle according to the first start signal.
The above thread processing apparatus for a rasterizer ordered view reuses the conventional hardware pipeline of the unordered access view, which reduces the hardware design difficulty and the number of logic gates in practical applications. Meanwhile, during hardware thread assembly, overlap detection is performed between the rasterized input primitive and the thread bundle being assembled, which ensures that no thread bundle contains overlapping pixels and fixes the order in which different threads access the same address in the unordered access view. This gives the unordered access view the attribute of ordered access, improves the rendering effect, and expands its usage scenarios in graphics drawing.
Referring to fig. 5, in a possible embodiment, the thread processing device for the ordered view of the rasterizer according to an embodiment of the present application further includes a sorting module.
The sorting module is used for performing a hit test on the assembled first thread bundle against the running thread bundles and outputting a test result. The step of hit testing comprises: after the first thread bundle is assembled, retrieving the thread information of the triangles contained in the running thread bundles, wherein the thread information is generated after assembly is completed and comprises a thread identification and the triangle information of each triangle in the thread; and detecting whether overlapping pixels exist between the triangle information in the first thread bundle and the triangle information in the running thread bundles.
The sorting module is also used for performing the next hit test on the first thread bundle when the test result is a hit.
The sorting module is also used for outputting a second start signal when the test result is a miss.
The execution module is also used for starting the first thread bundle according to the second start signal.
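The hit test can be modelled in a few lines. The sketch below is illustrative only: the `Sorter` class, its method names, and the dictionary standing in for the stored thread information are assumptions, and coverage is again reduced to plain pixel sets.

```python
class Sorter:
    """Toy model of the hit test between an assembled bundle and running bundles."""

    def __init__(self):
        # thread identification -> pixels covered by that running bundle
        self.running = {}

    def hit_test(self, pixels):
        """Return True (hit) if any running bundle covers one of the given pixels."""
        return any(not covered.isdisjoint(pixels) for covered in self.running.values())

    def try_start(self, thread_id, pixels):
        """Record and start the bundle on a miss; on a hit, the caller retests later."""
        if self.hit_test(pixels):
            return False
        self.running[thread_id] = frozenset(pixels)
        return True

    def release(self, thread_id):
        """Drop a finished bundle so that later hit tests no longer see it."""
        self.running.pop(thread_id, None)
```

A bundle that hits simply repeats `try_start` after a running bundle has been released, which keeps overlapping bundles in submission order.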
Further, the thread assembly module is also used for judging, after the first thread bundle is assembled, whether the assembled first thread bundle meets a preset condition, and if so, sending a thread application request to the execution module. The preset condition comprises: the count of pixel blocks of the triangles contained in the assembled first thread bundle is greater than or equal to the maximum number that one thread bundle can contain, and/or the assembled first thread bundle contains the last primitive in the drawing command.
The execution module is also used for allocating resources for the first thread bundle after receiving the thread application request, generating thread information, and outputting the thread information to the thread assembly module.
The thread assembly module is also used for transmitting the thread information to the sorting module.
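The preset condition can be expressed as a small predicate. In this sketch the limit of 8 pixel blocks per thread bundle and all names are hypothetical; the text does not state the actual capacity.

```python
MAX_BLOCKS_PER_BUNDLE = 8  # hypothetical capacity, chosen only for illustration


def meets_preset_condition(block_counts, contains_last_primitive,
                           max_blocks=MAX_BLOCKS_PER_BUNDLE):
    """Decide whether an assembled bundle should request a thread.

    block_counts holds the number of pixel blocks of each triangle in the bundle.
    The bundle qualifies when it is full, when it holds the last primitive of
    the drawing command, or both.
    """
    return sum(block_counts) >= max_blocks or contains_last_primitive
```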
Further, the thread processing device for the ordered view of the rasterizer according to an embodiment of the present application further includes an interpolation module.
The thread assembly module is also used for transmitting the thread information to the interpolation module.
The interpolation module is used for parsing the thread information, selecting an interpolation method based on the interpolation mode of the input attributes of the pixel shader, generating an interpolation result, and outputting the interpolation result to the execution module, so that the execution module can execute the subsequent vertex or pixel shading operation.
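As an illustration of selecting an interpolation method by mode, the sketch below interpolates one scalar attribute across a triangle. The mode names, the choice of vertex 0 for constant interpolation, and the function signature are assumptions; the text does not list the modes the interpolation module supports.

```python
def interpolate(mode, bary, values, w=(1.0, 1.0, 1.0)):
    """Interpolate a per-vertex attribute at one pixel.

    bary   -- screen-space barycentric weights of the pixel (they sum to 1)
    values -- the attribute at the three vertices
    w      -- clip-space w of the three vertices (used by "perspective" only)
    """
    if mode == "constant":
        # Flat shading: take the attribute of one vertex (vertex 0 assumed here).
        return values[0]
    if mode == "noperspective":
        # Plain linear interpolation in screen space.
        return sum(b * v for b, v in zip(bary, values))
    if mode == "perspective":
        # Perspective-correct interpolation: interpolate v/w and 1/w, then divide.
        numerator = sum(b * v / wi for b, v, wi in zip(bary, values, w))
        denominator = sum(b / wi for b, wi in zip(bary, w))
        return numerator / denominator
    raise ValueError(f"unknown interpolation mode: {mode}")
```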
The thread processing device for the ordered view of the rasterizer reuses the conventional hardware pipeline of the unordered access view, which reduces the hardware design difficulty and the number of logic gates in practical applications. Meanwhile, during hardware thread assembly, overlap detection is performed on the rasterized input primitive and the thread bundle being assembled, which ensures that no thread bundle contains overlapping pixels and determines the order in which different threads access the same address in the unordered access view. Furthermore, a hit test is performed between the thread bundle waiting to be started and the running thread bundles, so that the execution order of the thread bundles is consistent with the submission order of the input triangles and the invocation order of the shader is guaranteed. The unordered access view is thereby given the ordered-access property, the rendering effect is improved, and the usage scenarios of the rasterizer in graphics drawing are expanded.
Referring to fig. 6, in a possible embodiment, the thread processing device for the ordered view of the rasterizer according to an embodiment of the present application further includes a starting and releasing module and a storage and loading module.
The starting and releasing module is used for responding to the start signal, generating a UAV start signal and transmitting the UAV start signal to the storage and loading module; the start signal is generated after the first thread bundle is assembled or when the hit test result is a miss.
The execution module is used for executing the first thread bundle according to the UAV start signal, generating a UAV end signal after the store operations related to the instructions in the first thread bundle are finished, and transmitting the UAV end signal to the storage and loading module.
The storage and loading module is used for transmitting the UAV end signal to the starting and releasing module, so that the starting and releasing module releases the triangles contained in the first thread bundle for subsequent use.
The storage and loading module is also used for receiving and parsing the access request sent by the execution module; and is also used for receiving a load request sent by the execution module, calculating an access address according to the load request, sending a read/write request to the storage system, and assembling the data returned by the storage system before sending it to the execution module.
Further, the starting and releasing module is also used for receiving the first start signal or the second start signal and adding the first thread bundle into the waiting start thread queue; for removing the first thread bundle from the waiting start thread queue after outputting the UAV start signal; and for receiving the UAV end signal, adding the first thread bundle into the waiting release thread queue, generating a release signal, and removing the first thread bundle from the waiting release thread queue after outputting the release signal.
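The two queues of the starting and releasing module can be modelled as plain FIFO queues. The class and method names below are hypothetical, and signals are represented as simple tuples purely for illustration.

```python
from collections import deque


class StartReleaseModule:
    """Toy model of the waiting start and waiting release thread queues."""

    def __init__(self):
        self.waiting_start = deque()
        self.waiting_release = deque()

    def on_start_signal(self, warp):
        """A first or second start signal arrived: buffer the bundle."""
        self.waiting_start.append(warp)

    def issue_uav_start(self):
        """Output a UAV start signal for the oldest buffered bundle, then remove it."""
        if not self.waiting_start:
            return None
        return ("uav_start", self.waiting_start.popleft())

    def on_uav_end(self, warp):
        """A UAV end signal arrived: buffer the bundle for release."""
        self.waiting_release.append(warp)

    def issue_release(self):
        """Output a release signal for the oldest finished bundle, then remove it."""
        if not self.waiting_release:
            return None
        return ("release", self.waiting_release.popleft())
```

Using first-in, first-out queues is a design assumption here; it means bundles are started and released in the order their signals arrive.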
The following describes in detail a structure of a thread processing apparatus for an ordered view of a rasterizer provided in an embodiment of the present application.
Referring to fig. 7, the overlap detection module includes: a first control unit and an overlap detector.
The first control unit is used for receiving triangle information of an input triangle sent by the thread assembly module, sending a read request to the thread assembly module, and acquiring triangle information in a first thread bundle being assembled, wherein the triangle information comprises an address and coverage information of the triangle.
The overlap detector is used for detecting, according to the triangle information, whether overlapping pixels exist between the input triangle and the first thread bundle being assembled, and for outputting a detection result to the thread assembly module.
Referring to fig. 8, the sorting module includes: the second control unit, the information memory and the test unit.
The second control unit is used for receiving the thread information of the first thread bundle sent by the thread assembly module, wherein the thread information comprises a thread identification and the triangle information of each triangle in the thread; and is also used for sending read requests to the information memory to obtain the thread information of the triangles contained in the running thread bundles.
The information memory is used for storing the thread information of the triangles contained in the running thread bundles; and is also used for releasing the triangles contained in a thread bundle according to the release signal and the thread identification carried in the release signal.
The test unit is used for performing a hit test on the first thread bundle against the running thread bundles according to the thread information and outputting a test result; for performing the next hit test on the first thread bundle when the test result is a hit; and for outputting a second start signal when the test result is a miss.
Referring to fig. 9, the starting and releasing module includes: a third control unit, a waiting start thread queue and a waiting release thread queue.
The third control unit is used for adding the first thread bundle into the waiting start thread queue according to the first start signal or the second start signal, and removing the first thread bundle from the waiting start thread queue after outputting the UAV start signal; and is also used for adding the first thread bundle into the waiting release thread queue after receiving the UAV end signal, generating a release signal, and removing the first thread bundle from the waiting release thread queue after outputting the release signal to the information memory.
The waiting start thread queue is used for buffering the thread bundles that need to be started.
The waiting release thread queue is used for buffering the thread bundles that need to be released.
Referring to fig. 10, in a possible embodiment, when two overlapping triangles 1 and 2 need to be drawn with a rasterizer unordered access view, the UAV store instructions in the drawing software run in parallel in the execution module. If the thread of triangle 1 runs faster than the thread of triangle 2, the result is as shown in fig. 11; otherwise, the result is as shown in fig. 12. Therefore, the result of an unordered access view is nondeterministic when the drawn triangles have overlapping pixels.
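The dependence on completion order can be reproduced with a toy one-dimensional framebuffer. The pixel spans below are invented for illustration and do not correspond to the figures; the point is only that the last store to an overlapping pixel wins.

```python
def draw(completion_order, spans, width=4):
    """Apply the stores of each triangle in the order its thread finishes."""
    framebuffer = ["."] * width
    for name in completion_order:
        for x in spans[name]:
            framebuffer[x] = name  # a later store overwrites an earlier one
    return "".join(framebuffer)


# Hypothetical coverage: triangles "1" and "2" overlap on pixels 1 and 2.
spans = {"1": [0, 1, 2], "2": [1, 2, 3]}

triangle_1_finishes_first = draw(["1", "2"], spans)  # "1222"
triangle_2_finishes_first = draw(["2", "1"], spans)  # "1112"
```

The two results differ exactly on the overlapping pixels, which is the nondeterminism the ordered view is meant to remove.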
The thread processing device for the ordered view of the rasterizer in this embodiment reuses the conventional hardware pipeline of the unordered access view and makes the drawing result of the unordered access view deterministic when the drawn triangles have overlapping pixels. For convenience of description, this embodiment takes the drawing of two overlapping triangles as an example, and the drawing steps include:
step one, the rasterization module rasterizes the input triangle 1 (T0) and the input triangle 2 (T1) to obtain triangle information, and sends the triangle information of T0 and T1 to the thread assembly module in sequence;
step two, the thread assembly module sends the received T0 to the overlap detection module for overlap detection;
step three, after the overlap detection module receives T0, the first control unit sends a read request to the thread assembly module; the thread assembly module returns a null value, so the overlap detector outputs a non-overlap result and notifies the thread assembly module;
step four, the thread assembly module receives the non-overlap detection result and adds T0 into the current thread bundle; it then receives T1 and sends T1 to the overlap detection module;
step five, after the overlap detection module receives T1, the first control unit sends a read request to the thread assembly module; the thread assembly module returns T0, the overlap detector performs overlap detection on T1 and the T0 in the current thread bundle, the detection result is overlap, and the thread assembly module is notified;
step six, the thread assembly module learns that T1 overlaps the current thread bundle, so the current thread bundle can be started, and sends a thread application request to the execution module;
step seven, the execution module processes the thread application request, allocates the resources required for thread execution, and returns thread information, denoted Warp0, to the thread assembly module;
step eight, the thread assembly module sends the triangle information in Warp0 to the interpolation module, and the interpolation module interpolates the input attributes of the pixels and sends the interpolated attributes to the execution module; meanwhile, the thread assembly module also sends the triangles in Warp0 to the sorting module;
step nine, the sorting module performs a hit test after receiving the triangle information in Warp0; the test result is a miss (Miss) because the information memory is currently empty. The test unit writes the T0 in Warp0 into the information memory and sends a start signal to the starting and releasing module;
step ten, the starting and releasing module stores the Warp0 information into the waiting start thread queue, then sends a UAV start signal to the storage and loading module, and then removes Warp0 from the waiting start thread queue;
step eleven, the storage and loading module forwards the UAV start signal of Warp0 to the execution module;
step twelve, on receiving the UAV start signal of Warp0, the execution module starts to execute the UAV store operations in the drawing software and sends store requests to the storage and loading module, and the storage and loading module writes the data to the specified addresses in the memory; after the UAV store operations are finished, the execution module sends a UAV end signal of Warp0 to the storage and loading module;
step thirteen, the storage and loading module forwards the UAV end signal of Warp0 to the starting and releasing module;
step fourteen, after receiving the UAV end signal of Warp0, the starting and releasing module adds Warp0 to the waiting release thread queue, sends a release signal for releasing the triangles in Warp0 to the sorting module, and then removes Warp0 from the waiting release thread queue;
step fifteen, the sorting module receives the release signal of Warp0 and releases T0 from the information memory;
step sixteen, after the thread bundle Warp0 finishes executing, the thread assembly module continues thread assembly for T1; since T1 is the last triangle in the current drawing command, T1 can be assembled alone into a new thread bundle Warp1; for the remaining steps, refer to steps three to fifteen, which are not described again.
Through the above steps, the final rendering result of this embodiment is shown in fig. 11.
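The sixteen steps can be condensed into a short serial simulation. This is a sketch under simplifying assumptions: triangles are pixel sets, each thread bundle runs to completion before the next one starts (so the hit test always misses), and the function and variable names are hypothetical.

```python
def run_ordered(triangles):
    """Simulate ordered drawing of (name, pixel_set) pairs given in submission order.

    Returns the bundle composition and the final framebuffer (pixel -> name).
    """
    # Thread assembly with overlap detection (steps two to six).
    bundles, current = [], []
    for name, pixels in triangles:
        if any(pixels & other for _, other in current):
            bundles.append(current)  # overlap: the current bundle can be started
            current = []
        current.append((name, pixels))
    if current:
        bundles.append(current)  # the last primitive of the draw closes the bundle

    framebuffer = {}
    info_memory = {}  # warp id -> pixels covered by running bundles
    for index, bundle in enumerate(bundles):
        warp_id = f"Warp{index}"
        covered = set().union(*(pixels for _, pixels in bundle))
        # Hit test (step nine). Earlier bundles are already released in this
        # serial model, so the result is always a miss.
        assert not any(covered & running for running in info_memory.values())
        info_memory[warp_id] = covered
        # UAV store operations (step twelve).
        for name, pixels in bundle:
            for pixel in pixels:
                framebuffer[pixel] = name
        # UAV end signal and release (steps thirteen to fifteen).
        del info_memory[warp_id]
    return [[name for name, _ in bundle] for bundle in bundles], framebuffer
```

With T0 covering pixels 0 to 2 and T1 covering pixels 1 to 3, the model yields two bundles and T1 always ends up on top of the shared pixels, whatever the internal timing.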
The modules in the thread processing device for the ordered view of the rasterizer can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one possible embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 13. The computer device comprises a processor, a memory, an Input/Output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data generated in the process of thread processing. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a method of thread processing for an ordered view of a rasterizer.
It will be appreciated by those skilled in the art that the configuration shown in fig. 13 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one possible embodiment, a computer device is provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the thread processing method for the ordered view of the rasterizer in the above embodiments when executing the computer program.
In one possible embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the thread processing method for the ordered view of the rasterizer in the above embodiments.
In one possible embodiment, a computer program product is provided, which comprises a computer program, and the computer program, when executed by a processor, implements the steps of the thread processing method for the ordered view of the rasterizer in the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the method embodiments described above. Any reference to memory, databases, or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of the technical features, they should be considered to be within the scope of the present disclosure.
The above embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (13)

1. A thread processing method for ordered views of a rasterizer is characterized by comprising the following steps:
receiving an input primitive;
rasterizing the input primitive;
performing overlap detection on the rasterized input primitive and the first thread bundle being assembled:
if there is no overlap, adding the input primitive into the first thread bundle being assembled, to wait for starting;
and if there is overlap, starting the first thread bundle.
2. The thread processing method of claim 1, wherein before the first thread bundle is started, the method further comprises:
performing a hit test on the assembled first thread bundle against the running thread bundle:
if the test result is a hit, performing the next hit test on the first thread bundle;
and if the test result is a miss, starting the first thread bundle.
3. The thread processing method of claim 1, wherein the step of performing overlap detection on the rasterized input primitive and the first thread bundle being assembled comprises:
after the input primitive is received, retrieving the primitive information of each primitive in the first thread bundle;
detecting whether overlapping pixels exist between the primitive information of the input primitive and the primitive information of each primitive in the first thread bundle;
wherein the primitive information is generated when rasterizing the input primitive.
4. The thread processing method of claim 2, wherein before the first thread bundle is started, the method further comprises:
judging whether the assembled first thread bundle meets a preset condition, if so, generating thread information, and performing the hit test on the first thread bundle against the running thread bundle according to the thread information; the thread information comprises a thread identification and the primitive information of each primitive in the thread; the primitive information comprises the coordinates and the coverage information of the primitives;
the preset condition comprises:
the count of pixel blocks of the primitives contained in the assembled first thread bundle is greater than or equal to the maximum number that one thread bundle can contain, and/or the assembled first thread bundle contains the last primitive in the drawing command.
5. The thread processing method of claim 4, wherein the step of hit testing the assembled first thread bundle and the running thread bundle comprises:
after the first thread bundle is assembled, calling thread information of the running thread bundle;
detecting whether there are overlapping pixels between primitive information in the first thread bundle and primitive information in a running thread bundle.
6. The thread processing method of claim 1 or 2, wherein the step of starting the first thread bundle comprises:
generating an unordered access view start signal in response to a start signal, wherein the start signal is generated after the first thread bundle is assembled or when the hit test result is a miss;
executing the first thread bundle in response to the unordered access view start signal;
after the execution is finished, generating an unordered access view end signal;
releasing the primitives contained in the first thread bundle in response to the unordered access view end signal.
7. The thread processing method of claim 6, wherein before the first thread bundle is executed, the method further comprises:
in response to the start signal, adding the first thread bundle into a waiting start thread queue;
in response to the unordered access view start signal, executing the first thread bundle after removing the first thread bundle from the waiting start thread queue.
8. The thread processing method of claim 7, wherein after the first thread bundle is executed, the method further comprises:
in response to the unordered access view end signal, adding the first thread bundle to a waiting release thread queue and generating a release signal;
and in response to the release signal, releasing the primitives contained in the first thread bundle after removing the first thread bundle from the waiting release thread queue.
9. A thread processing apparatus for rasterizer ordered views, comprising:
an execution module for receiving an input primitive;
a rasterization module for rasterizing the input primitive;
an overlap detection module for performing overlap detection on the rasterized input primitive and the first thread bundle being assembled, and outputting a detection result;
a thread assembly module for adding the input primitive into the first thread bundle being assembled, to wait for starting, when the detection result is non-overlap, and for outputting a first start signal when the detection result is overlap;
the execution module is further configured to start the first thread bundle according to the first start signal.
10. The thread processing apparatus of claim 9, further comprising:
a sorting module for performing a hit test on the assembled first thread bundle against the running thread bundle and outputting a test result; for performing the next hit test on the first thread bundle when the test result is a hit; and for outputting a second start signal when the test result is a miss;
the execution module is further configured to start the first thread bundle according to the second start signal.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
13. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 8 when executed by a processor.
CN202211403828.2A 2022-11-10 2022-11-10 Thread processing method, device, equipment and storage medium for rasterizer ordered view Active CN115760543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211403828.2A CN115760543B (en) 2022-11-10 2022-11-10 Thread processing method, device, equipment and storage medium for rasterizer ordered view

Publications (2)

Publication Number Publication Date
CN115760543A true CN115760543A (en) 2023-03-07
CN115760543B CN115760543B (en) 2024-02-13

Family

ID=85368867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211403828.2A Active CN115760543B (en) 2022-11-10 2022-11-10 Thread processing method, device, equipment and storage medium for rasterizer ordered view

Country Status (1)

Country Link
CN (1) CN115760543B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015153167A1 (en) * 2014-04-05 2015-10-08 Sony Computer Entertainment America Llc Varying effective resolution by screen location by altering rasterization parameters
CN110704768A (en) * 2019-10-08 2020-01-17 支付宝(杭州)信息技术有限公司 Webpage rendering method and device based on graphics processor
CN111210526A (en) * 2019-12-31 2020-05-29 西安翔腾微电子科技有限公司 GPU geometric primitive initial mark management method in plane clipping
CN113256764A (en) * 2021-06-02 2021-08-13 南京芯瞳半导体技术有限公司 Rasterization device and method and computer storage medium
CN114153500A (en) * 2021-12-01 2022-03-08 海光信息技术股份有限公司 Instruction scheduling method, instruction scheduling device, processor and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
周传辉: "批量修改线宽方法的探讨", 武汉科技大学学报(自然科学版), no. 04 *
聂瞾等: "基于Zigzag块扫描的光栅化算法设计与实现", 《科技风》, pages 37 - 38 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843540A (en) * 2023-08-31 2023-10-03 南京砺算科技有限公司 Graphics processor and graphics processing apparatus
CN116843540B (en) * 2023-08-31 2024-01-23 南京砺算科技有限公司 Graphics processor and graphics processing apparatus

Also Published As

Publication number Publication date
CN115760543B (en) 2024-02-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 200135, 11th Floor, Building 3, No. 889 Bibo Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: Granfei Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 200135 Room 201, No. 2557, Jinke Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee before: Gryfield Intelligent Technology Co.,Ltd.

Country or region before: China