CN111144057B - Performance analysis method and device for graphics rendering pipeline and computer storage medium - Google Patents


Info

Publication number
CN111144057B
Authority
CN
China
Prior art keywords
rendering
stage
analyzed
data
time
Prior art date
Legal status
Active
Application number
CN201911394959.7A
Other languages
Chinese (zh)
Other versions
CN111144057A (en)
Inventor
李洋
张竞丹
樊良辉
马超
Current Assignee
Xi'an Xintong Semiconductor Technology Co ltd
Original Assignee
Xi'an Xintong Semiconductor Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xi'an Xintong Semiconductor Technology Co ltd
Priority to CN201911394959.7A
Publication of CN111144057A
Application granted
Publication of CN111144057B
Active legal status
Anticipated expiration


Landscapes

  • Image Generation (AREA)

Abstract

The embodiments of the invention disclose a performance analysis method and apparatus for a graphics rendering pipeline, and a computer storage medium. The method may include: for each rendering stage to be analyzed in a graphics rendering pipeline, counting the input data amount and the output data amount of that rendering stage within a set period of time; comparing the output data amount with the input data amount within the set period, and determining whether the comparison result conforms to a set decision policy; in response to the comparison result conforming to the decision policy, determining that the working performance of the rendering stage is normal; and in response to the comparison result not conforming to the decision policy, determining that the working performance of the rendering stage is abnormal.

Description

Performance analysis method and device for graphics rendering pipeline and computer storage medium
Technical Field
The embodiments of the invention relate to the technical field of graphics processing units (GPU), and in particular to a performance analysis method and apparatus for a graphics rendering pipeline, and a computer storage medium.
Background
GPU simulation is an important step in the GPU chip design process: graphics rendering simulation software uses a model of the GPU rendering pipeline to simulate how the GPU operates, so that functional and performance problems in the chip design can be discovered and verified early.
At present, performance analysis during simulation of a GPU's graphics rendering pipeline is mostly carried out at the system level. If, while the simulation software simulates the graphics rendering pipeline to render graphics, rendering is slow or the expected image cannot be rendered within the expected time, system-level performance analysis cannot determine which stage of the graphics rendering pipeline is responsible for the failure to draw or the slow rendering. The problem therefore cannot be located accurately, which makes it difficult to devise a targeted solution.
Disclosure of Invention
In view of the above, embodiments of the present invention aim to provide a performance analysis method and apparatus for a graphics rendering pipeline, and a computer storage medium, which can refine the granularity of performance analysis when simulating a GPU's graphics rendering pipeline, improve the accuracy with which problems in the graphics rendering pipeline are located, and make it easier to provide a targeted solution.
The technical solutions of the embodiments of the present invention are implemented as follows:
In a first aspect, an embodiment of the present invention provides a performance analysis method for a graphics rendering pipeline, the method including:
for each rendering stage to be analyzed in a graphics rendering pipeline, counting the input data amount and the output data amount of the rendering stage to be analyzed within a set period of time;
comparing the output data amount with the input data amount within the set period, and determining whether the comparison result conforms to a set decision policy;
in response to the comparison result conforming to the decision policy, determining that the working performance of the rendering stage to be analyzed is normal;
and in response to the comparison result not conforming to the decision policy, determining that the working performance of the rendering stage to be analyzed is abnormal.
In a second aspect, an embodiment of the present invention provides a performance analysis apparatus for a graphics rendering pipeline, the apparatus comprising a statistics section, a comparison section, a first determination section, and a second determination section, wherein:
the statistics section is configured to count, for each rendering stage to be analyzed in a graphics rendering pipeline, the input data amount and the output data amount of the rendering stage to be analyzed within a set period of time;
the comparison section is configured to compare the output data amount with the input data amount within the set period, determine whether the comparison result conforms to a set decision policy, trigger the first determination section in response to the comparison result conforming to the decision policy, and trigger the second determination section in response to the comparison result not conforming to the decision policy.
In a third aspect, an embodiment of the present invention provides a performance analysis apparatus for a graphics rendering pipeline, the apparatus comprising a memory and a processor, wherein the memory stores a performance analysis program for the graphics rendering pipeline, and the processor is configured to run that program to perform the steps of the performance analysis method for a graphics rendering pipeline of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium storing a performance analysis program for a graphics rendering pipeline; when executed by at least one processor, the program implements the steps of the performance analysis method for a graphics rendering pipeline of the first aspect.
The embodiments of the present invention provide a performance analysis method and apparatus for a graphics rendering pipeline, and a computer storage medium. The working performance of each rendering stage of the graphics rendering pipeline is determined by comparing that stage's input data amount with its output data amount. Performance problems can therefore be located and judged per rendering stage while the GPU's graphics rendering pipeline is being simulated, refining the granularity of performance analysis from the system level to the rendering-stage level, improving the accuracy with which problems in the graphics rendering pipeline are located, and making it easier to provide a targeted solution.
Drawings
FIG. 1 is a block diagram of a GPU graphics rendering pipeline capable of implementing one or more aspects of embodiments of the present invention;
FIG. 2 is a flowchart of a performance analysis method for a graphics rendering pipeline according to an embodiment of the present invention;
FIG. 3 is a block diagram of a rendering stage to be analyzed capable of implementing one or more aspects of embodiments of the present invention;
FIG. 4 is a schematic diagram of a statistical curve according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another statistical curve according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another statistical curve according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another statistical curve according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a performance analysis apparatus for a graphics rendering pipeline according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
Referring to FIG. 1, a graphics rendering pipeline 1 of a GPU capable of implementing one or more aspects of embodiments of the present invention is shown. Note that the graphics rendering pipeline 1 is a logical structure formed by cascading general-purpose rendering cores and fixed-function rendering cores of the GPU; in the embodiments of the present invention, each stage of the pipeline is also referred to as a rendering stage. Specifically, each general-purpose rendering core can be programmed to perform processing tasks for a wide variety of programs, including, but not limited to, linear and nonlinear data transformations, video and/or audio data filtering, modeling operations (e.g., applying the laws of physics to determine the position, velocity, and other properties of objects), and graphics rendering operations (e.g., tessellation shader, vertex shader, geometry shader, and/or fragment shader programs). A fixed-function rendering core includes hardware hardwired to perform certain functions; it can be configured, for example via one or more control signals, to perform different functions such as vertex fetching, primitive assembly, clipping, rasterization, and blending tests. In FIG. 1, rendering stages implemented by general-purpose rendering cores are drawn as rounded boxes and rendering stages implemented by fixed-function rendering cores are drawn as square boxes. The stages (rendering stages) of the graphics rendering pipeline 1 are, in order:
the vertex fetch stage 12, implemented by a fixed-function rendering core in the example of FIG. 1, which is generally responsible for fetching graphics data (triangles, lines, and points) from the video memory 2 and supplying it to the graphics rendering pipeline 1. For example, the vertex fetch stage 12 may collect vertex data for higher-order surfaces, primitives, and so on from memory and output the vertex data and attributes to the vertex shading stage 14.
The vertex shading stage 14, implemented by a general-purpose rendering core in FIG. 1, is responsible for processing the received vertex data and attributes, performing a set of operations on one vertex at a time.
The geometry shading stage 16, implemented by a general-purpose rendering core in FIG. 1, takes the output of the vertex shading stage 14 as input, adds or removes vertices through geometry operations, and outputs vertex data.
The primitive assembly stage 18, implemented by a fixed-function rendering core in FIG. 1, is responsible for collecting vertices and assembling them into geometric primitives. For example, the primitive assembly stage 18 may be configured to group every three consecutive vertices into one geometric primitive (i.e., a triangle). In some embodiments, a particular vertex may be reused by consecutive geometric primitives (e.g., two consecutive triangles in a triangle strip may share two vertices).
The clipping stage 20, implemented by a fixed-function rendering core in FIG. 1, is responsible for retaining primitives inside the field of view and culling primitives outside it, which reduces the computational load on subsequent stages.
The rasterization stage 22, implemented by a fixed-function rendering core in FIG. 1, is responsible for preparing primitives for the fragment shading stage 24. For example, the rasterization stage 22 may generate fragments to be shaded by the fragment shading stage 24.
The fragment shading stage 24, implemented by a general-purpose rendering core in FIG. 1, is responsible for receiving fragments from the rasterization stage 22 and generating per-pixel data such as color. The fragment shading stage 24 may also perform per-pixel processing such as texture blending and lighting model computation.
The blending test stage 26, implemented by a fixed-function rendering core in FIG. 1, is generally responsible for performing various operations on pixel data, such as the transparency test (alpha test) and the stencil test, and for blending the pixel data with the pixel data of other fragments associated with the same pixel. When the blending test stage 26 has finished processing the pixel data, the processed pixel data (i.e., its output data), e.g., an image, may be written to a render target such as the video memory 2 to produce the final result.
Referring to FIG. 1, embodiments of the present invention aim to analyze the rendering performance of each stage while the graphics rendering pipeline 1 is simulated in software. On this basis, referring to FIG. 2, a performance analysis method for a graphics rendering pipeline according to an embodiment of the present invention is shown; the method may include:
S201: for each rendering stage to be analyzed in a graphics rendering pipeline, counting the input data amount and the output data amount of the rendering stage to be analyzed within a set period of time;
S202: comparing the output data amount with the input data amount within the set period, and determining whether the comparison result conforms to a set decision policy;
S203: in response to the comparison result conforming to the decision policy, determining that the working performance of the rendering stage to be analyzed is normal;
S204: in response to the comparison result not conforming to the decision policy, determining that the working performance of the rendering stage to be analyzed is abnormal.
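Steps S201 to S204 can be sketched for a single stage as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the names `StageCounts`, `analyze_stage`, and `max_gap`, and the choice of an absolute-difference threshold as the decision policy, are hypothetical, since the method leaves the concrete policy open at this point.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical decision policy: takes (output_amount, input_amount) and
# returns True when the comparison result conforms.
Policy = Callable[[int, int], bool]


@dataclass
class StageCounts:
    """Element counts collected for one rendering stage in the set period (S201)."""
    stage: str
    input_amount: int
    output_amount: int


def max_gap(threshold: int) -> Policy:
    """Hypothetical policy: |input - output| must not exceed `threshold`."""
    return lambda out_amt, in_amt: abs(in_amt - out_amt) <= threshold


def analyze_stage(counts: StageCounts, policy: Policy) -> str:
    # S202: compare the output amount with the input amount under the policy.
    conforms = policy(counts.output_amount, counts.input_amount)
    # S203 / S204: conforming -> normal, non-conforming -> abnormal.
    return "normal" if conforms else "abnormal"


for c in (StageCounts("vertex_shading", 1200, 1190),
          StageCounts("fragment_shading", 5000, 3100)):
    print(c.stage, analyze_stage(c, max_gap(50)))
```

With a threshold of 50, the first stage prints `normal` (gap of 10) and the second prints `abnormal` (gap of 1900). Passing the policy in as a function keeps S202 independent of which decision rule a given stage uses.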
When performance analysis is carried out with the above method for each rendering stage of the graphics rendering pipeline 1, note that the graphics rendering pipeline 1 shown in FIG. 1 has a cascaded logical structure. In the embodiments of the present invention, the rendering stages to be analyzed may include the eight rendering stages of the graphics rendering pipeline 1 shown in FIG. 1, which are, in order: the vertex fetch stage 12, the vertex shading stage 14, the geometry shading stage 16, the primitive assembly stage 18, the clipping stage 20, the rasterization stage 22, the fragment shading stage 24, and the blending test stage 26.
For these eight rendering stages, note that when the rendering stage to be analyzed is the vertex shading stage 14, the geometry shading stage 16, the primitive assembly stage 18, the clipping stage 20, the rasterization stage 22, or the fragment shading stage 24, its input data amount may be the output data amount of its preceding rendering stage, and its output data amount may be the input data amount of its subsequent rendering stage. In the embodiments of the present invention, "preceding" and "subsequent" refer to the rendering flow direction of the graphics rendering pipeline 1 (the arrow direction in FIG. 1). Taking the vertex shading stage as an example, its preceding rendering stage is the vertex fetch stage and its subsequent rendering stage is the geometry shading stage.
When the rendering stage to be analyzed is the vertex fetch stage 12 or the blending test stage 26, which are respectively the first and last rendering stages of the graphics rendering pipeline, the graphics rendering pipeline 1 contains no preceding stage for the vertex fetch stage 12 and no subsequent stage for the blending test stage 26. Accordingly, the input data amount of the vertex fetch stage 12 is preferably the vertex data fetched from the video memory 2, and the output data amount of the blending test stage 26 is preferably the pixel data output to the video memory 2.
In the technical solution shown in FIG. 2, the working performance of each rendering stage of the graphics rendering pipeline 1 is determined by comparing its input data amount with its output data amount. Performance problems can thus be located and judged per rendering stage during simulation of the GPU's graphics rendering pipeline, refining the granularity of performance analysis from the system level to the rendering-stage level, which improves the accuracy with which problems in the graphics rendering pipeline are located and makes it easier to provide a targeted solution.
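The input and output relationships of the cascaded pipeline described above can be captured in a few lines. A minimal sketch, assuming the eight stages of FIG. 1 in flow order; the identifiers `PIPELINE`, `data_endpoints`, and `"video_memory"` are hypothetical labels, not names used by the patent.

```python
from typing import Tuple

# The eight rendering stages of FIG. 1, in rendering-flow order.
PIPELINE = [
    "vertex_fetch",        # stage 12
    "vertex_shading",      # stage 14
    "geometry_shading",    # stage 16
    "primitive_assembly",  # stage 18
    "clipping",            # stage 20
    "rasterization",       # stage 22
    "fragment_shading",    # stage 24
    "blending_test",       # stage 26
]


def data_endpoints(stage: str) -> Tuple[str, str]:
    """Return (where the stage's input comes from, where its output goes)."""
    i = PIPELINE.index(stage)
    # The first stage fetches vertex data from video memory; every other
    # stage consumes the output of its preceding stage.
    source = "video_memory" if i == 0 else PIPELINE[i - 1]
    # The last stage writes pixel data to video memory; every other stage
    # feeds its subsequent stage.
    sink = "video_memory" if i == len(PIPELINE) - 1 else PIPELINE[i + 1]
    return source, sink


print(data_endpoints("vertex_shading"))  # ('vertex_fetch', 'geometry_shading')
```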
In some examples, an input first-in first-out (FIFO) queue may be provided at the front end of each rendering stage to be analyzed, and an output FIFO queue at its back end. Specifically, in connection with the description of FIG. 1 and the technical solution shown in FIG. 2, the functions and operations of each rendering stage to be analyzed may be implemented by configuring a general-purpose rendering core or a fixed-function rendering core of the GPU. On this basis, referring to FIG. 3, a FIFO queue may be placed at the front end of the general-purpose or fixed-function rendering core implementing each rendering stage to be analyzed to buffer the input data to be rendered, and a FIFO queue may be placed at the back end of that rendering core to buffer the rendered output data. Correspondingly, the technical solution shown in FIG. 2 may further include: the preceding rendering stage of the rendering stage to be analyzed inputs data to be rendered to the rendering stage to be analyzed through the input FIFO queue; and the subsequent rendering stage of the rendering stage to be analyzed receives the rendered data output by the rendering stage to be analyzed through the output FIFO queue.
Based on the above example, a statistics module may further be placed between the input FIFO queue and the output FIFO queue of each rendering stage to be analyzed. For example, as shown in FIG. 3, the statistics module may be a separate hardware unit or structure, independent of the general-purpose or fixed-function rendering core that implements the rendering stage; alternatively, it may be part of that rendering core. Accordingly, in the technical solution shown in FIG. 2, counting the input data amount and the output data amount of the rendering stage to be analyzed within the set period includes: for the rendering stage to be analyzed, counting the amount of data to be rendered that passes through the input FIFO queue within the set period and the amount of rendered data that passes through the output FIFO queue within the set period, where the amount of data to be rendered is the number of elements represented by the data to be rendered, and the amount of rendered data is the number of elements represented by the rendered data.
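To make the counting point concrete, the following sketch models one rendering stage between an input FIFO and an output FIFO, with a statistics module that counts elements taken out of the input FIFO and elements written into the output FIFO. It assumes a simple tick-based simulation with a per-tick processing budget; `Stage`, `StatisticsModule`, `tick`, and `budget` are hypothetical names, and `collections.deque` stands in for the hardware queues.

```python
from collections import deque
from typing import Callable, Deque, Iterable


class StatisticsModule:
    """Hypothetical per-stage counters placed between the two FIFOs."""

    def __init__(self) -> None:
        self.input_amount = 0   # elements taken out of the input FIFO
        self.output_amount = 0  # elements written into the output FIFO


class Stage:
    def __init__(self, kernel: Callable[[object], Iterable[object]]) -> None:
        # `kernel` maps one input element to zero or more output elements,
        # e.g. a clipping kernel may emit nothing for a culled primitive.
        self.kernel = kernel
        self.input_fifo: Deque[object] = deque()   # data to be rendered
        self.output_fifo: Deque[object] = deque()  # rendered data
        self.stats = StatisticsModule()

    def tick(self, budget: int) -> None:
        """Process at most `budget` input elements in one simulated cycle."""
        for _ in range(min(budget, len(self.input_fifo))):
            element = self.input_fifo.popleft()
            self.stats.input_amount += 1
            for out in self.kernel(element):
                self.output_fifo.append(out)
                self.stats.output_amount += 1


# A pass-through "vertex shading" stage that handles two vertices per tick.
vs = Stage(lambda v: [v])
vs.input_fifo.extend(range(5))
vs.tick(budget=2)
print(vs.stats.input_amount, vs.stats.output_amount)  # 2 2
```

Counting at the FIFO boundaries rather than inside the kernel keeps the statistics independent of what the rendering core computes, which fits the option of making the statistics module a separate unit.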
Note that the elements represented by data refer to the data structure the data embodies. For example, among the eight possible rendering stages to be analyzed: for the vertex fetch stage 12, the vertex shading stage 14, and the geometry shading stage 16, the elements represented by both the input data and the output data are vertices; for the primitive assembly stage 18, the input elements are vertices and the output elements are primitives; for the clipping stage 20, both the input and output elements are primitives; for the rasterization stage 22, the input elements are primitives and the output elements are pixels or fragments; and for the fragment shading stage 24 and the blending test stage 26, both the input and output elements are pixels or fragments. Thus, depending on the rendering stage to be analyzed, counting the input or output data amount amounts to counting vertices, primitives, pixels, or fragments. This is not described further in this embodiment.
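The element types listed above can be tabulated so that an analysis script knows which stages allow a like-for-like comparison of counts. A minimal sketch; the stage keys are hypothetical, and pixels and fragments are treated as a single element type here for simplicity.

```python
from typing import Dict, List, Tuple

# (element represented by the input data, element represented by the output data)
ELEMENTS: Dict[str, Tuple[str, str]] = {
    "vertex_fetch":       ("vertex", "vertex"),
    "vertex_shading":     ("vertex", "vertex"),
    "geometry_shading":   ("vertex", "vertex"),
    "primitive_assembly": ("vertex", "primitive"),
    "clipping":           ("primitive", "primitive"),
    "rasterization":      ("primitive", "fragment"),  # "pixels or fragments"
    "fragment_shading":   ("fragment", "fragment"),
    "blending_test":      ("fragment", "fragment"),
}


def same_element_type(stage: str) -> bool:
    """True when the stage's input and output counts measure the same element."""
    src, dst = ELEMENTS[stage]
    return src == dst


mixed: List[str] = [s for s in ELEMENTS if not same_element_type(s)]
print(mixed)  # ['primitive_assembly', 'rasterization']
```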
It should further be noted that, beyond the structure described above, the rendering process of the graphics rendering pipeline unfolds over time. In some examples, the input data amount and the output data amount may be counted cumulatively; based on this example, comparing the output data amount with the input data amount within the set period and determining whether the comparison result conforms to the set decision policy includes:
for each statistical time in the set period, obtaining the difference between the cumulative output data amount and the cumulative input data amount at that statistical time, where the cumulative output data amount at a statistical time is the total output data amount from the start of image rendering up to that statistical time, and the cumulative input data amount at a statistical time is the total input data amount from the start of image rendering up to that statistical time;
comparing the difference at each statistical time with a set threshold:
if the difference does not exceed the set threshold, determining that the comparison result conforms to the set decision policy at the corresponding statistical time;
and if the difference exceeds the set threshold, determining that the comparison result does not conform to the set decision policy at the corresponding statistical time.
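The cumulative-statistics policy above can be sketched as follows, assuming the statistics module reports per-interval element counts that are summed into running totals from the start of image rendering. The function name, the use of the absolute difference, and the sample numbers are illustrative assumptions, not data from the figures.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_check(in_per_interval: Sequence[int],
                     out_per_interval: Sequence[int],
                     threshold: int) -> List[bool]:
    """Return, for each statistical time, whether the result conforms.

    Element k of each input sequence is the count observed between the
    previous statistical time and statistical time k; running totals turn
    these into the cumulative amounts since the start of image rendering.
    """
    cum_in = accumulate(in_per_interval)
    cum_out = accumulate(out_per_interval)
    # Conforms when |cumulative input - cumulative output| <= threshold.
    return [abs(i - o) <= threshold for i, o in zip(cum_in, cum_out)]


# Illustrative counts: output falls behind from the third interval onward.
print(cumulative_check([100, 100, 100, 100], [98, 99, 60, 40], threshold=50))
# [True, True, True, False]
```

Because the totals accumulate, a backlog that builds up slowly is eventually caught even when each individual interval looks only slightly off.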
In this example, take the vertex shading stage as the rendering stage to be analyzed. Its input and output elements are both vertices, so when its working performance is normal and stable, the input and output data amounts are close. FIG. 4 shows the resulting statistical curves under cumulative statistics: the horizontal axis is the statistical time within the set period of the rendering process, and the vertical axis is the cumulative number of vertices counted at each statistical time. As FIG. 4 shows, the input and output data amounts of the vertex shading stage are close and run parallel before 30 s, but from 30 s onward the difference between them grows steadily, indicating that the rendering core implementing the vertex shading stage may have a performance bottleneck after 30 s.
Next, take the primitive assembly stage as an example. Its input and output elements differ: the input elements are vertices and the output elements are primitives, and the data amount of one primitive is generally equivalent to that of three vertices. Therefore, when its working performance is normal and stable, the difference between the input and output data amounts stays within a stable range. FIG. 5 shows the resulting statistical curves under cumulative statistics: the horizontal axis is the statistical time within the set period of the rendering process, and the vertical axis is the cumulative number of elements counted at each statistical time. As FIG. 5 shows, the gap between the input and output data amounts of the primitive assembly stage is smooth and the curves are roughly parallel before 20 s, but from 20 s onward the gap grows steadily, indicating that the rendering core implementing the primitive assembly stage may have a performance bottleneck after 20 s. This example shows that element counts can still be used for statistics and analysis when the input and output elements of other rendering stages to be analyzed differ.
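One way to apply a single threshold to the primitive assembly stage, whose input and output elements differ, is to express its output in vertex-equivalents before taking the difference, following the observation that one primitive is generally equivalent to three vertices. The sketch below assumes a plain triangle list; with vertex reuse (e.g. triangle strips, as mentioned for the primitive assembly stage 18) the factor would differ. `VERTICES_PER_PRIMITIVE` and `assembly_gap` are hypothetical names.

```python
VERTICES_PER_PRIMITIVE = 3  # assumption: independent triangles, no vertex reuse


def assembly_gap(vertices_in: int, primitives_out: int) -> int:
    """Input minus output for primitive assembly, in vertex-equivalents."""
    return vertices_in - VERTICES_PER_PRIMITIVE * primitives_out


print(assembly_gap(300, 99))  # 3: one triangle's worth of vertices still pending
print(assembly_gap(300, 60))  # 120: output has fallen well behind input
```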
In some examples, the input data amount and the output data amount may be counted per time interval (time-division statistics); based on this example, comparing the output data amount with the input data amount within the set period and determining whether the comparison result conforms to the set decision policy includes:
for each statistical time in the set period, obtaining the difference between the interval output data amount and the interval input data amount at that statistical time, where the interval output data amount at a statistical time is the output data amount from the previous statistical time up to that statistical time, and the interval input data amount at a statistical time is the input data amount from the previous statistical time up to that statistical time;
comparing the difference at each statistical time with a set threshold:
if the difference does not exceed the set threshold, determining that the comparison result conforms to the set decision policy at the corresponding statistical time;
and if the difference exceeds the set threshold, determining that the comparison result does not conform to the set decision policy at the corresponding statistical time.
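The time-division variant differs only in comparing per-interval counts instead of running totals. A sketch under the same assumptions as the cumulative case; if the statistics module exposes running totals instead, `to_intervals` (a hypothetical helper) recovers the per-interval amounts by differencing successive samples.

```python
from typing import List, Sequence


def to_intervals(cumulative: Sequence[int]) -> List[int]:
    """Turn running totals sampled at each statistical time into interval counts."""
    previous = [0] + list(cumulative[:-1])
    return [c - p for c, p in zip(cumulative, previous)]


def interval_check(in_per_interval: Sequence[int],
                   out_per_interval: Sequence[int],
                   threshold: int) -> List[bool]:
    """Conforms when |interval input - interval output| <= threshold."""
    return [abs(i - o) <= threshold
            for i, o in zip(in_per_interval, out_per_interval)]


ins = to_intervals([100, 200, 300, 400])   # [100, 100, 100, 100]
outs = to_intervals([98, 197, 257, 297])   # [98, 99, 60, 40]
print(interval_check(ins, outs, threshold=30))
# [True, True, False, False]
```

The interval form reacts to a throughput drop within a single interval, whereas the cumulative form integrates the backlog over time; a suitable threshold for each depends on the simulated workload.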
In this example, take the vertex shading stage as the rendering stage to be analyzed. Its input and output elements are both vertices, so when its working performance is normal and stable, the input and output data amounts are close. FIG. 6 shows the resulting statistical curves under time-division statistics: the horizontal axis is the statistical time within the set period of the rendering process, and the vertical axis is the number of vertices counted in each statistical interval. As FIG. 6 shows, the input and output data amounts of the vertex shading stage are close and run parallel before 30 s, but from 30 s onward the difference between them grows steadily, indicating that the rendering core implementing the vertex shading stage may have a performance bottleneck after 30 s.
Next, take the primitive assembly stage as an example. Its input and output elements differ: the input elements are vertices and the output elements are primitives, and the data amount of one primitive is generally equivalent to that of three vertices. Therefore, when its working performance is normal and stable, the difference between the input and output data amounts stays within a stable range. FIG. 7 shows the resulting statistical curves under time-division statistics: the horizontal axis is the statistical time within the set period of the rendering process, and the vertical axis is the number of elements counted in each statistical interval. As FIG. 7 shows, the gap between the input and output data amounts of the primitive assembly stage is smooth and the curves are roughly parallel before 20 s, but from 20 s onward the gap grows steadily, indicating that the rendering core implementing the primitive assembly stage may have a performance bottleneck after 20 s. This example shows that element counts can still be used for statistics and analysis when the input and output elements of other rendering stages to be analyzed differ.
Combining the above two examples, determining that the working performance of the rendering stage to be analyzed is abnormal in response to the comparison result not conforming to the decision policy includes:
determining that the working performance of the rendering stage to be analyzed is abnormal at each statistical time whose difference does not conform to the set decision policy.
For example, the statistical curves in FIG. 4 and FIG. 6 show that the working performance of the rendering core of the vertex shading stage is abnormal after 30 s and that the core may have a performance bottleneck; the statistical curves in FIG. 5 and FIG. 7 show that the working performance of the rendering core of the primitive assembly stage is abnormal after 20 s and that the core may have a performance bottleneck.
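Locating where performance becomes abnormal, as in the 30 s and 20 s readings above, amounts to collecting the statistical times whose result does not conform. A minimal sketch; `abnormal_times` is a hypothetical helper, and the times and flags in the example are made up rather than read from FIG. 4 to FIG. 7.

```python
from typing import List, Optional, Sequence, Tuple


def abnormal_times(times: Sequence[float],
                   conforms: Sequence[bool]) -> Tuple[List[float], Optional[float]]:
    """Return all non-conforming statistical times and the earliest one (or None)."""
    bad = [t for t, ok in zip(times, conforms) if not ok]
    return bad, (bad[0] if bad else None)


times = [10.0, 20.0, 30.0, 40.0, 50.0]    # seconds
flags = [True, True, True, False, False]  # e.g. output of a cumulative check
print(abnormal_times(times, flags))  # ([40.0, 50.0], 40.0)
```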
Referring to fig. 8 in combination with the foregoing drawings, schemes and examples, a performance analysis apparatus 80 for a graphics rendering pipeline according to an embodiment of the present invention is shown, where the apparatus includes: a statistics section 801, a comparison section 802, a first determination section 803, and a second determination section 804; wherein, the liquid crystal display device comprises a liquid crystal display device,
the statistics portion 801 is configured to count, for each rendering stage to be analyzed in the graphics rendering pipeline, an amount of input data of the rendering stage to be analyzed and an amount of output data of the rendering stage to be analyzed within a set period of time;
the comparison section 802 is configured to compare the output data amount with the input data amount within the set period, determine whether the comparison result conforms to a set decision policy, trigger the first determining section 803 in response to the comparison result conforming to the decision policy, and trigger the second determining section 804 in response to the comparison result not conforming to the decision policy.
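The triggering relationship between the comparison section and the two determining sections can be sketched in software as follows. This is a minimal illustration under assumptions: the class and method names are hypothetical, the statistics section is omitted (counts are passed in directly), the decision policy is a plain callable, and each determining section only records a verdict.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical decision policy type: (input amounts, output amounts) -> conforms?
Policy = Callable[[List[int], List[int]], bool]


@dataclass
class FirstDeterminingSection:
    verdicts: List[str] = field(default_factory=list)

    def determine(self, stage: str) -> None:
        self.verdicts.append(f"{stage}: normal")


@dataclass
class SecondDeterminingSection:
    verdicts: List[str] = field(default_factory=list)

    def determine(self, stage: str) -> None:
        self.verdicts.append(f"{stage}: abnormal")


@dataclass
class ComparisonSection:
    policy: Policy
    first: FirstDeterminingSection
    second: SecondDeterminingSection

    def compare(self, stage: str, inputs: List[int], outputs: List[int]) -> None:
        # Trigger exactly one determining section, depending on the policy result.
        if self.policy(inputs, outputs):
            self.first.determine(stage)
        else:
            self.second.determine(stage)


def gaps_within_5(inputs: List[int], outputs: List[int]) -> bool:
    """Hypothetical policy: every per-interval gap is at most 5 elements."""
    return all(abs(o - i) <= 5 for i, o in zip(inputs, outputs))


if __name__ == "__main__":
    first, second = FirstDeterminingSection(), SecondDeterminingSection()
    comparison = ComparisonSection(gaps_within_5, first, second)
    comparison.compare("vertex shading", [100, 100], [98, 97])
    comparison.compare("rasterization", [100, 100], [98, 60])
    print(first.verdicts)   # ['vertex shading: normal']
    print(second.verdicts)  # ['rasterization: abnormal']
```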
In some examples, an input first-in first-out (FIFO) queue is arranged at the front end of each rendering stage to be analyzed, and an output FIFO queue is arranged at the rear end of each rendering stage to be analyzed; accordingly,
the preceding rendering stage of the rendering stage to be analyzed inputs data to be rendered into the rendering stage to be analyzed through the input FIFO queue; and
the subsequent rendering stage of the rendering stage to be analyzed receives the rendered data output by the rendering stage to be analyzed through the output FIFO queue.
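The queue arrangement can be modeled in software with a counting FIFO on each side of the stage: the preceding stage writes into the input FIFO, the stage reads from it and writes into the output FIFO, and the subsequent stage reads from the output FIFO. In this sketch the `CountingFifo` class, the `run_stage` helper, and its `budget` parameter (how many elements the stage can process in one step) are hypothetical. It counts elements both entering and leaving each queue; whether the hardware counts at the write side or the read side of a queue is not specified above, so either counter could serve as the "passing through" amount.

```python
from collections import deque
from typing import Any, Callable, Deque, List


class CountingFifo:
    """FIFO that counts the elements entering and leaving it."""

    def __init__(self) -> None:
        self._items: Deque[Any] = deque()
        self.pushed = 0  # elements written into the queue
        self.popped = 0  # elements read out of the queue

    def push(self, item: Any) -> None:
        self._items.append(item)
        self.pushed += 1

    def pop(self) -> Any:
        item = self._items.popleft()  # raises IndexError when empty
        self.popped += 1
        return item

    def __len__(self) -> int:
        return len(self._items)


def run_stage(inp: CountingFifo, out: CountingFifo,
              work: Callable[[Any], List[Any]], budget: int) -> None:
    """Consume up to `budget` elements from the input FIFO and push whatever
    `work` produces for each of them into the output FIFO."""
    for _ in range(min(budget, len(inp))):
        for result in work(inp.pop()):
            out.push(result)


if __name__ == "__main__":
    input_fifo, output_fifo = CountingFifo(), CountingFifo()
    for vertex in range(10):  # preceding stage writes 10 elements
        input_fifo.push(vertex)
    run_stage(input_fifo, output_fifo, lambda v: [v * 2], budget=6)
    # Subsequent stage drains the output FIFO.
    next_stage_received = [output_fifo.pop() for _ in range(len(output_fifo))]
    print(input_fifo.popped, output_fifo.pushed, len(input_fifo), next_stage_received)
    # 6 6 4 [0, 2, 4, 6, 8, 10]
```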
In some examples, the statistics section 801 is configured to:
count, for the rendering stage to be analyzed, the data amount of the data to be rendered that passes through the input FIFO queue within the set period and the data amount of the rendered data that passes through the output FIFO queue within the set period, where the data amount of the data to be rendered is the number of elements represented by the data to be rendered, and the data amount of the rendered data is the number of elements represented by the rendered data.
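One way to turn running FIFO element counters into the per-period amounts described above is to read each counter at every statistical time and difference consecutive readings; summing the intervals back up recovers the running totals. The reading format (one cumulative counter value per statistical time, with the counter starting at 0 when image rendering starts) and the function names are assumptions for illustration.

```python
from itertools import accumulate
from typing import List


def interval_amounts(snapshots: List[int]) -> List[int]:
    """Elements that passed a FIFO between consecutive statistical times.

    `snapshots[k]` is the running counter read at the k-th statistical time;
    the counter is assumed to be 0 when image rendering starts.
    """
    previous = [0] + snapshots[:-1]
    return [now - before for now, before in zip(snapshots, previous)]


def cumulative_amounts(intervals: List[int]) -> List[int]:
    """Running totals from the start of image rendering (inverse of the above)."""
    return list(accumulate(intervals))


if __name__ == "__main__":
    # Hypothetical readings of an input FIFO counter at three statistical times.
    readings = [100, 220, 300]
    per_interval = interval_amounts(readings)
    print(per_interval)                      # [100, 120, 80]
    print(cumulative_amounts(per_interval))  # [100, 220, 300]
```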
In some examples, the comparison section 802 is configured to:
for each statistical time within the set period, obtain the difference between the cumulative output data amount and the cumulative input data amount corresponding to that statistical time, where the cumulative output data amount corresponding to a statistical time is the sum of the output data amounts from the start of image rendering to that statistical time, and the cumulative input data amount corresponding to a statistical time is the sum of the input data amounts from the start of image rendering to that statistical time;
compare the difference corresponding to each statistical time with a set threshold:
if the difference does not exceed the set threshold, determine that the comparison result at the statistical time corresponding to that difference conforms to the set decision policy;
if the difference exceeds the set threshold, determine that the comparison result at the statistical time corresponding to that difference does not conform to the set decision policy.
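The cumulative comparison can be sketched as below. The description does not say whether the difference is signed; this sketch assumes its magnitude, since the cumulative output usually lags the cumulative input. The function name, the per-interval input format, and the threshold value are hypothetical.

```python
from itertools import accumulate
from typing import List


def check_cumulative(inputs: List[int], outputs: List[int], threshold: int) -> List[bool]:
    """For each statistical time, True if the comparison result conforms to the
    decision policy, i.e. |cumulative output - cumulative input| <= threshold.

    `inputs[k]` / `outputs[k]` are the amounts counted in the k-th interval.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have one entry per statistical time")
    results = []
    for total_in, total_out in zip(accumulate(inputs), accumulate(outputs)):
        difference = abs(total_out - total_in)  # assumption: magnitude of the gap
        results.append(difference <= threshold)  # "does not exceed" -> conforms
    return results


if __name__ == "__main__":
    # Hypothetical per-interval element counts for one stage.
    print(check_cumulative([100, 100, 100, 100], [100, 98, 95, 80], threshold=10))
    # [True, True, True, False]
```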
In some examples, the comparison section 802 is configured to:
for each statistical time within the set period, obtain the difference between the interval output data amount and the interval input data amount corresponding to that statistical time, where the interval output data amount corresponding to a statistical time is the output data amount from the previous statistical time to that statistical time, and the interval input data amount corresponding to a statistical time is the input data amount from the previous statistical time to that statistical time;
compare the difference corresponding to each statistical time with a set threshold:
if the difference does not exceed the set threshold, determine that the comparison result at the statistical time corresponding to that difference conforms to the set decision policy;
if the difference exceeds the set threshold, determine that the comparison result at the statistical time corresponding to that difference does not conform to the set decision policy.
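The interval comparison differs from the cumulative one only in which amounts are subtracted. As before, this sketch assumes the magnitude of the difference is compared, and the function name and threshold are hypothetical.

```python
from typing import List


def check_interval(inputs: List[int], outputs: List[int], threshold: int) -> List[bool]:
    """For each statistical time, True if |interval output - interval input| <= threshold.

    `inputs[k]` / `outputs[k]` are the amounts counted between statistical
    time k-1 and statistical time k.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have one entry per statistical time")
    return [abs(out - inp) <= threshold for inp, out in zip(inputs, outputs)]


if __name__ == "__main__":
    # A stage that is steadily 4 elements short per interval.
    print(check_interval([100] * 4, [96] * 4, threshold=10))  # [True, True, True, True]
```

With the cumulative comparison the same data gives differences of 4, 8, 12, and 16, so the last two statistical times would fail a threshold of 10: a constant per-interval shortfall adds up in the cumulative variant but not in the interval variant. Which behavior is preferable depends on whether a steady backlog should be treated as abnormal.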
In some examples, the second determining section 804 is configured to:
determine that the working performance of the rendering stage to be analyzed is abnormal at each statistical time whose corresponding difference does not conform to the set decision policy.
In some examples, the rendering stages to be analyzed include one or more of a vertex acquisition stage, a vertex shading stage, a geometry shading stage, a primitive assembly stage, a clipping stage, a rasterization stage, a fragment shading stage, and a blending test stage.
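Because the same analysis is applied to each selected stage, a driver loop over the stage list is a natural fit. The sketch below is hypothetical throughout: the `Stage` enum names, the `analyze` helper, and the `gaps_within` policy are illustrative. A single raw-gap policy is only meaningful for stages whose input and output are the same kind of element; a stage such as primitive assembly would need a stage-specific policy.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple

Policy = Callable[[List[int], List[int]], bool]


class Stage(Enum):
    """Rendering stages that may be selected for analysis (names are illustrative)."""
    VERTEX_ACQUISITION = "vertex acquisition"
    VERTEX_SHADING = "vertex shading"
    GEOMETRY_SHADING = "geometry shading"
    PRIMITIVE_ASSEMBLY = "primitive assembly"
    CLIPPING = "clipping"
    RASTERIZATION = "rasterization"
    FRAGMENT_SHADING = "fragment shading"
    BLENDING_TEST = "blending test"


def gaps_within(threshold: int) -> Policy:
    """Hypothetical policy: every per-interval gap is at most `threshold`."""
    def policy(inputs: List[int], outputs: List[int]) -> bool:
        return all(abs(o - i) <= threshold for i, o in zip(inputs, outputs))
    return policy


def analyze(counts: Dict[Stage, Tuple[List[int], List[int]]], policy: Policy) -> Dict[Stage, str]:
    """Apply one decision policy to each selected stage independently."""
    return {
        stage: "normal" if policy(inputs, outputs) else "abnormal"
        for stage, (inputs, outputs) in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical per-interval counts for two of the eight stages.
    verdicts = analyze(
        {
            Stage.VERTEX_SHADING: ([100, 100], [99, 98]),
            Stage.RASTERIZATION: ([100, 100], [99, 40]),
        },
        gaps_within(5),
    )
    for stage, verdict in verdicts.items():
        print(f"{stage.value}: {verdict}")
```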
It will be appreciated that in this embodiment a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and it may be modular or non-modular.
In addition, the components in this embodiment may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional module.
If the integrated units are implemented as software functional modules and are not sold or used as separate products, the functions may be stored as one or more instructions or code on a computer-readable medium, or transmitted over such a medium. Computer-readable media may include computer data storage media or communication media, including any medium that facilitates transfer of a computer program from one place to another. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise a USB flash drive, a removable hard disk, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. Accordingly, the terms "processor" and "processing unit" as used herein may refer to any of the foregoing structures or to any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of the embodiments of the present invention may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (i.e., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit together with suitable software and/or firmware, or provided by a collection of interoperable hardware units, including one or more processors as described above.
Various aspects of the invention have been described. These and other embodiments fall within the scope of the following claims. It should be noted that the technical solutions described in the embodiments of the present invention may be combined arbitrarily as long as they do not conflict.
The foregoing merely illustrates the present invention, and the present invention is not limited thereto; any variation or substitution that a person skilled in the art can readily conceive shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A method of performance analysis of a graphics rendering pipeline, the method comprising:
for each rendering stage to be analyzed in a graphics rendering pipeline, counting the input data amount of the rendering stage to be analyzed and the output data amount of the rendering stage to be analyzed in a set period of time;
comparing the output data amount with the input data amount within the set period, and determining whether the comparison result conforms to a set decision policy;
in response to the comparison result conforming to the decision policy, determining that the working performance of the rendering stage to be analyzed is normal;
in response to the comparison result not conforming to the decision policy, determining that the working performance of the rendering stage to be analyzed is abnormal;
arranging an input first-in first-out (FIFO) queue at the front end of each rendering stage to be analyzed, and an output FIFO queue at the rear end of each rendering stage to be analyzed;
accordingly, the method further comprises:
the preceding rendering stage of the rendering stage to be analyzed inputs data to be rendered into the rendering stage to be analyzed through the input FIFO queue;
the subsequent rendering stage of the rendering stage to be analyzed receives the rendered data output by the rendering stage to be analyzed through the output FIFO queue;
wherein counting the input data amount and the output data amount of the rendering stage to be analyzed within a set period of time comprises:
counting, for the rendering stage to be analyzed, the data amount of the data to be rendered that passes through the input FIFO queue within the set period and the data amount of the rendered data that passes through the output FIFO queue within the set period, wherein the data amount of the data to be rendered is the number of elements represented by the data to be rendered, and the data amount of the rendered data is the number of elements represented by the rendered data.
2. The method of claim 1, wherein comparing the output data amount with the input data amount within the set period and determining whether the comparison result conforms to a set decision policy comprises:
for each statistical time within the set period, obtaining the difference between the cumulative output data amount and the cumulative input data amount corresponding to that statistical time, wherein the cumulative output data amount corresponding to a statistical time is the sum of the output data amounts from the start of image rendering to that statistical time, and the cumulative input data amount corresponding to a statistical time is the sum of the input data amounts from the start of image rendering to that statistical time;
comparing the difference corresponding to each statistical time with a set threshold:
if the difference does not exceed the set threshold, determining that the comparison result at the statistical time corresponding to that difference conforms to the set decision policy;
if the difference exceeds the set threshold, determining that the comparison result at the statistical time corresponding to that difference does not conform to the set decision policy.
3. The method of claim 1, wherein comparing the output data amount with the input data amount within the set period and determining whether the comparison result conforms to a set decision policy comprises:
for each statistical time within the set period, obtaining the difference between the interval output data amount and the interval input data amount corresponding to that statistical time, wherein the interval output data amount corresponding to a statistical time is the output data amount from the previous statistical time to that statistical time, and the interval input data amount corresponding to a statistical time is the input data amount from the previous statistical time to that statistical time;
comparing the difference corresponding to each statistical time with a set threshold:
if the difference does not exceed the set threshold, determining that the comparison result at the statistical time corresponding to that difference conforms to the set decision policy;
if the difference exceeds the set threshold, determining that the comparison result at the statistical time corresponding to that difference does not conform to the set decision policy.
4. The method according to claim 2 or 3, wherein determining that the working performance of the rendering stage to be analyzed is abnormal in response to the comparison result not conforming to the decision policy comprises:
determining that the working performance of the rendering stage to be analyzed is abnormal at each statistical time whose corresponding difference does not conform to the set decision policy.
5. The method of claim 1, wherein the rendering stages to be analyzed comprise one or more of a vertex acquisition stage, a vertex shading stage, a geometry shading stage, a primitive assembly stage, a clipping stage, a rasterization stage, a fragment shading stage, and a blending test stage.
6. A performance analysis apparatus of a graphics rendering pipeline, the apparatus comprising a statistics section, a comparison section, a first determining section, and a second determining section, wherein:
the statistics section is configured to count, for each rendering stage to be analyzed in a graphics rendering pipeline, the input data amount and the output data amount of the rendering stage to be analyzed within a set period of time;
the comparison section is configured to compare the output data amount with the input data amount within the set period, determine whether the comparison result conforms to a set decision policy, trigger the first determining section in response to the comparison result conforming to the decision policy, and trigger the second determining section in response to the comparison result not conforming to the decision policy;
the statistics section is further configured to arrange an input first-in first-out (FIFO) queue at the front end of each rendering stage to be analyzed and an output FIFO queue at the rear end of each rendering stage to be analyzed, wherein the preceding rendering stage of the rendering stage to be analyzed inputs data to be rendered into the rendering stage to be analyzed through the input FIFO queue, and the subsequent rendering stage of the rendering stage to be analyzed receives the rendered data output by the rendering stage to be analyzed through the output FIFO queue;
the statistics section is further configured to count, for the rendering stage to be analyzed, the data amount of the data to be rendered that passes through the input FIFO queue within the set period and the data amount of the rendered data that passes through the output FIFO queue within the set period, wherein the data amount of the data to be rendered is the number of elements represented by the data to be rendered, and the data amount of the rendered data is the number of elements represented by the rendered data.
7. A performance analysis apparatus of a graphics rendering pipeline, the apparatus comprising a memory and a processor, wherein the memory stores a performance analysis program of the graphics rendering pipeline, and the processor is configured to run the performance analysis program of the graphics rendering pipeline to perform the steps of the performance analysis method of the graphics rendering pipeline of any one of claims 1 to 5.
8. A computer storage medium storing a performance analysis program of a graphics rendering pipeline, which when executed by at least one processor, implements the steps of the performance analysis method of a graphics rendering pipeline of any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911394959.7A CN111144057B (en) 2019-12-30 2019-12-30 Performance analysis method and device for graphics rendering pipeline and computer storage medium

Publications (2)

Publication Number Publication Date
CN111144057A (en) 2020-05-12
CN111144057B (en) 2023-09-15

Family

ID=70521873

Country Status (1)

Country Link
CN (1) CN111144057B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113144616A (en) * 2021-05-25 2021-07-23 NetEase (Hangzhou) Network Co., Ltd. Bandwidth determination method and device, electronic equipment and computer readable medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101675453A (en) * 2007-03-02 2010-03-17 Sony Computer Entertainment America Graphics command management tool and method for analyzing performance for command changes before application modification
CN102449665A (en) * 2009-06-02 2012-05-09 Qualcomm Incorporated Displaying a visual representation of performance metrics for rendered graphics elements
CN102651142A (en) * 2012-04-16 2012-08-29 Shenzhen Super Perfect Optics Ltd. Image rendering method and image rendering device
CN106504185A (en) * 2016-10-26 2017-03-15 Tencent Technology (Shenzhen) Co., Ltd. Rendering optimization method and device
CN107169916A (en) * 2016-03-07 2017-09-15 Imagination Technologies Ltd. Task combination for SIMD processing
CN109242756A (en) * 2018-09-07 2019-01-18 Shanghai Zhaoxin Semiconductor Co., Ltd. Computer system, graphics processing unit and its graphic processing method

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8587594B2 (en) * 2010-05-21 2013-11-19 International Business Machines Corporation Allocating resources based on a performance statistic
GB2555586B (en) * 2016-10-31 2019-01-02 Imagination Tech Ltd Performance profiling in a graphics unit

Non-Patent Citations (3)

Title
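Wang Xu; Yang Xin; Wang Zhiming. Adaptive algorithm for terrain rendering implemented on the GPU. Journal of Computer-Aided Design & Computer Graphics, 2010, (10), full text. *
Xing Lidong; Li Tao; Huang Hucai; Han Jungang. Energy consumption estimation for 3D graphics rendering. Journal of Xidian University, (04), full text. *
Han Jungang; Yao Jing; Li Tao; Huang Hucai; Qiao Hong; Yan Youmei; Wang Pengbo. 3D graphics rendering on a polymorphic parallel machine. Journal of Xi'an University of Posts and Telecommunications, 2015, (02), full text. *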

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 301, Building D, Yeda Science and Technology Park, No. 300 Changjiang Road, Yantai Area, China (Shandong) Pilot Free Trade Zone, Yantai City, Shandong Province, 265503

Patentee after: Xi'an Xintong Semiconductor Technology Co.,Ltd.

Address before: Room 21101, 11/F, Unit 2, Building 1, Wangdu, No. 3 Zhangbayi Road, Zhangba Street Office, Hi-tech Zone, Xi'an City, Shaanxi Province

Patentee before: Xi'an Xintong Semiconductor Technology Co.,Ltd.