CN109800141B - GPU performance bottleneck determining method, device, terminal and storage medium

Info

Publication number: CN109800141B
Application number: CN201910080514.5A
Authority: CN
Inventors: 陈岩
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2019-01-28
Filing date: 2019-01-28
Publication date: 2020-08-18
Anticipated expiration: 2039-01-28
Also published as: CN109800141A; WO2020156132A1

Abstract

The application discloses a method and a device for determining a GPU performance bottleneck, a terminal, and a storage medium, and belongs to the technical field of terminals. The method comprises the following steps: during the running of a target application program, obtaining the GPU operating frequency of a GPU within a predetermined duration; if the GPU operating frequency meets a preset condition, obtaining a GPU single-frame rendering duration of the GPU within the predetermined duration, wherein the GPU single-frame rendering duration is the running duration of the GPU in a single-frame image rendering process; and determining whether the GPU reaches a performance bottleneck according to the GPU single-frame rendering duration. In the embodiments of the application, the terminal judges the GPU performance bottleneck based on the GPU operating frequency and the GPU single-frame rendering duration, so complex GPU load calculation by means of a driver is not required; this preserves the accuracy of GPU performance bottleneck detection while reducing its complexity and improving its efficiency.

Description

GPU performance bottleneck determining method, device, terminal and storage medium

Technical Field

The embodiment of the application relates to the technical field of terminals, in particular to a method and a device for determining GPU performance bottleneck, a terminal and a storage medium.

Background

For video applications, game applications, and other applications that require dynamic image rendering, the running quality is closely related to the performance of the Graphics Processing Unit (GPU).

In the related art, in order to realize GPU load monitoring, a driver corresponding to the GPU needs to be installed in the terminal, so that the real-time load of the GPU is calculated through the driver, and whether the GPU reaches the performance bottleneck in the running process of the application program is further judged.

Disclosure of Invention

The embodiments of the application provide a method, a device, a terminal, and a storage medium for determining a GPU performance bottleneck. They address the problem in the related art that the real-time load of the GPU must be calculated by a driver in order to judge whether the GPU reaches a performance bottleneck, which makes the determination process complicated. The technical scheme is as follows:

in one aspect, a method for determining a GPU performance bottleneck is provided, where the method includes:

in the running process of a target application program, obtaining the GPU running frequency of a GPU in a preset time length;

if the GPU operating frequency meets a preset condition, acquiring a GPU single-frame rendering time length of the GPU in the preset time length, wherein the GPU single-frame rendering time length is the operating time length of the GPU in a single-frame image rendering process;

and determining whether the GPU reaches a performance bottleneck according to the GPU single-frame rendering duration.

In another aspect, an apparatus for determining a GPU performance bottleneck is provided, the apparatus comprising:

the first acquisition module is used for acquiring the GPU operating frequency of the GPU in a preset time length in the operating process of the target application program;

the second acquisition module is used for acquiring the GPU single-frame rendering time length of the GPU in the preset time length when the GPU operating frequency meets a preset condition, wherein the GPU single-frame rendering time length is the operating time length of the GPU in the single-frame image rendering process;

and the determining module is used for determining whether the GPU reaches the performance bottleneck according to the single-frame rendering duration of the GPU.

In another aspect, a terminal is provided, where the terminal includes a processor, a memory connected to the processor, and program instructions stored in the memory, and when the processor executes the program instructions, the method for determining a GPU performance bottleneck is implemented as described in the above aspect.

In another aspect, a computer readable storage medium is provided, on which program instructions are stored, which program instructions, when executed by a processor, implement the method for determining a GPU performance bottleneck as described in the above aspect.

The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:

the GPU operating frequency of the GPU within a predetermined duration is obtained; when the GPU operating frequency meets a preset condition, the GPU single-frame rendering duration of the GPU within the predetermined duration is further obtained; and whether the GPU reaches a performance bottleneck during the running of the target application program is determined according to the GPU single-frame rendering duration. In the embodiments of the application, the terminal judges the GPU performance bottleneck based on the GPU operating frequency and the GPU single-frame rendering duration, so complex GPU load calculation by means of a driver is not required; this preserves the accuracy of GPU performance bottleneck detection while reducing its complexity and improving its efficiency.

Drawings

FIG. 1 is a schematic structural diagram of a terminal provided in an exemplary embodiment of the present application;

FIG. 2 is a schematic diagram of a graphic display process in an Android system;

FIG. 3 is a state transition diagram of four states of a buffer;

FIG. 4 is a flowchart illustrating a method for determining a GPU performance bottleneck according to an exemplary embodiment of the present disclosure;

FIG. 5 is a flowchart illustrating a method for determining a GPU performance bottleneck according to another exemplary embodiment of the present application;

FIG. 6 is a schematic diagram of a distribution of start time points and end time points of an enqueueing process;

FIG. 7 is a flowchart illustrating a method for determining a GPU performance bottleneck according to another exemplary embodiment of the present application;

FIG. 8 is a flowchart illustrating a method for determining a GPU performance bottleneck according to another exemplary embodiment of the present application;

FIG. 9 is a schematic structural diagram of an apparatus for determining a GPU performance bottleneck according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

In the description of the present application, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Unless otherwise explicitly specified or limited, the terms "coupled" and "connected" are to be interpreted broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct or indirect through an intermediate. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific case. Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.

Before explaining the embodiments of the present application, an application scenario of the embodiments of the present application will be explained first. Fig. 1 shows a schematic structural diagram of a terminal provided in an exemplary embodiment of the present application.

The terminal 100 is an electronic device in which a target application is installed. The target application may be a system program or a third party application. Wherein the third party application is an application created by a third party other than the user and the operating system. For example, the target application may be a game application or a video playback application.

Optionally, the terminal 100 includes: a processor 120 and a memory 140.

The processor 120 may include one or more processing cores. The processor 120 connects the various parts of the terminal 100 using various interfaces and lines, and performs the functions of the terminal 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 140 and by calling data stored in the memory 140. Optionally, the processor 120 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor 120 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed on the display screen; the modem handles wireless communications. It is understood that the modem may also not be integrated into the processor 120 and may instead be implemented as a separate chip.

The memory 140 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 140 includes a non-transitory computer-readable medium. The memory 140 may be used to store instructions, programs, code sets, or instruction sets. The memory 140 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the method embodiments described below, and the like; the data storage area may store the data referred to in the method embodiments described below.

The terminal 100 in the embodiment of the present application further includes a display screen 160. Optionally, the display screen 160 is a touch display screen, which receives touch operations performed on or near it by a user with any suitable object such as a finger or a stylus, and displays the user interface of each application. The display screen 160 is generally provided on the front panel of the terminal 100, or on both the front panel and the rear panel of the terminal 100. The display screen 160 may be designed as a full screen, a curved screen, or a contoured screen. The display screen 160 may also be designed as a combination of a full screen and a curved screen, or a combination of a contoured screen and a curved screen, which is not limited in this embodiment.

In addition, those skilled in the art will appreciate that the configuration of the terminal 100 illustrated in the above figure does not limit the terminal 100, and that a terminal may include more or fewer components than those illustrated, may combine some components, or may arrange the components differently. For example, the terminal 100 may further include a radio frequency circuit, an input unit, a sensor, an audio circuit, a Wireless Fidelity (WiFi) module, a power supply, a Bluetooth module, and other components, which are not described herein again.

For ease of understanding, the graphic display system in the terminal is described first; the following embodiments take the Android graphic display system as an example.

As shown in fig. 2, the content displayed on the display screen 21 is read from the hardware frame buffer. The reading process scans from the starting address of the hardware frame buffer, from top to bottom and from left to right, and maps the scanned content onto the display screen.

The content displayed on the display screen 21 needs to be updated continuously. If reading and writing were performed in the same hardware frame buffer, content from multiple frames could appear on the display screen 21 at the same time. The terminal therefore adopts a double-buffer mechanism, in which one buffer is used for reading and displaying content, and the other buffer is used for composing and writing graphics in the background.

Illustratively, as shown in FIG. 2, the front buffer 22 is the frame buffer holding the content to be displayed on the display screen, and the back buffer 23 is the frame buffer used for composing the next frame of graphics. Once the previous frame has been displayed and the next frame has been written, the display screen 21 reads the content in the back buffer 23, and correspondingly, the front buffer 22 is used to compose the next frame of graphics (the roles of the front and back buffers are exchanged).
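The role exchange described above can be illustrated with a minimal sketch. The class and method names below (`DoubleBuffer`, `write`, `swap`, `read`) are hypothetical and are not part of the Android graphics stack; the sketch only models the idea that the display reads one buffer while the next frame is composed in the other.

```python
class DoubleBuffer:
    """Toy model of a front/back buffer pair (hypothetical, for illustration)."""

    def __init__(self):
        self.front = None  # frame currently read by the display
        self.back = None   # frame being composed in the background

    def write(self, frame):
        # Composition always targets the back buffer.
        self.back = frame

    def swap(self):
        # Exchange roles: the freshly composed frame becomes visible.
        self.front, self.back = self.back, self.front

    def read(self):
        # The display only ever reads the front buffer.
        return self.front
```

Because writes never touch the buffer being read, the display does not see a partially written frame.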

SurfaceFlinger serves as the graphics compositor: it composes the multiple graphics (surfaces) passed down from the upper layer and submits the result to the hardware buffer of the display screen for the display screen 21 to read and display. As shown in fig. 2, the content in the back buffer 23 is composed by SurfaceFlinger from a plurality of surfaces 24. Each surface corresponds to a window at the upper layer, such as a dialog box, a status bar, or an Activity.

Graphics are transmitted with a buffer as the carrier, and a surface is a further encapsulation of the buffer. To manage the multiple buffers in a surface, as shown in fig. 3, a buffer queue (BufferQueue) is provided inside the surface, and it forms a producer-consumer model with the upper layer and SurfaceFlinger: the upper layer is the producer (Producer), and SurfaceFlinger is the consumer (Consumer).

Each buffer in the BufferQueue is in one of four states: a free state (Free), a dequeued state (Dequeued), a queued state (Queued), and an acquired state (Acquired). In the free state, the buffer is available to the upper layer; in the dequeued state, the buffer is being used by the upper layer; in the queued state, the upper layer has finished with the buffer (drawing and rendering are complete) and the buffer is waiting to be composed by SurfaceFlinger; in the acquired state, SurfaceFlinger is composing from the buffer. The buffer moves between states through operations such as buffer dequeue (dequeueBuffer) and buffer enqueue (queueBuffer), as shown in fig. 3.
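The four buffer states can be modelled as a small state machine. The sketch below is a simplified illustration, not the Android implementation: the text names only the dequeueBuffer and queueBuffer operations, and the consumer-side operations `acquireBuffer` and `releaseBuffer` are included here as an assumption to close the cycle back to the free state.

```python
from enum import Enum


class BufferState(Enum):
    FREE = "Free"
    DEQUEUED = "Dequeued"
    QUEUED = "Queued"
    ACQUIRED = "Acquired"


# Allowed transitions, keyed by (current state, operation).
# dequeueBuffer and queueBuffer are named in the text; acquireBuffer and
# releaseBuffer are assumed consumer-side counterparts.
TRANSITIONS = {
    (BufferState.FREE, "dequeueBuffer"): BufferState.DEQUEUED,
    (BufferState.DEQUEUED, "queueBuffer"): BufferState.QUEUED,
    (BufferState.QUEUED, "acquireBuffer"): BufferState.ACQUIRED,
    (BufferState.ACQUIRED, "releaseBuffer"): BufferState.FREE,
}


def next_state(state: BufferState, operation: str) -> BufferState:
    """Return the state reached by applying the operation, or raise ValueError."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"{operation} is not valid in state {state.value}") from None


def run_cycle() -> list:
    """Walk one buffer through a full producer/consumer cycle."""
    state = BufferState.FREE
    visited = [state]
    for op in ("dequeueBuffer", "queueBuffer", "acquireBuffer", "releaseBuffer"):
        state = next_state(state, op)
        visited.append(state)
    return visited
```

In this model the producer (upper layer) drives the first two transitions and the consumer (SurfaceFlinger) drives the last two.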

Referring to fig. 4, a flowchart of a method for determining a GPU performance bottleneck according to an exemplary embodiment of the present application is shown. The method may include the following steps.

Step 401, in the running process of the target application program, obtaining the GPU running frequency of the GPU within a predetermined time period.

In one possible implementation, the performance requirement of the target application on the GPU is higher than that of other applications, and the target application is determined by the terminal according to the application type of the installed application, or the target application is manually set by the user.

Optionally, the target application is an application that needs to perform dynamic image rendering, such as a video playing application or a game application. For example, the target application program includes any one of a virtual reality application program, a three-dimensional map program, a military simulation program, a Third-Person Shooting (TPS) game, a First-Person Shooting (FPS) game, a Multiplayer Online Battle Arena (MOBA) game, and a multi-player gunfight survival game. The specific type of target application is not limited in this application.

In order to improve the accuracy of subsequent GPU performance bottleneck detection, in a possible implementation, the terminal acquires the GPU operating frequency once at predetermined time intervals within a predetermined time duration, so as to acquire a plurality of GPU operating frequencies within the predetermined time duration. For example, the predetermined time period is 10s, and the time interval is 100 ms.
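The periodic acquisition described above can be sketched as a simple sampling loop. This is a minimal sketch under stated assumptions: `read_frequency_mhz` is a hypothetical callable standing in for whatever interface the terminal uses to read the GPU operating frequency, and the `sleep` parameter exists only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, List


def sample_gpu_frequency(
    read_frequency_mhz: Callable[[], float],
    duration_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Collect one reading per interval over the whole predetermined duration."""
    count = int(round(duration_s / interval_s))
    samples = []
    for _ in range(count):
        # read_frequency_mhz is an assumed hook, not a real platform API.
        samples.append(read_frequency_mhz())
        sleep(interval_s)
    return samples
```

With the example values from the text (a 10 s duration and a 100 ms interval), the loop collects 100 readings.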

For the same target application program, the performance requirements on the GPU may differ between running scenes. For example, in a game application, the requirements in the game start scene, the game loading scene, and the game main interface scene are lower than those in the in-game (match in progress) scene. Therefore, in a possible implementation, the terminal executes the step of acquiring the GPU operating frequency when the target application program runs to a target running scene. The scene information of the target running scene is transmitted by the target application program through a data channel between the target application program and the terminal operating system.

Step 402, if the GPU operating frequency meets a preset condition, acquiring a GPU single-frame rendering time length of the GPU in a preset time length, wherein the GPU single-frame rendering time length is the operating time length of the GPU in the single-frame image rendering process.

Further, the terminal detects whether the obtained GPU operating frequency meets the preset condition. If so, the terminal determines that the GPU may have reached a performance bottleneck and executes the step of obtaining the GPU single-frame rendering duration of the GPU within the predetermined duration; if not, the terminal continues to acquire the GPU operating frequency.

Optionally, the preset condition is that the GPU operating frequency is greater than a frequency threshold. For example, the frequency threshold is 80% of the upper frequency limit of the GPU.

Image rendering is completed jointly by the Central Processing Unit (CPU) and the GPU. The time taken by the CPU and the GPU together to render one frame of image is called the single-frame rendering duration; correspondingly, the running duration of the GPU in the single-frame image rendering process is the GPU single-frame rendering duration, and the running duration of the CPU in the single-frame image rendering process is the CPU single-frame rendering duration. Because what needs to be judged subsequently is whether the GPU reaches a performance bottleneck, the terminal acquires only the GPU single-frame rendering duration.

Because the time lengths consumed for rendering different images are different, in a possible implementation manner, the terminal obtains the GPU single-frame rendering time length corresponding to each frame image within the predetermined time length, or the terminal obtains the GPU single-frame rendering time length corresponding to a specified image frame within the predetermined time length.

Step 403, determining whether the GPU reaches a performance bottleneck according to the GPU single frame rendering duration.

When the same image frame is rendered, a longer GPU single-frame rendering duration indicates a slower GPU rendering speed and, correspondingly, poorer GPU performance.

Because the GPU operating frequency and the GPU single-frame rendering duration are easy to obtain, determining whether the GPU reaches a performance bottleneck from these two quantities is less difficult than performing complex GPU load calculation through an additional driver. This simplifies the detection of GPU performance bottlenecks and provides data support for subsequent development and optimization.

To sum up, in the embodiment of the application, the GPU operating frequency of the GPU within the predetermined duration is obtained; when the GPU operating frequency meets the preset condition, the GPU single-frame rendering duration of the GPU within the predetermined duration is further obtained; and whether the GPU reaches a performance bottleneck during the running of the target application program is determined according to the GPU single-frame rendering duration. The terminal judges the GPU performance bottleneck based on the GPU operating frequency and the GPU single-frame rendering duration, so complex GPU load calculation by means of a driver is not required; this preserves the accuracy of GPU performance bottleneck detection while reducing its complexity and improving its efficiency.
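The two-stage flow of steps 401 to 403 can be sketched as follows. This is a minimal sketch, not the claimed implementation: all three callables are hypothetical hooks, and the decision criterion in step 403 is left to the caller because this embodiment does not fix one.

```python
from typing import Callable, Sequence


def determine_bottleneck(
    frequency_samples_mhz: Sequence[float],
    frequency_condition: Callable[[Sequence[float]], bool],
    measure_gpu_frame_ms: Callable[[], float],
    is_bottleneck: Callable[[float], bool],
) -> bool:
    """Gate on the frequency condition, then decide from frame timing."""
    # Steps 401/402: frame timing is only examined when the frequency
    # condition holds; otherwise the terminal keeps sampling the frequency.
    if not frequency_condition(frequency_samples_mhz):
        return False
    # Step 402: obtain the GPU single-frame rendering duration.
    gpu_frame_ms = measure_gpu_frame_ms()
    # Step 403: decide from the duration (criterion supplied by the caller).
    return is_bottleneck(gpu_frame_ms)
```

The cheap frequency check runs first so that the frame-timing measurement is skipped whenever the GPU is clearly not under pressure.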

Referring to fig. 5, a flowchart of a method for determining a GPU performance bottleneck according to another exemplary embodiment of the present application is shown. The method may include the following steps.

Step 501, in the running process of the target application program, obtaining the GPU running frequency of the GPU in a preset time length.

For the implementation of this step, refer to step 401; details are not repeated here.

Step 502, calculating an average operating frequency of the GPU within a predetermined time period according to the GPU operating frequency.

Since the GPU operating frequency may change constantly, determining whether the preset condition is satisfied based on only a single GPU operating frequency would affect the accuracy of the result. In a possible implementation, the terminal acquires the GPU operating frequencies corresponding to different time points within the predetermined duration, and calculates the average operating frequency of the GPU within the predetermined duration from these multiple GPU operating frequencies.

In an illustrative example, the terminal acquires the GPU operating frequency every 1 s, so as to acquire 10 GPU operating frequencies within 10 s, which are 1550MHz, 1570MHz, 1625MHz, 1650MHz, 1655MHz, 1600MHz, 1650MHz, 1600MHz, and 1650MHz, respectively, and further calculates an average operating frequency of 1620 MHz.

It should be noted that, a terminal (operating system) may obtain the GPU operating frequency in real time through a preset interface (e.g., devfreq), and the method for obtaining the GPU operating frequency is not limited in the embodiment of the present application.
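As one hedged illustration of reading the frequency through devfreq: on Linux-based systems, the devfreq framework exposes a `cur_freq` attribute for each device under `/sys/class/devfreq/`. The sketch below reads that attribute from a directory passed in by the caller. The directory name of the GPU node is device-specific and is an assumption the caller must resolve; access permissions on a production terminal are also not guaranteed.

```python
from pathlib import Path


def read_devfreq_cur_freq(device_dir: str) -> int:
    """Read the cur_freq attribute from a devfreq device directory.

    device_dir is an assumption: the GPU's node under /sys/class/devfreq/
    has a device-specific name that must be looked up on the target device.
    The value is returned as reported by the kernel (typically in Hz).
    """
    return int(Path(device_dir, "cur_freq").read_text().strip())
```

Calling this function once per sampling interval would yield the series of GPU operating frequencies that step 502 averages.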

Step 503, if the average operating frequency is greater than the frequency threshold, determining that the GPU operating frequency meets the preset condition, where the frequency threshold is less than the upper limit of the operating frequency of the GPU.

When the GPU runs at full load (heavy load), the GPU operating frequency approaches its upper limit. In the embodiment of the application, the terminal therefore detects whether the average operating frequency of the GPU within the predetermined duration is greater than the frequency threshold. If so, the terminal determines that the GPU operating frequency meets the preset condition (the GPU may have reached a performance bottleneck); if the average operating frequency is less than the frequency threshold, the terminal determines that the GPU operating frequency does not meet the preset condition.

In a possible implementation, a ratio threshold for the operating frequency is preset in the terminal; when the ratio of the average operating frequency of the GPU to the maximum operating frequency of the GPU is greater than the ratio threshold, the terminal determines that the GPU operating frequency meets the preset condition. For example, the ratio threshold is 0.8.

With reference to the example in the above step, when the frequency threshold is 0.8 × the upper limit of the operating frequency, and the upper limit of the operating frequency is 2000MHz, since the average operating frequency of the GPU is 1620MHz > 1600MHz, the terminal determines that the GPU operating frequency meets the preset condition.
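The comparison in this example can be sketched as below. The function name and parameters are hypothetical; the default ratio of 0.8 is taken from the example in the text.

```python
from statistics import mean
from typing import Sequence


def frequency_condition_met(
    samples_mhz: Sequence[float],
    max_frequency_mhz: float,
    ratio_threshold: float = 0.8,
) -> bool:
    """True when the average sampled frequency exceeds the frequency threshold."""
    average_mhz = mean(samples_mhz)
    # Frequency threshold = ratio threshold x upper limit of the frequency.
    return average_mhz > ratio_threshold * max_frequency_mhz
```

For any set of samples averaging 1620 MHz and an upper limit of 2000 MHz, the threshold is 1600 MHz and the function returns `True`, matching the conclusion above.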

In other possible embodiments, for a plurality of consecutive GPU operating frequencies acquired by the terminal, the terminal detects whether the length of time for which the GPU operating frequency stays above the frequency threshold is greater than a duration threshold; if so, it determines that the GPU operating frequency satisfies the preset condition, and if not, it determines that the GPU operating frequency does not satisfy the preset condition. That is, the terminal detects whether the GPU operating frequency stays close to the maximum operating frequency of the GPU for a long time.
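The sustained-frequency variant can be sketched as follows. This is one possible reading, stated as an assumption: each sample is taken to represent one sampling interval, and the "duration" is the longest run of consecutive samples above the threshold. The function names are hypothetical.

```python
from typing import Sequence


def longest_time_above_threshold(
    samples_mhz: Sequence[float],
    interval_s: float,
    threshold_mhz: float,
) -> float:
    """Longest run of consecutive samples above the threshold, in seconds."""
    longest = current = 0
    for value in samples_mhz:
        # Extend the current run, or reset it when a sample drops below.
        current = current + 1 if value > threshold_mhz else 0
        longest = max(longest, current)
    return longest * interval_s


def sustained_condition_met(
    samples_mhz: Sequence[float],
    interval_s: float,
    threshold_mhz: float,
    duration_threshold_s: float,
) -> bool:
    """True when the frequency stays above the threshold long enough."""
    run_s = longest_time_above_threshold(samples_mhz, interval_s, threshold_mhz)
    return run_s > duration_threshold_s
```

Compared with the average-based check, this variant is not triggered by a single short burst of high frequency.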

Step 504, if the GPU operating frequency meets the preset condition, acquiring a start time point and an end time point of each enqueue process within the predetermined duration, where the enqueue process is the process of returning a buffer that has undergone image rendering to the BufferQueue.

As shown in fig. 3, the rendering of each frame of image goes through a dequeue process and an enqueue process. The dequeue process is the process in which the upper layer requests a free buffer from the BufferQueue for rendering (i.e., the buffer changes from the Free state to the Dequeued state in fig. 3). The enqueue process is the process in which the rendered data is written into the buffer and the buffer is put back into the BufferQueue to wait for composition by SurfaceFlinger (i.e., the buffer changes from the Dequeued state to the Queued state in fig. 3).

In terms of the executing entity, the dequeue process is executed by the CPU: the CPU measures the width and height of the view (measure), sets the width, height, and position of the view (layout), creates a display list and draws (draw), generates polygons and textures, and sends them to the GPU. The enqueue process is executed by the GPU: the GPU rasterizes and composes (i.e., renders) the textures and polygons generated by the CPU, and writes the rendered data into the buffer.

Therefore, based on the rendering process of the image frame, in one possible embodiment, the terminal records the start time point and the end time point of each enqueue process within the predetermined duration, so as to calculate the duration of each enqueue process from the start time point and the end time point.

Optionally, the start time point of the enqueue process is the time point at which the CPU sends the polygons and textures to the GPU, and the end time point of the enqueue process is the time point at which the buffer that has undergone image rendering finishes being enqueued.

Schematically, the distribution of the start and end time points of the enqueue process during image rendering is shown in fig. 6. In the drawing of each frame of image, the portion filled with oblique lines is the CPU running period (in the first half, the CPU performs measurement, layout, and texture and polygon generation; in the second half, the CPU performs resource cleanup), and the portion filled in black is the GPU running period (the GPU draws according to the textures and polygons sent by the CPU).

Step 505, calculating the GPU single-frame rendering duration according to the start time point and the end time point.

In the image rendering process, the running time of the GPU is concentrated mainly in the enqueue process, so the terminal can calculate the GPU single-frame rendering duration from the start time point and the end time point of each enqueue process. In one possible embodiment, this step may include the following steps.

First, for each enqueue process, a first time interval between a start time point and an end time point is calculated.

For each enqueue process within the predetermined duration, the terminal calculates a first time interval between the start time point and the end time point; the first time interval is the time consumed by that enqueue process.

Because multiple frames of images are rendered within the predetermined duration, and therefore multiple enqueue processes take place, the terminal repeats this step to obtain a plurality of first time intervals.

In an illustrative example, the terminal calculates first time intervals corresponding to 10 enqueue processes, which are: 6ms, 5ms, 6ms, 7ms, 6ms, 5ms, 6ms, 7ms, 6 ms.

Secondly, the average value of the first time intervals corresponding to the enqueue processes within the predetermined duration is determined as the GPU single-frame rendering duration.

To avoid an inaccurate result caused by determining the GPU single-frame rendering duration from a single first time interval, in this embodiment the terminal calculates the average value of the first time intervals corresponding to the enqueue processes within the predetermined duration, and determines this average value as the GPU single-frame rendering duration of the GPU within the predetermined duration.

With reference to the example in the above step, the terminal calculates an average value of 6ms according to the first time interval corresponding to the 10 enqueue processes, so as to determine the GPU single frame rendering duration as 6 ms.
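The two sub-steps above (first time intervals, then their average) can be sketched as follows. The function name and the `(start, end)` tuple layout are hypothetical choices for illustration; timestamps are assumed to be in milliseconds on a common clock.

```python
from statistics import mean
from typing import Sequence, Tuple


def gpu_single_frame_duration_ms(
    enqueue_spans_ms: Sequence[Tuple[float, float]],
) -> float:
    """Average enqueue duration over all frames in the predetermined duration.

    Each element is a (start, end) pair for one enqueue process.
    """
    # First time interval of each enqueue process: end minus start.
    first_intervals = [end - start for start, end in enqueue_spans_ms]
    # GPU single-frame rendering duration: the mean of those intervals.
    return mean(first_intervals)
```

For instance, three enqueue processes lasting 6 ms, 5 ms, and 7 ms give a GPU single-frame rendering duration of 6 ms.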

It should be noted that, in other possible embodiments, the terminal may sample the first time intervals and determine the GPU single-frame rendering duration from the sampled subset, which is not limited in this embodiment of the present application.

Step 506, acquiring a single-frame rendering duration within a predetermined duration, wherein the single-frame rendering duration is a duration for rendering a single-frame image.

In the image rendering process, another sign that the GPU is running at full load (heavy load) is that GPU operation time accounts for a larger proportion of the time spent on each frame. The terminal can therefore determine whether the GPU is at full load (has reached a performance bottleneck) by calculating the proportion of the GPU single-frame rendering duration in the total rendering duration of a single-frame image.

On the basis of fig. 5, as shown in fig. 7, the present step may further include the following steps.

Step 506A, acquiring a starting time point and an ending time point of each enqueue process within the preset time length.

For the process of obtaining the starting time point and the ending time point of each enqueue process, refer to step 504; details are not repeated in this embodiment.

In step 506B, a second time interval between end time points of two adjacent enqueue processes is calculated.

Since each image rendering includes the enqueuing process, the terminal may determine the rendering duration of the single-frame image according to a time interval between ending time points of two adjacent enqueuing processes, or according to a time interval between starting time points of two adjacent enqueuing processes.

In one possible embodiment, the i-th second time interval = the (i+1)-th ending time point − the i-th ending time point, or the i-th second time interval = the (i+1)-th starting time point − the i-th starting time point, where i ≥ 1.

Schematically, as shown in fig. 6, the terminal calculates a second time interval according to an end time point corresponding to the 2 nd frame image and an end time point corresponding to the 1 st frame image; and calculating to obtain a second time interval according to the end time point corresponding to the 3 rd frame image and the end time point corresponding to the 2 nd frame image, and so on.

In an illustrative example, the terminal calculates ten second time intervals, each of which is 16ms.

In other possible embodiments, each image rendering includes a dequeuing process, so that the terminal may further determine the rendering duration of the single-frame image according to a time interval between ending time points of two consecutive dequeuing processes or according to a time interval between starting time points of two consecutive dequeuing processes, which is not limited in this embodiment.

Step 506C, determining the average value of each second time interval in the predetermined time length as the single-frame rendering time length.

In order to avoid an inaccurate result caused by determining the single-frame rendering duration according to a single second time interval, in this embodiment, the terminal calculates an average value of each second time interval within the predetermined duration, so as to determine the average value as the single-frame rendering duration of the image frame within the predetermined duration.

With reference to the example in the above step, since the calculated second time intervals are all 16ms, the terminal determines that the single frame rendering duration within the predetermined duration is 16 ms.
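Steps 506B and 506C can be sketched in the same way. This is a minimal Python illustration under stated assumptions: the function name `single_frame_duration_ms` and the concrete ending time points are hypothetical, chosen so that the ten second time intervals are all 16ms as in the example.

```python
from statistics import mean


def single_frame_duration_ms(end_points_ms):
    """Average the second time intervals between adjacent enqueue processes.

    end_points_ms: ending time points of consecutive enqueue processes,
    in chronological order (starting time points could be used instead).
    """
    if len(end_points_ms) < 2:
        raise ValueError("at least two enqueue processes are required")
    # i-th second time interval = (i+1)-th end point - i-th end point.
    second_intervals = [
        later - earlier
        for earlier, later in zip(end_points_ms, end_points_ms[1:])
    ]
    return mean(second_intervals)


# Eleven hypothetical ending time points give ten second time intervals.
end_points = [6 + 16 * i for i in range(11)]
frame_ms = single_frame_duration_ms(end_points)
print(frame_ms)
```

Note that n enqueue processes yield n − 1 second time intervals, so one more enqueue process than the number of intervals must be observed.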

Step 507, determining whether the GPU reaches the performance bottleneck according to the GPU single-frame rendering duration and the single-frame rendering duration.

Further, the terminal determines whether the GPU reaches the performance bottleneck within the preset time according to the GPU single-frame rendering time length and the single-frame rendering time length obtained through calculation. As shown in fig. 7, this step may include the following steps.

Step 507A, calculating the ratio of the GPU single-frame rendering duration to the single-frame rendering duration.

The ratio = GPU single-frame rendering duration / single-frame rendering duration.

In connection with the example in the above steps, when the GPU single-frame rendering duration is 6ms and the single-frame rendering duration is 16ms, the ratio is 6/16 = 0.375.

Further, the terminal detects whether the ratio is larger than a preset value. If so, it determines that GPU operation accounts for a large proportion of the time in the image rendering process, that is, the GPU reaches the performance bottleneck; if not, it determines that the GPU does not reach the performance bottleneck.

Step 507B, if the ratio is larger than the preset value, determining that the GPU reaches the performance bottleneck.

In connection with the example in the above steps, when the preset value is 0.3, since 0.375 > 0.3, the terminal determines that the GPU reaches the performance bottleneck.

Step 507C, if the ratio is smaller than the preset value, determining that the GPU does not reach the performance bottleneck.

In this embodiment, the terminal calculates the single-frame rendering duration of the single-frame image and the GPU single-frame rendering duration according to the starting time point and the ending time point of each enqueue process in the image rendering, and then determines whether the GPU reaches the performance bottleneck according to the ratio of the GPU single-frame rendering duration to the single-frame rendering duration. This reduces the calculation complexity of the detection process while ensuring the accuracy of the detection result.
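The ratio test of steps 507A to 507C can be sketched as follows. This is an illustrative Python sketch, not the implementation of the embodiment: the function name `gpu_reaches_bottleneck` is hypothetical, the default preset value 0.3 is taken from the example, and treating a ratio exactly equal to the preset value as "not reached" is an assumption, since the steps only cover the larger and smaller cases.

```python
def gpu_reaches_bottleneck(gpu_frame_ms, frame_ms, preset_value=0.3):
    """Return (reached, ratio) for one preset time length.

    gpu_frame_ms: GPU single-frame rendering duration (ms).
    frame_ms:     single-frame rendering duration (ms).
    """
    if frame_ms <= 0:
        raise ValueError("single-frame rendering duration must be positive")
    # Step 507A: ratio = GPU single-frame duration / single-frame duration.
    ratio = gpu_frame_ms / frame_ms
    # Steps 507B/507C: compare with the preset value.
    # Assumption: a ratio equal to the preset value counts as "not reached".
    return ratio > preset_value, ratio


reached, ratio = gpu_reaches_bottleneck(6, 16)
print(reached, ratio)
```

With the example values (6ms and 16ms), the ratio is 0.375, which exceeds 0.3, so the sketch reports that the bottleneck is reached.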

In a possible embodiment, when it is determined that the GPU reaches the performance bottleneck, in order to ensure the picture display quality of the target application, on the basis of fig. 5, as shown in fig. 8, the following steps are further included after step 507.

Step 508, if the GPU reaches the performance bottleneck, the current frame rate is obtained.

In one possible implementation, the terminal calculates the current frame rate from the single-frame rendering duration obtained in the above steps, where the current frame rate = 1s / single-frame rendering duration. For example, when the single-frame rendering duration is 16ms, the terminal calculates that the current frame rate is approximately 62 fps.

Further, the terminal acquires a target frame rate of the target application program and detects whether the current frame rate reaches the target frame rate. If so, it determines that the picture of the target application program is not stuck; if not, it determines that the picture of the target application program is stuck, and the following step 509 or 510 is performed.

Optionally, the target frame rate of the target application is obtained by the terminal operating system through a data channel with the target application.

In step 509, if the current frame rate does not reach the target frame rate of the target application program and the average operating frequency is less than the upper limit of the operating frequency of the GPU, the operating parameters of the GPU are adjusted up.

In a possible implementation manner, when the current frame rate does not reach the target frame rate, the terminal obtains the average operating frequency (calculated in step 502) and detects whether the average operating frequency reaches the upper limit of the operating frequency of the GPU. If the upper limit is not reached, the performance of the GPU still has room for improvement, and the terminal adjusts the operating parameters of the GPU upward. For example, starting from the average operating frequency, the terminal gradually raises the operating frequency by a predetermined upward adjustment amplitude, so as to improve the rendering performance of the GPU.

In an illustrative example, the average operating frequency of the GPU is 1620MHz, the upper limit of the operating frequency of the GPU is 2000MHz, and the terminal gradually adjusts the operating frequency of the GPU on the basis of 1620MHz according to a predetermined adjustment amplitude of 50 MHz.
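The stepwise adjustment in this example can be illustrated with a short Python sketch. The helper name `next_gpu_frequency_mhz` is hypothetical, and capping each step at the upper limit is an assumption; in practice the terminal would presumably stop as soon as the target frame rate is reached, which this sketch does not model.

```python
def next_gpu_frequency_mhz(current_mhz, upper_limit_mhz, step_mhz=50):
    """Raise the frequency by one adjustment step, capped at the upper limit."""
    return min(current_mhz + step_mhz, upper_limit_mhz)


# Values from the example: start at 1620 MHz, upper limit 2000 MHz, 50 MHz steps.
freq = 1620
steps = []
while freq < 2000:
    freq = next_gpu_frequency_mhz(freq, 2000)
    steps.append(freq)
print(steps)
```

Because 2000 − 1620 is not a multiple of 50, the last step in this sketch is shortened so that the frequency never exceeds the upper limit.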

Step 510, if the current frame rate does not reach the target frame rate of the target application program and the average operating frequency reaches the upper limit of the operating frequency of the GPU, adjusting the image quality of the target application program.

When the current frame rate does not reach the target frame rate and the average operating frequency reaches the upper limit of the operating frequency of the GPU, the GPU frequency cannot be raised further. In this case, in order to keep the picture display smooth, the terminal adjusts (for example, down-adjusts) the image quality of the target application program, thereby reducing the image rendering difficulty of the GPU.

In this embodiment, after detecting that the GPU reaches the performance bottleneck, the terminal further determines whether a picture is stuck, and adjusts the operating frequency of the GPU or adjusts the picture quality of the target application program based on the average operating frequency and the upper limit of the operating frequency of the GPU when the picture is stuck, thereby improving the picture display quality of the target application program.
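The decision flow of steps 508 to 510 can be summarized in a minimal Python sketch. The `Action` enum, the function name `choose_action`, and the parameter names are hypothetical and serve only to illustrate the branching; the frame rate formula (1s / single-frame rendering duration) follows step 508.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RAISE_GPU_PARAMETERS = "raise_gpu_parameters"   # step 509
    ADJUST_IMAGE_QUALITY = "adjust_image_quality"   # step 510


def choose_action(bottleneck, frame_ms, target_fps, avg_freq_mhz, upper_limit_mhz):
    """Pick the follow-up action after the bottleneck check."""
    if not bottleneck:
        return Action.NONE
    # Step 508: current frame rate = 1 s / single-frame rendering duration.
    current_fps = 1000.0 / frame_ms
    if current_fps >= target_fps:
        return Action.NONE  # the picture is not stuck
    if avg_freq_mhz < upper_limit_mhz:
        return Action.RAISE_GPU_PARAMETERS  # room left to raise the GPU
    return Action.ADJUST_IMAGE_QUALITY      # GPU already at its upper limit


print(choose_action(True, 20, 60, 1620, 2000))
print(choose_action(True, 20, 60, 2000, 2000))
```

With a 20ms single-frame rendering duration the current frame rate is 50 fps, below a target of 60 fps, so the first call selects step 509 and the second, where the average operating frequency already equals the upper limit, selects step 510.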

The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.

Referring to fig. 9, a schematic structural diagram of an apparatus for determining a GPU performance bottleneck according to an embodiment of the present application is shown. The apparatus can be implemented as all or a part of the terminal in fig. 1 by a dedicated hardware circuit, or a combination of hardware and software, and includes:

a first obtaining module 910, configured to obtain, during an operation process of a target application, a GPU operation frequency of a GPU within a predetermined time period;

a second obtaining module 920, configured to obtain a GPU single frame rendering duration of the GPU within the predetermined duration when the GPU operating frequency meets a preset condition, where the GPU single frame rendering duration is an operating duration of the GPU in a single frame image rendering process;

a determining module 930, configured to determine whether the GPU reaches a performance bottleneck according to the GPU single frame rendering duration.

Optionally, the second obtaining module 920 includes:

a first obtaining unit, configured to obtain a start time point and an end time point of each enqueue process within the predetermined time duration, where the enqueue process is a process of putting a buffer that has undergone image rendering back into the buffer queue;

and the first calculation unit is used for calculating the GPU single-frame rendering duration according to the starting time point and the ending time point.

Optionally, the first computing unit is configured to:

for each enqueueing process, calculating a first time interval between the start time point and the end time point;

and determining the average value of the first time intervals corresponding to the enqueue processes within the preset time length as the GPU single-frame rendering duration.

Optionally, the determining module 930 includes:

the second obtaining unit is used for obtaining single-frame rendering duration in the preset duration, and the single-frame rendering duration is duration for rendering a single-frame image;

and the determining unit is used for determining whether the GPU reaches a performance bottleneck according to the GPU single-frame rendering duration and the single-frame rendering duration.

Optionally, the second obtaining unit is configured to:

acquiring a starting time point and an ending time point of each enqueue process within the preset time length, wherein the enqueue process is a process of putting the buffer that has undergone image rendering back into the buffer queue;

calculating a second time interval between the ending time points of two adjacent enqueue processes;

and determining the average value of each second time interval in the preset time length as the single-frame rendering time length.

Optionally, the determining unit is configured to:

calculating a ratio of the GPU single-frame rendering duration to the single-frame rendering duration, wherein the ratio is the GPU single-frame rendering duration/the single-frame rendering duration;

if the ratio is larger than a preset value, determining that the GPU reaches a performance bottleneck;

and if the ratio is smaller than the preset value, determining that the GPU does not reach the performance bottleneck.

Optionally, the apparatus further comprises:

the first detection module is used for calculating the average operating frequency of the GPU in the preset time length according to the GPU operating frequency; if the average operating frequency is greater than a frequency threshold, determining that the GPU operating frequency meets the preset condition, wherein the frequency threshold is less than the upper limit of the GPU operating frequency;

or,

and the second detection module is used for determining that the GPU operating frequency meets the preset condition if the duration that the GPU operating frequency is greater than the frequency threshold is greater than a duration threshold.
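The two alternative preset conditions handled by the detection modules can be sketched as follows. This Python sketch rests on several assumptions that are not stated above: the GPU operating frequency is sampled at a fixed period, the "duration" above the frequency threshold is counted cumulatively rather than as one continuous run, and all function and parameter names are hypothetical.

```python
from statistics import mean


def meets_condition_by_average(freq_samples_mhz, freq_threshold_mhz):
    """First detection module: average operating frequency > frequency threshold."""
    return mean(freq_samples_mhz) > freq_threshold_mhz


def meets_condition_by_duration(freq_samples_mhz, freq_threshold_mhz,
                                sample_period_ms, duration_threshold_ms):
    """Second detection module: time spent above the frequency threshold
    exceeds the duration threshold (cumulative time, by assumption)."""
    time_above_ms = sum(
        sample_period_ms for f in freq_samples_mhz if f > freq_threshold_mhz
    )
    return time_above_ms > duration_threshold_ms


# Hypothetical samples taken every 100 ms within the preset time length.
samples = [1500, 1700, 1800, 1600]
print(meets_condition_by_average(samples, 1600))
print(meets_condition_by_duration(samples, 1600, 100, 150))
```

Either condition alone is sufficient; the frequency threshold is chosen below the upper limit of the GPU operating frequency, as stated for the first detection module.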

Optionally, the apparatus further comprises:

a third obtaining module, configured to obtain a current frame rate if the GPU reaches a performance bottleneck;

a first adjusting module, configured to adjust an operating parameter of the GPU upward if the current frame rate does not reach a target frame rate of the target application program and the average operating frequency is less than an upper limit of an operating frequency of the GPU;

and the second adjusting module is used for adjusting the image quality of the target application program if the current frame rate does not reach the target frame rate of the target application program and the average operating frequency reaches the upper limit of the operating frequency of the GPU.

It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.

The present application further provides a computer-readable medium, on which program instructions are stored, and when the program instructions are executed by a processor, the method for determining the performance bottleneck of the GPU provided by the above-mentioned method embodiments is implemented.

The present application further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for determining a GPU performance bottleneck described in the above embodiments.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps of the method for determining a GPU performance bottleneck in the above embodiments may be implemented by hardware, or may be implemented by a program instructing associated hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. The above description is only exemplary of the present application and should not be taken as limiting the present application; any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method for determining GPU performance bottlenecks in a graphics processor, the method comprising:

if the GPU operating frequency meets a preset condition, acquiring a starting time point and an ending time point of each enqueue process within the preset time length, wherein the enqueue process is a process of putting the buffer that has undergone image rendering back into the buffer queue;

calculating a GPU single-frame rendering duration according to the starting time point and the ending time point, wherein the GPU single-frame rendering duration is the operation duration of the GPU in the single-frame image rendering process;

2. The method of claim 1, wherein said calculating the GPU single frame rendering duration based on the start time point and the end time point comprises:

3. The method of claim 1 or 2, wherein determining whether the GPU has reached a performance bottleneck based on the GPU single frame rendering duration comprises:

acquiring single-frame rendering duration within the preset duration, wherein the single-frame rendering duration is duration for rendering a single-frame image;

and determining whether the GPU reaches a performance bottleneck according to the GPU single-frame rendering duration and the single-frame rendering duration.

4. The method of claim 3, wherein obtaining the rendering duration of the single frame within the predetermined duration comprises:

5. The method of claim 3, wherein determining whether the GPU has reached a performance bottleneck based on the GPU single frame rendering duration and the single frame rendering duration comprises:

calculating the ratio of the GPU single-frame rendering duration to the single-frame rendering duration, wherein the ratio is the GPU single-frame rendering duration/the single-frame rendering duration;

6. The method according to claim 1 or 2, wherein after obtaining the GPU operating frequency of the GPU for the predetermined time period, the method further comprises:

calculating the average operating frequency of the GPU in the preset time length according to the GPU operating frequency; if the average operating frequency is greater than a frequency threshold, determining that the GPU operating frequency meets the preset condition, wherein the frequency threshold is less than the upper limit of the GPU operating frequency;

or,

and if the time length that the GPU operating frequency is greater than the frequency threshold is greater than a time length threshold, determining that the GPU operating frequency meets the preset condition.

7. The method of claim 6, wherein after determining whether the GPU has reached a performance bottleneck based on the GPU single frame rendering duration, the method further comprises:

if the GPU reaches the performance bottleneck, acquiring the current frame rate;

if the current frame rate does not reach the target frame rate of the target application program and the average operating frequency is less than the upper limit of the operating frequency of the GPU, the operating parameters of the GPU are adjusted up;

and if the current frame rate does not reach the target frame rate of the target application program and the average operating frequency reaches the upper limit of the operating frequency of the GPU, adjusting the image quality of the target application program.

8. An apparatus for determining a graphics processor GPU performance bottleneck, the apparatus comprising:

the second acquisition module is used for acquiring a starting time point and an ending time point of each enqueue process within the preset time length when the GPU operating frequency meets a preset condition, wherein the enqueue process is a process of putting a buffer that has undergone image rendering back into the buffer queue; and calculating a GPU single-frame rendering duration according to the starting time point and the ending time point, wherein the GPU single-frame rendering duration is the operation duration of the GPU in the single-frame image rendering process;

9. A terminal, characterized in that the terminal comprises a processor, a memory connected to the processor, and program instructions stored on the memory, which when executed by the processor implement the method for determining a GPU performance bottleneck as claimed in any of claims 1 to 7.

10. A computer-readable storage medium, having stored thereon program instructions which, when executed by a processor, implement the method for determining a GPU performance bottleneck of any of claims 1 to 7.