WO2020156132A1

WO2020156132A1 - Gpu performance bottleneck determining method and device, terminal, and storage medium

Info

Publication number: WO2020156132A1
Application number: PCT/CN2020/071796
Authority: WO
Inventors: 陈岩
Original assignee: Oppo广东移动通信有限公司
Priority date: 2019-01-28
Filing date: 2020-01-13
Publication date: 2020-08-06
Also published as: CN109800141B; CN109800141A

Abstract

A GPU performance bottleneck determining method and device, a terminal, and a storage medium, relating to the technical field of terminals. The method comprises: obtaining the GPU running frequency of a GPU within a predetermined duration in the running process of a target application (401); in response to the GPU running frequency satisfying a preset condition, obtaining a GPU single frame rendering duration of the GPU within the predetermined duration, the GPU single frame rendering duration being the GPU running duration in a single image frame rendering process (402); and determining, according to the GPU single frame rendering duration, whether the GPU reaches a performance bottleneck (403). In the method, a terminal performs GPU performance bottleneck determination according to the GPU running frequency and the GPU single frame rendering duration in a single image frame rendering process, without performing complex GPU load calculation using a driver, thereby ensuring the accuracy of GPU performance bottleneck detection, reducing the complexity of performance bottleneck detection, and improving the efficiency of performance bottleneck detection.

Description

Method, device, terminal and storage medium for determining GPU performance bottleneck

This application claims priority to Chinese patent application No. 201910080514.5, filed on January 28, 2019 and entitled "Method, Device, Terminal and Storage Medium for Determining GPU Performance Bottleneck", the entire content of which is incorporated herein by reference.

Technical field

The embodiments of the present application relate to the field of terminal technology, and in particular, to a method, device, terminal, and storage medium for determining GPU performance bottlenecks.

Background technique

For applications that require dynamic image rendering, such as video applications and game applications, the running quality is closely related to the performance of the graphics processing unit (GPU).

In the related art, in order to monitor GPU load, a driver corresponding to the GPU needs to be installed in the terminal, so that the real-time load of the GPU is calculated through the driver, and it is then determined whether the GPU has reached a performance bottleneck during the running of the application.

Summary of the invention

The embodiments of the present application provide a method, device, terminal, and storage medium for determining GPU performance bottlenecks. The technical solutions are as follows:

In one aspect, a method for determining GPU performance bottlenecks is provided, and the method includes:

Obtaining, during the running of the target application, the GPU operating frequency of the GPU within a predetermined period of time;

In response to the GPU operating frequency meeting a preset condition, obtaining the GPU single frame rendering time of the GPU within the predetermined period of time, where the GPU single frame rendering time is the running time of the GPU in the single frame image rendering process;

Determining whether the GPU reaches a performance bottleneck according to the GPU single frame rendering time.

In another aspect, a device for determining GPU performance bottlenecks is provided, and the device includes:

The first acquiring module is used to acquire the GPU operating frequency of the GPU within a predetermined period of time during the running process of the target application;

The second obtaining module is configured to obtain the GPU single frame rendering time of the GPU within the predetermined period of time in response to the GPU operating frequency meeting a preset condition, where the GPU single frame rendering time is the running time of the GPU in the single frame image rendering process;

The determining module is configured to determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time.

In another aspect, a terminal is provided. The terminal includes a processor, a memory connected to the processor, and program instructions stored on the memory, and when the processor executes the program instructions, the method for determining the GPU performance bottleneck described in the above aspect is implemented.

In another aspect, a computer-readable storage medium is provided, and program instructions are stored thereon, and when the program instructions are executed by a processor, the method for determining the GPU performance bottleneck as described in the foregoing aspect is implemented.

Description of the drawings

FIG. 1 is a schematic structural diagram of a terminal provided by an exemplary embodiment of the present application;

Figure 2 is a schematic diagram of the graphic display process in the Android system;

Figure 3 is a state transition diagram of the four states of the buffer;

FIG. 4 shows a method flowchart of a method for determining a GPU performance bottleneck provided by an exemplary embodiment of the present application;

FIG. 5 shows a method flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application;

Figure 6 is a schematic diagram of the distribution of the start time points and the end time points of the enqueue process;

FIG. 7 shows a method flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application;

FIG. 8 shows a method flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application;

FIG. 9 is a schematic structural diagram of an apparatus for determining a GPU performance bottleneck provided by an embodiment of the present application.

Detailed description

In order to make the objectives, technical solutions, and advantages of the present application clearer, the implementation manners of the present application will be further described in detail below in conjunction with the accompanying drawings.

When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings indicate the same or similar elements. The implementation manners described in the following exemplary embodiments do not represent all implementation manners consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.

In the description of this application, it should be understood that the terms "first", "second", etc. are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. It should also be noted that, unless otherwise clearly specified and limited, the terms "connected" and "connection" should be understood in a broad sense: for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; and it may be a direct connection or an indirect connection through an intermediate medium. Those of ordinary skill in the art can understand the specific meanings of the above terms in this application according to the specific circumstances. In addition, unless otherwise specified, "plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that A and B exist at the same time, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects.

Before explaining the embodiments of the present application, the application scenarios of the embodiments of the present application are first described. Fig. 1 shows a schematic structural diagram of a terminal provided by an exemplary embodiment of the present application.

The terminal 100 is an electronic device installed with a target application. The target application can be a system program or a third-party application. Among them, third-party applications are applications created by third parties other than the user and the operating system. For example, the target application can be a game application or a video playback application.

Optionally, the terminal 100 includes: a processor 120 and a memory 140.

The processor 120 may include one or more processing cores. The processor 120 uses various interfaces and lines to connect the various parts of the terminal 100, and performs the various functions of the terminal 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 140 and by calling the data stored in the memory 140. Optionally, the processor 120 may be implemented in at least one hardware form among digital signal processing (Digital Signal Processing, DSP), a field-programmable gate array (Field-Programmable Gate Array, FPGA), and a programmable logic array (Programmable Logic Array, PLA). The processor 120 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), and a modem. The CPU mainly handles the operating system, user interface, and application programs; the GPU renders and draws the content to be displayed on the display screen; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 120 and may instead be implemented as a separate chip.

The memory 140 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). Optionally, the memory 140 includes a non-transitory computer-readable storage medium. The memory 140 may be used to store instructions, programs, codes, code sets, or instruction sets. The memory 140 may include a program storage area and a data storage area. The program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and so on; the data storage area may store the data involved in the following method embodiments, and so on.

The terminal 100 in the embodiment of the present application further includes a display screen 160. Optionally, the display screen 160 is a touch display screen, which receives touch operations performed by the user on or near it with any suitable object such as a finger or a stylus, and displays the user interfaces of various application programs. The display screen 160 is usually arranged on the front panel of the terminal 100, or on both the front panel and the rear panel of the terminal 100. The display screen 160 can be designed as a full screen, a curved screen, or a special-shaped screen. The display screen 160 can also be designed as a combination of a full screen and a curved screen, or a combination of a special-shaped screen and a curved screen, which is not limited in this embodiment.

In addition, those skilled in the art can understand that the structure of the terminal 100 shown in the above drawings does not constitute a limitation on the terminal 100. The terminal may include more or fewer components than those shown in the figure, a combination of certain components, or a different arrangement of components. For example, the terminal 100 also includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (Wireless Fidelity, WiFi) module, a power supply, and a Bluetooth module, which are not described further here.

For ease of understanding, the graphic display system in the terminal is first described below, and the following embodiments take an Android graphic display system as an example for schematic description.

As shown in Figure 2, the content displayed on the display screen 21 is read from the hardware frame buffer. The reading process is as follows: starting from the start address of the hardware frame buffer, the buffer is scanned from top to bottom and from left to right, and the scanned content is mapped onto the display screen.

Since the content displayed on the display screen 21 needs to be continuously updated, if reading and writing were performed in the same hardware frame buffer, content from multiple frames would be displayed on the display screen 21 at the same time. Therefore, the terminal adopts a double buffering mechanism: one of the two buffers is used for content reading and display, and the other buffer is used for background graphics synthesis and writing.

Schematically, as shown in FIG. 2, the front buffer 22 is the frame buffer holding the content to be displayed on the display screen, and the back buffer 23 is the frame buffer used for synthesizing the next frame of graphics. When the current frame has been displayed and the next frame has been written, the display screen 21 reads the content in the back buffer 23, and correspondingly the next frame of graphics is synthesized in the front buffer 22 (that is, the front and back buffers swap roles).

SurfaceFlinger, as a graphics synthesizer, is used to synthesize the multiple surfaces passed down from the upper layer and submit the result to the hardware frame buffer of the display screen for the display screen 21 to read and display. As shown in Figure 2, the content in the back buffer 23 is synthesized by SurfaceFlinger 24 from multiple surfaces 25. Each surface corresponds to a window of the upper layer, such as a dialog box, a status bar, or an activity (Activity).
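As a non-limiting illustration, the role exchange between the front buffer and the back buffer can be sketched as follows. The class name `DoubleBuffer`, its methods, and the string frame contents are hypothetical and chosen only for this sketch; they do not correspond to any interface of the Android graphics system.

```python
class DoubleBuffer:
    """Two frame buffers: one read by the display, one being composed."""

    def __init__(self, first_frame):
        self.front = first_frame  # content the display is currently reading
        self.back = None          # content being synthesized for the next frame

    def write_next(self, frame):
        # Writing never touches the buffer the display is reading.
        self.back = frame

    def swap(self):
        # Once the next frame is fully written, the two buffers exchange roles.
        self.front, self.back = self.back, self.front


buffers = DoubleBuffer("frame 0")
buffers.write_next("frame 1")
displayed_before = buffers.front
buffers.swap()
displayed_after = buffers.front
```

Because reading and writing always target different buffers, the display never scans out a partially written frame in this sketch.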

The transfer of graphics uses a buffer as a carrier, and the surface is a further encapsulation of the buffer. In order to manage the multiple buffers in the surface, as shown in Figure 3, a buffer queue (BufferQueue) is provided inside the surface, which forms a producer-consumer model with the upper layer and SurfaceFlinger. The upper layer is the producer (Producer), and SurfaceFlinger is the consumer (Consumer).

Each buffer in the BufferQueue can be in one of four states: the free state (Free), the dequeued state (Dequeued), the queued state (Queued), and the acquired state (Acquired). In the free state, the buffer is available to the upper layer; in the dequeued state, the buffer is being used by the upper layer; in the queued state, the upper layer has finished using the buffer (drawing and rendering are complete) and the buffer is waiting to be synthesized by SurfaceFlinger; in the acquired state, SurfaceFlinger is using the buffer for synthesis. In addition, the buffer is converted between states through the dequeueBuffer and queueBuffer operations. The conversion process is shown in Figure 3.
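As a non-limiting illustration, the four buffer states and their transitions can be modeled as a small state machine. The text above names only the dequeueBuffer and queueBuffer operations; the operation names `acquireBuffer` and `releaseBuffer` used for the consumer side are an assumption of this sketch, as are the Python names `BufferState`, `TRANSITIONS`, and `apply`.

```python
from enum import Enum


class BufferState(Enum):
    FREE = "Free"
    DEQUEUED = "Dequeued"
    QUEUED = "Queued"
    ACQUIRED = "Acquired"


# operation -> (state required before, state after)
TRANSITIONS = {
    "dequeueBuffer": (BufferState.FREE, BufferState.DEQUEUED),      # producer takes a buffer
    "queueBuffer": (BufferState.DEQUEUED, BufferState.QUEUED),      # producer returns it rendered
    "acquireBuffer": (BufferState.QUEUED, BufferState.ACQUIRED),    # assumed consumer operation
    "releaseBuffer": (BufferState.ACQUIRED, BufferState.FREE),      # assumed consumer operation
}


def apply(state, operation):
    """Return the next state, or raise if the operation is invalid in this state."""
    required, result = TRANSITIONS[operation]
    if state is not required:
        raise ValueError(f"{operation} is not valid in state {state.value}")
    return result


# One full cycle of a buffer through the queue.
state = BufferState.FREE
for op in ("dequeueBuffer", "queueBuffer", "acquireBuffer", "releaseBuffer"):
    state = apply(state, op)
```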

Please refer to FIG. 4, which shows a method flowchart of a method for determining a GPU performance bottleneck provided by an exemplary embodiment of the present application. The method may include the following steps.

Step 401: During the running of the target application, obtain the GPU running frequency of the GPU within a predetermined period of time.

In a possible implementation, the target application has higher performance requirements for the GPU than other applications, and the target application is determined by the terminal according to the application type of the installed applications, or the target application is set manually by the user.

Optionally, the target application is an application that needs to perform dynamic image rendering, and the application may be a video playback application or a game application. For example, the target application may be any one of a virtual reality application, a three-dimensional map program, a military simulation program, a third-person shooting game (Third-Person Shooting Game, TPS), a first-person shooting game (First-Person Shooting Game, FPS), a multiplayer online battle arena game (Multiplayer Online Battle Arena, MOBA), and a multiplayer gun-battle survival game. This application does not limit the specific type of the target application.

Since the GPU operating frequency changes continuously during the running of the target application, in order to improve the accuracy of the subsequent GPU performance bottleneck detection, in a possible implementation manner the terminal obtains the GPU operating frequency once every predetermined time interval within the predetermined period of time, thereby obtaining multiple GPU operating frequencies within the predetermined period of time. For example, the predetermined period of time is 10s, and the time interval is 100ms.

For the same target application, the performance requirements on the GPU may differ between running scenes. For example, in a game application, the GPU performance requirements in the game startup scene, the game loading scene, and the game main interface scene are lower than the GPU performance requirements within the game itself. Therefore, in a possible implementation manner, the terminal executes the step of obtaining the GPU operating frequency when the target application runs to a target running scene. The scene information of the target running scene is transmitted by the target application through a data channel between the target application and the terminal operating system.
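As a non-limiting illustration, the periodic sampling described above can be sketched as follows. The function name `sample_gpu_frequencies` and its parameters are hypothetical; the callable `read_freq_mhz` stands for whatever mechanism the terminal uses to read the current GPU operating frequency, and the `sleep` parameter is injectable so that the sketch can be exercised without waiting.

```python
import time


def sample_gpu_frequencies(read_freq_mhz, duration_s=10.0, interval_s=0.1,
                           sleep=time.sleep):
    """Collect one frequency reading per interval over the whole duration."""
    count = round(duration_s / interval_s)  # e.g. 10 s / 100 ms -> 100 samples
    samples = []
    for _ in range(count):
        samples.append(read_freq_mhz())
        sleep(interval_s)
    return samples


# Usage with a stand-in reader and a no-op sleep (both hypothetical).
demo_samples = sample_gpu_frequencies(lambda: 1600, duration_s=1.0,
                                      interval_s=0.1, sleep=lambda s: None)
```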

Step 402, in response to the GPU operating frequency meeting the preset condition, obtain the GPU single frame rendering time of the GPU within the predetermined time period, where the GPU single frame rendering time is the GPU operating time in the single frame image rendering process.

Further, the terminal detects whether the obtained GPU operating frequency meets the preset condition. If it does, the terminal determines that the GPU may have reached a performance bottleneck and executes the step of obtaining the GPU single frame rendering time of the GPU within the predetermined time period; if it does not, the terminal continues to obtain the GPU operating frequency.

Optionally, the preset condition is that the GPU operating frequency is greater than the frequency threshold. For example, the frequency threshold is 80% × the upper limit of the GPU operating frequency.

Image rendering is completed jointly by the central processing unit (Central Processing Unit, CPU) and the GPU. The time taken by the CPU and the GPU together to render one image frame is called the single frame rendering time. Correspondingly, the GPU running time in the single frame image rendering process is the GPU single frame rendering time, and the CPU running time in the single frame image rendering process is the CPU single frame rendering time. Since what subsequently needs to be determined is whether the GPU has reached a performance bottleneck, the terminal obtains only the GPU single frame rendering time.

Since different frames take different amounts of time to render, in a possible implementation the terminal obtains the GPU single frame rendering time corresponding to each image frame within the predetermined time period, or the terminal obtains the GPU single frame rendering time corresponding to specified image frames within the predetermined time period.

Step 403: Determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time.

When rendering the same image frame, the longer the GPU single frame rendering time, the slower the GPU renders and the worse the GPU performance. In a possible implementation, the terminal detects whether the GPU single frame rendering time is greater than a duration threshold. If it is greater, the terminal determines that the GPU has reached the performance bottleneck; otherwise, it determines that the GPU has not reached the performance bottleneck.

Because the GPU operating frequency and the GPU single frame rendering time are easy to obtain, determining whether the GPU has reached a performance bottleneck based on them is less difficult than performing complicated GPU load calculation through an additional driver. It also simplifies the GPU performance bottleneck detection process and provides data support for subsequent development and optimization.

In summary, in the embodiments of the present application, the GPU operating frequency of the GPU within a predetermined period of time is obtained, and when the GPU operating frequency meets a preset condition, the GPU single frame rendering time of the GPU within the predetermined period of time is further obtained, so that whether the GPU reaches a performance bottleneck during the running of the target application is determined according to the GPU single frame rendering time. In this embodiment, the terminal determines the GPU performance bottleneck based on the GPU operating frequency and the GPU single frame rendering time in the single frame image rendering process, without complex GPU load calculation through a driver, which ensures the accuracy of GPU performance bottleneck detection while reducing its complexity and improving its efficiency.
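As a non-limiting illustration, the two-stage determination of steps 401 to 403 can be sketched as follows, under the assumption that the preset condition is a simple comparison of an average frequency against a frequency threshold and that the final decision is a comparison against a duration threshold. The function name, parameter names, and numeric values are hypothetical. The GPU single frame rendering time is supplied as a callable so that it is fetched only when the frequency condition holds.

```python
def gpu_reaches_bottleneck(avg_freq_mhz, freq_threshold_mhz,
                           get_gpu_frame_time_ms, frame_time_threshold_ms):
    """Stage 1: frequency gate. Stage 2: GPU single frame rendering time check."""
    if avg_freq_mhz <= freq_threshold_mhz:
        # Preset condition not met: the frame time is never fetched.
        return False
    return get_gpu_frame_time_ms() > frame_time_threshold_ms


# Usage with stand-in values (hypothetical): 1620 MHz average, 1600 MHz
# threshold, a 9 ms GPU frame time and an 8 ms duration threshold.
demo_result = gpu_reaches_bottleneck(1620.0, 1600.0, lambda: 9.0, 8.0)
```

Passing the frame time as a callable mirrors the order of the steps: the second measurement is taken only after the first condition has been satisfied.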

Optionally, obtaining the GPU single frame rendering duration of the GPU within the predetermined duration includes:

Obtain the start time point and the end time point of the enqueue process within a predetermined period of time. The enqueue process refers to the process of putting the image-rendered buffer (buffer) back into the buffer queue (BufferQueue);

Calculate the GPU single frame rendering time based on the start time point and the end time point.

Optionally, calculate the GPU single frame rendering time according to the start time point and the end time point, including:

Calculate the first time interval between the start time point and the end time point;

The average value of the first time interval corresponding to the enqueue process within the predetermined time period is determined as the GPU single frame rendering time.

Optionally, determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time, including:

Get the single frame rendering time within a predetermined time period, and the single frame rendering time is the time length of rendering a single frame image;

Determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time and single frame rendering time.

Optionally, obtaining a single frame rendering time within a predetermined time period includes:

Get the start time point and end time point of the enqueue process within a predetermined period of time. The enqueue process refers to the process of putting the buffer after the image rendering back into the BufferQueue;

Calculate the second time interval between the end time points of two adjacent enqueue processes;

The average value of the second time interval within the predetermined time period is determined as the single frame rendering time.
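As a non-limiting illustration, the single frame rendering time can be computed from the end time points of adjacent enqueue processes as follows. The function name and the timestamps in the usage example are hypothetical.

```python
def single_frame_time_ms(enqueue_end_times_ms):
    """Average gap between the end time points of adjacent enqueue processes."""
    if len(enqueue_end_times_ms) < 2:
        raise ValueError("at least two enqueue end time points are required")
    # Second time interval: difference between each pair of adjacent end points.
    intervals = [later - earlier for earlier, later
                 in zip(enqueue_end_times_ms, enqueue_end_times_ms[1:])]
    return sum(intervals) / len(intervals)


# Usage with made-up end time points, in milliseconds.
demo_frame_time = single_frame_time_ms([0, 16, 33, 50])
```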

Optionally, determine whether the GPU has reached the performance bottleneck according to the GPU single frame rendering time and single frame rendering time, including:

Calculate the ratio of the GPU single frame rendering time to the single frame rendering time, where the ratio is less than 1;

In response to the ratio being greater than the preset value, it is determined that the GPU has reached the performance bottleneck;

In response to the ratio being less than the preset value, it is determined that the GPU has not reached the performance bottleneck.
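As a non-limiting illustration, the ratio-based determination can be sketched as follows. The function name is hypothetical, and the default preset value of 0.8 is an assumption made only for this sketch, since no specific preset value is given here. A ratio exactly equal to the preset value is treated as not reaching the bottleneck, which is likewise a choice of the sketch.

```python
def gpu_bottleneck_by_ratio(gpu_frame_ms, frame_ms, preset_value=0.8):
    """Compare the GPU share of the single frame rendering time with a preset value."""
    if frame_ms <= 0:
        raise ValueError("single frame rendering time must be positive")
    ratio = gpu_frame_ms / frame_ms
    return ratio > preset_value


# Usage with made-up durations, in milliseconds.
light_gpu_share = gpu_bottleneck_by_ratio(6.0, 16.0)   # ratio 0.375
heavy_gpu_share = gpu_bottleneck_by_ratio(14.0, 16.0)  # ratio 0.875
```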

Optionally, after obtaining the GPU operating frequency of the GPU within a predetermined period of time, the method further includes:

Calculate the average operating frequency of the GPU within a predetermined period of time according to the GPU operating frequency; in response to the average operating frequency being greater than the frequency threshold, it is determined that the GPU operating frequency meets the preset condition, and the frequency threshold is less than the upper limit of the GPU operating frequency;

or,

In response to the duration of the GPU operating frequency being greater than the frequency threshold being greater than the duration threshold, it is determined that the GPU operating frequency satisfies the preset condition.

Optionally, after determining whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time, the method further includes:

In response to the GPU reaching the performance bottleneck, get the current frame rate;

In response to the current frame rate not reaching the target frame rate of the target application, and the average operating frequency is less than the upper limit of the GPU operating frequency, increase the operating parameters of the GPU;

In response to the current frame rate not reaching the target frame rate of the target application, and the average operating frequency reaches the upper limit of the GPU operating frequency, the image quality of the target application is adjusted.
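As a non-limiting illustration, the selection between the two responses can be sketched as follows. The function name and the returned action labels are hypothetical; how the GPU operating parameters are actually increased or how the image quality is actually adjusted is outside the scope of this sketch.

```python
def choose_action(bottleneck_reached, current_fps, target_fps,
                  avg_freq_mhz, freq_upper_limit_mhz):
    """Pick a response once the bottleneck determination has been made."""
    if not bottleneck_reached or current_fps >= target_fps:
        return "none"
    if avg_freq_mhz < freq_upper_limit_mhz:
        # There is still frequency headroom: raise the GPU operating parameters.
        return "increase_gpu_operating_parameters"
    # The GPU already runs at its upper limit: fall back to image quality.
    return "adjust_image_quality"


# Usage with made-up values: 52 fps against a 60 fps target.
with_headroom = choose_action(True, 52, 60, 1620, 2000)
without_headroom = choose_action(True, 52, 60, 2000, 2000)
```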

The foregoing embodiment describes the principle of determining the GPU performance bottleneck. The detailed process of determining the GPU performance bottleneck is described below in conjunction with the producer-consumer model shown in FIG. 3.

Please refer to FIG. 5, which shows a method flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application. The method may include the following steps.

Step 501: During the running of the target application, obtain the GPU running frequency of the GPU within a predetermined time period.

For the implementation of this step, reference may be made to the above step 401, which will not be repeated in this embodiment.

Step 502: Calculate the average operating frequency of the GPU within a predetermined period of time according to the operating frequency of the GPU.

Because the GPU operating frequency changes continuously, determining whether the preset condition is met based on a single GPU operating frequency would affect the accuracy of the result. In a possible implementation manner, the terminal obtains the GPU operating frequencies corresponding to different time points within the predetermined period of time, and calculates the average operating frequency of the GPU within the predetermined period of time from these multiple GPU operating frequencies.

In an illustrative example, the terminal collects the GPU operating frequency every 1s, thereby obtaining 10 GPU operating frequencies within 10s, namely 1550MHz, 1570MHz, 1625MHz, 1650MHz, 1655MHz, 1600MHz, 1650MHz, 1650MHz, 1600MHz, and 1650MHz, and then calculates the average operating frequency to be 1620MHz.

It should be noted that the terminal (operating system) can obtain the GPU operating frequency in real time through a preset interface (such as devfreq), and the embodiment of the present application does not limit the manner of obtaining the GPU operating frequency.
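As a non-limiting illustration, reading the current frequency from a devfreq-style text node can be sketched as follows. The function name is hypothetical, the location of the node differs between devices and must be supplied by the caller, and the assumption that the node reports the frequency in Hz should be verified for the specific device.

```python
from pathlib import Path


def read_gpu_freq_mhz(cur_freq_path):
    """Parse a text node holding one integer frequency and convert it to MHz."""
    raw = Path(cur_freq_path).read_text().strip()
    return int(raw) / 1_000_000  # assumes the node reports the value in Hz
```

The path is left as a parameter rather than hard-coded because it depends on the GPU vendor and kernel configuration.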

Step 503: In response to the average operating frequency being greater than the frequency threshold, it is determined that the GPU operating frequency satisfies a preset condition, and the frequency threshold is less than the GPU operating frequency upper limit.

When the GPU is running at full load (heavy load), the operating frequency of the GPU approaches the upper limit of the GPU operating frequency. Therefore, in the embodiment of the present application, the terminal detects whether the average operating frequency of the GPU within the predetermined period of time is greater than the frequency threshold. If it is greater, the terminal determines that the GPU operating frequency meets the preset condition (that is, the GPU may have reached a performance bottleneck); if it is less, the terminal determines that the GPU operating frequency does not meet the preset condition.

In a possible implementation manner, the terminal is preset with an upper limit of the operating frequency ratio, and when the ratio of the average operating frequency of the GPU to the GPU operating frequency upper limit is greater than this ratio upper limit, it is determined that the GPU operating frequency meets the preset condition. For example, the ratio upper limit is 0.8.

Combining the example in the above steps, when the frequency threshold is 0.8 × the GPU operating frequency upper limit and the GPU operating frequency upper limit is 2000MHz, since the average operating frequency of the GPU is 1620MHz > 1600MHz, the terminal determines that the GPU operating frequency meets the preset condition.

In other possible implementation manners, for the multiple consecutive GPU operating frequencies acquired by the terminal, the terminal detects whether the length of time for which the GPU operating frequency is greater than the frequency threshold exceeds a duration threshold. If it does, the terminal determines that the GPU operating frequency satisfies the preset condition; if it does not, the terminal determines that the GPU operating frequency does not meet the preset condition. That is, the terminal detects whether the GPU operating frequency stays close to the GPU operating frequency upper limit for a long time.
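As a non-limiting illustration, the worked example above can be reproduced as follows. The ten frequency samples, the 2000MHz upper limit, and the 0.8 ratio are taken from the example; the variable names are hypothetical.

```python
# GPU operating frequencies sampled once per second over 10 s, in MHz.
samples_mhz = [1550, 1570, 1625, 1650, 1655, 1600, 1650, 1650, 1600, 1650]

freq_upper_limit_mhz = 2000
ratio_upper_limit = 0.8

average_mhz = sum(samples_mhz) / len(samples_mhz)         # 16200 / 10 = 1620.0
threshold_mhz = ratio_upper_limit * freq_upper_limit_mhz  # 0.8 * 2000 = 1600.0

# The preset condition holds when the average exceeds the threshold.
meets_condition = average_mhz > threshold_mhz
```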
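As a non-limiting illustration, the duration-based condition can be sketched as follows, under the assumption that "duration" means the longest uninterrupted run of samples above the frequency threshold, multiplied by the sampling interval. This interpretation, the function names, and the values in the usage example are assumptions of the sketch.

```python
def longest_time_above(samples_mhz, freq_threshold_mhz, interval_s):
    """Longest continuous time for which the frequency stayed above the threshold."""
    longest = run = 0
    for freq in samples_mhz:
        run = run + 1 if freq > freq_threshold_mhz else 0  # reset on any dip
        longest = max(longest, run)
    return longest * interval_s


def meets_duration_condition(samples_mhz, freq_threshold_mhz, interval_s,
                             duration_threshold_s):
    return longest_time_above(samples_mhz, freq_threshold_mhz,
                              interval_s) > duration_threshold_s


# Usage with made-up samples taken once per second.
demo_longest = longest_time_above([1500, 1650, 1650, 1650, 1500, 1650],
                                  1600, 1.0)
```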

Step 504, in response to the GPU operating frequency meeting the preset condition, obtain the start time point and the end time point of the enqueue process within a predetermined time period. The enqueue process refers to the process of putting the buffer after the image rendering back into the BufferQueue.

As shown in Figure 3, the rendering of each image frame goes through a dequeue process and an enqueue process. The dequeue process is the process in which the upper layer requests a free buffer from the BufferQueue for rendering (that is, the process in which the buffer changes from the Free state to the Dequeued state in Figure 3). The enqueue process is the process of writing the rendered data into the buffer and returning it to the BufferQueue to wait for SurfaceFlinger to synthesize it (that is, the process in which the buffer changes from the Dequeued state to the Queued state in Figure 3).

In addition, from the perspective of the executing entity, the dequeue process is executed by the CPU: during the dequeue process, the CPU measures the width and height of the view (measure), sets the width, height, and position of the view (layout), creates the display list and draws (draw), generates polygons and textures, and sends the generated textures and polygons to the GPU. The enqueue process is executed by the GPU: during the enqueue process, the GPU rasterizes and composites the textures and polygons generated by the CPU (that is, image rendering) and writes the rendered data into the buffer.

Therefore, based on the image frame rendering process, in a possible implementation manner, the terminal records the start time point and the end time point of each enqueue process within the predetermined period of time, so that the duration of the enqueue process can subsequently be calculated from the start time point and the end time point.

Optionally, the start time point of the enqueue process is the time point when the CPU sends polygons and textures to the GPU, and the end time point of the enqueue process is the time point when the buffer that has undergone image rendering completes enqueue.

Schematically, FIG. 6 shows the distribution of the start time points and end time points of the enqueue process during image rendering. For each image frame, the hatched portions are CPU running time (in the first portion the CPU performs measurement, layout, and texture and polygon generation; in the second portion the CPU performs resource cleanup), and the black portion is GPU running time (drawing according to the textures and polygons sent by the CPU).

Step 505: Calculate the GPU single frame rendering time according to the start time point and the end time point.

In the image rendering process, the running time of the GPU is mainly concentrated in the enqueue process. Therefore, the terminal can calculate the GPU single frame rendering time in the image rendering process according to the start time point and end time point of each enqueue process. In a possible implementation, this step may include the following steps.

1. Calculate the first time interval between the start time point and the end time point.

Optionally, for each enqueue process within the predetermined time period, the terminal calculates a first time interval between the start time point and the end time point. The first time interval is the time consumed by that enqueue process.

In addition, since the predetermined time period covers the rendering of multiple image frames, and therefore multiple enqueue processes, the terminal repeats this step to obtain multiple first time intervals.

In an illustrative example, the terminal calculates the first time intervals corresponding to 10 enqueue processes, which are: 6ms, 5ms, 6ms, 7ms, 6ms, 6ms, 5ms, 6ms, 7ms, 6ms.

2. Determine the average value of the first time intervals corresponding to the enqueue processes within the predetermined time period as the GPU single frame rendering time.

To avoid the inaccuracy that would result from determining the GPU single frame rendering time from a single first time interval, in this embodiment the terminal calculates the average value of the first time intervals corresponding to the enqueue processes within the predetermined time period, and determines this average value as the GPU single frame rendering time of the GPU within the predetermined time period.

Continuing the example in the above steps, the terminal calculates an average value of 6ms from the first time intervals corresponding to the 10 enqueue processes, so the GPU single frame rendering time is determined to be 6ms.

It should be noted that in other possible implementations, the terminal may determine the GPU single frame rendering time from several first time intervals obtained by sampling, which is not limited in this embodiment of the application.
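Sub-steps 1 and 2 above can be sketched as follows. This is a minimal illustration, assuming the terminal has already recorded a (start, end) timestamp pair in milliseconds for each enqueue process; the function name `gpu_single_frame_time_ms` and the timestamps are hypothetical, chosen only so that the durations match the illustrative example.

```python
from statistics import mean


def gpu_single_frame_time_ms(enqueue_spans):
    """Average the first time intervals (end - start) of the enqueue processes.

    enqueue_spans: list of (start_ms, end_ms) pairs, one per enqueue process
    (a hypothetical input format).
    """
    if not enqueue_spans:
        raise ValueError("no enqueue process was recorded")
    first_intervals = [end - start for start, end in enqueue_spans]
    return mean(first_intervals)


# Hypothetical timestamps whose durations reproduce the 10 values in the example.
durations = [6, 5, 6, 7, 6, 6, 5, 6, 7, 6]
spans = [(16 * i, 16 * i + d) for i, d in enumerate(durations)]
gpu_time = gpu_single_frame_time_ms(spans)
```

With these inputs the ten intervals sum to 60ms, so the average is 6ms, matching the example.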

Step 506: Obtain a single frame rendering duration within a predetermined duration, where the single frame rendering duration is the duration of rendering a single frame image.

During image rendering, another sign that the GPU is running at full load (heavy load) is an increase in the proportion of time during which the GPU is running. Therefore, the terminal can determine whether the GPU is at full load (has reached a performance bottleneck) by calculating the ratio of the GPU single frame rendering time to the total rendering time of a single image frame.

On the basis of FIG. 5, as shown in FIG. 7, this step may also include the following steps.

Step 506A: Obtain the start time point and the end time point of the enqueue process within a predetermined time period.

For the process of obtaining the start time point and the end time point of the enqueue process, reference may be made to step 504 above, which will not be repeated in this embodiment.

Step 506B: Calculate the second time interval between the end time points of the two adjacent enqueue processes.

Since the rendering of every image frame includes an enqueue process, the terminal can determine the rendering time of a single image frame from the time interval between the end time points of two adjacent enqueue processes, or from the time interval between the start time points of two adjacent enqueue processes.

In a possible implementation, the second time interval = (i+1)-th end time point - i-th end time point, where i ≥ 1; or the second time interval = (i+1)-th start time point - i-th start time point, where i ≥ 1.

Schematically, as shown in FIG. 6, the terminal calculates one second time interval from the end time point corresponding to the second image frame and the end time point corresponding to the first image frame, calculates another second time interval from the end time point corresponding to the third image frame and the end time point corresponding to the second image frame, and so on.

In an illustrative example, the terminal calculates that 10 second time intervals are all 16 ms.

In other possible implementations, since the rendering of every image frame also includes a dequeue process, the terminal may instead determine the rendering time of a single image frame from the time interval between the end time points of two adjacent dequeue processes, or from the time interval between the start time points of two adjacent dequeue processes, which is not limited in this embodiment.

Step 506C: Determine the average value of the second time interval within the predetermined time length as the single frame rendering time length.

To avoid the inaccuracy that would result from determining the single frame rendering time from a single second time interval, in this embodiment the terminal calculates the average value of the second time intervals within the predetermined time period, and determines this average value as the single frame rendering time of the image frames within the predetermined time period.

With reference to the example in the foregoing steps, since the calculated second time intervals are all 16 ms, the terminal determines that the single frame rendering time within the predetermined time length is 16 ms.
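Steps 506B and 506C can be sketched as follows, assuming the end time points of consecutive enqueue processes are available in milliseconds and in chronological order. The function name `single_frame_time_ms` and the sample timestamps are hypothetical.

```python
def single_frame_time_ms(end_points_ms):
    """Average the second time intervals between adjacent enqueue end points."""
    if len(end_points_ms) < 2:
        raise ValueError("at least two end time points are needed")
    # Second time interval = (i+1)-th end time point - i-th end time point.
    second_intervals = [
        later - earlier
        for earlier, later in zip(end_points_ms, end_points_ms[1:])
    ]
    return sum(second_intervals) / len(second_intervals)


# Hypothetical end points spaced 16 ms apart: 11 points give 10 intervals.
end_points = [6 + 16 * i for i in range(11)]
frame_time = single_frame_time_ms(end_points)
```

The same function applies unchanged to start time points, since only the differences between adjacent values are used.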

Step 507: Determine whether the GPU has reached a performance bottleneck according to the GPU single frame rendering time and the single frame rendering time.

Further, the terminal determines whether the GPU reaches the performance bottleneck within the predetermined time period according to the calculated GPU single frame rendering time and single frame rendering time. As shown in Figure 7, this step may include the following steps.

Step 507A: Calculate the ratio of the GPU single frame rendering time to the single frame rendering time.

Here, ratio = GPU single frame rendering time / single frame rendering time, so the ratio is less than 1.

Combining the example in the above steps, when the GPU single frame rendering time is 6ms, and the single frame rendering time is 16ms, the ratio is 6/16=0.375.

Further, the terminal detects whether the ratio is greater than a preset value, and if it is greater, it is determined that the GPU running time takes a larger proportion in the image rendering process, that is, the GPU has reached the performance bottleneck; if it is less, it is determined that the GPU has not reached the performance bottleneck.

Step 507B, in response to the ratio being greater than the preset value, it is determined that the GPU has reached the performance bottleneck.

Continuing the example in the above steps, when the preset value is 0.3, since 0.375 > 0.3, the terminal determines that the GPU has reached the performance bottleneck.

Step 507C, in response to the ratio being less than the preset value, it is determined that the GPU has not reached the performance bottleneck.
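Steps 507A to 507C amount to a single comparison, sketched below. The function name `gpu_reached_bottleneck` is hypothetical. The text does not say how a ratio exactly equal to the preset value is handled; this sketch assumes it counts as not reaching the bottleneck.

```python
def gpu_reached_bottleneck(gpu_frame_ms, frame_ms, preset_value):
    """Return (reached, ratio) for one predetermined time period."""
    if frame_ms <= 0:
        raise ValueError("single frame rendering time must be positive")
    ratio = gpu_frame_ms / frame_ms
    # Assumption: a ratio equal to the preset value is treated as "not reached".
    return ratio > preset_value, ratio


reached, ratio = gpu_reached_bottleneck(6, 16, 0.3)
```

With the values from the example (6ms, 16ms, preset value 0.3), the ratio is 0.375 and the bottleneck is reported as reached.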

In this embodiment, the terminal calculates the single frame rendering time of a single image frame and the GPU single frame rendering time from the start time points and end time points of the enqueue process in image rendering, and then determines whether the GPU has reached the performance bottleneck from the ratio of the GPU single frame rendering time to the single frame rendering time. This reduces the computational complexity of the detection process while ensuring the accuracy of the detection result.

In a possible implementation manner, when it is determined that the GPU reaches the performance bottleneck, in order to ensure the image display quality of the target application, on the basis of FIG. 5, as shown in FIG. 8, the following steps are further included after step 507.

Step 508, in response to the GPU reaching the performance bottleneck, obtain the current frame rate.

Regarding the way of obtaining the current frame rate, in a possible implementation the terminal calculates the current frame rate from the single frame rendering time obtained in the above steps, where current frame rate = 1s / single frame rendering time. For example, when the single frame rendering time is 16ms, the terminal calculates that the current frame rate is approximately 62fps (1000ms / 16ms = 62.5).

Further, the terminal obtains the target frame rate of the target application and detects whether the current frame rate reaches the target frame rate. If it does, the terminal determines that the screen of the target application does not stutter; if it does not, the terminal determines that the screen of the target application stutters, and the following step 509 or 510 is executed.

Optionally, the target frame rate of the target application is obtained by the terminal operating system through a data channel with the target application.
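The frame rate calculation and the stutter check in step 508 can be sketched as follows. The function names and the target frame rate of 60fps are hypothetical; the text does not specify a target value.

```python
def current_frame_rate_fps(frame_ms):
    """Current frame rate = 1 s / single frame rendering time."""
    if frame_ms <= 0:
        raise ValueError("single frame rendering time must be positive")
    return 1000.0 / frame_ms


def screen_stutters(current_fps, target_fps):
    """The screen is considered to stutter when the target frame rate is not reached."""
    return current_fps < target_fps


fps = current_frame_rate_fps(16)          # 16 ms per frame
stutter = screen_stutters(fps, 60)        # 60 fps is an assumed target
```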

Step 509: In response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency being less than the upper limit of the GPU operating frequency, increase the operating parameters of the GPU.

In a possible implementation, when the current frame rate does not reach the target frame rate, the terminal obtains the average operating frequency (calculated in step 502) and detects whether the average operating frequency has reached the upper limit of the GPU operating frequency. If it has not, there is still room to improve GPU performance, and the terminal increases the operating parameters of the GPU. For example, starting from the average operating frequency, the terminal gradually raises the operating frequency toward the upper limit in predetermined increments to improve the rendering performance of the GPU.

In an illustrative example, the average operating frequency of the GPU is 1620 MHz and the upper limit of the operating frequency of the GPU is 2000 MHz, and the terminal gradually increases the operating frequency of the GPU from 1620 MHz in predetermined increments of 50 MHz.

Step 510: In response to the current frame rate not reaching the target frame rate of the target application, and the average operating frequency reaching the upper limit of the GPU operating frequency, adjust the image quality of the target application.

When the current frame rate has not reached the target frame rate and the average operating frequency has reached the upper limit of the GPU operating frequency, in order to keep the picture displaying smoothly, the terminal adjusts (for example, lowers) the image quality of the target application, thereby reducing the difficulty of image rendering for the GPU.

In this embodiment, after detecting that the GPU has reached the performance bottleneck, the terminal further determines whether the screen stutters. When stutter occurs, the terminal either increases the GPU operating frequency or adjusts the image quality of the target application, depending on the average operating frequency of the GPU and the upper limit of the operating frequency, thereby improving the picture display quality of the target application.
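The choice between steps 509 and 510 can be sketched as a small decision function. The function name `choose_adjustment`, the returned action labels, and the single-step increase are hypothetical simplifications; the text describes a gradual increase in predetermined increments but does not specify how often a step is applied.

```python
def choose_adjustment(current_fps, target_fps, avg_freq_mhz,
                      freq_limit_mhz, step_mhz=50):
    """Pick the action taken after the GPU is found to be at a bottleneck."""
    if current_fps >= target_fps:
        # Target frame rate reached: no stutter, nothing to adjust.
        return ("none", None)
    if avg_freq_mhz < freq_limit_mhz:
        # Step 509: raise the frequency by one increment, capped at the limit.
        return ("raise_frequency", min(avg_freq_mhz + step_mhz, freq_limit_mhz))
    # Step 510: frequency is already at the limit, so lower image quality.
    return ("lower_image_quality", None)


# Values from the illustrative example; the frame rates are assumed.
action = choose_adjustment(50, 60, 1620, 2000)
```

Here the average frequency of 1620 MHz is below the 2000 MHz limit, so the sketch proposes raising the frequency to 1670 MHz.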

The following are device embodiments of this application, which can be used to implement the method embodiments of this application. For details not disclosed in the device embodiment of this application, please refer to the method embodiment of this application.

Please refer to FIG. 9, which shows a schematic structural diagram of an apparatus for determining a GPU performance bottleneck provided by an embodiment of the present application. The device can be implemented as all or part of the terminal in Figure 1 through a dedicated hardware circuit, or a combination of software and hardware. The device includes:

The first obtaining module 910 is configured to obtain the GPU operating frequency of the GPU within a predetermined period of time during the running process of the target application;

The second obtaining module 920 is configured to obtain the GPU single frame rendering time length of the GPU within the predetermined time period in response to the GPU operating frequency meeting a preset condition, where the GPU single frame rendering time length is the running time of the GPU during the rendering of a single image frame;

The determining module 930 is configured to determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time.

Optionally, the second obtaining module 920 includes:

The first acquiring unit is configured to acquire the start time point and the end time point of the enqueue process within the predetermined time period, and the enqueue process refers to the process of putting the image rendering buffer back into the buffer queue BufferQueue;

The first calculation unit is configured to calculate the GPU single frame rendering time according to the start time point and the end time point.

Optionally, the first calculation unit is configured to:

Calculating the first time interval between the start time point and the end time point;

Determining the average value of the first time intervals corresponding to the enqueue processes within the predetermined time period as the GPU single frame rendering time.

Optionally, the determining module 930 includes:

The second acquiring unit is configured to acquire a single frame rendering time length within the predetermined time period, where the single frame rendering time length is the time length for rendering a single frame image;

The determining unit is configured to determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time length and the single frame rendering time length.

Optionally, the second acquiring unit is used to:

Acquire the start time point and the end time point of the enqueue process within the predetermined time period, and the enqueue process refers to the process of putting the buffer after the image rendering back into the BufferQueue;

Calculating the second time interval between the end time points of two adjacent enqueue processes;

The average value of the second time interval within the predetermined time period is determined as the single frame rendering time length.

Optionally, the determination unit is used to:

Calculate the ratio of the GPU single frame rendering time to the single frame rendering time, and the ratio is less than 1;

In response to the ratio being greater than a preset value, determining that the GPU has reached a performance bottleneck;

In response to the ratio being less than the preset value, it is determined that the GPU has not reached a performance bottleneck.

Optionally, the device further includes:

The first detection module is configured to calculate the average operating frequency of the GPU within the predetermined time period according to the GPU operating frequency, and, in response to the average operating frequency being greater than the frequency threshold, determine that the GPU operating frequency satisfies the preset condition, where the frequency threshold is less than the upper limit of the operating frequency of the GPU;

or,

The second detection module is configured to determine that the GPU operating frequency satisfies the preset condition in response to the duration of the GPU operating frequency being greater than the frequency threshold being greater than the duration threshold.
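The two alternative preset conditions can be sketched as follows, assuming the GPU operating frequency is sampled at a fixed period. The function names, the sample values, and the thresholds are hypothetical. The sketch also assumes that the duration is the total time spent above the frequency threshold; the text could equally be read as the longest continuous run.

```python
def meets_by_average(freqs_mhz, freq_threshold_mhz):
    """First condition: the average operating frequency exceeds the threshold."""
    if not freqs_mhz:
        raise ValueError("no frequency samples")
    return sum(freqs_mhz) / len(freqs_mhz) > freq_threshold_mhz


def meets_by_duration(freqs_mhz, sample_period_ms,
                      freq_threshold_mhz, duration_threshold_ms):
    """Second condition: time spent above the threshold exceeds a duration."""
    # Assumption: each sample stands for one full sampling period.
    time_above_ms = sum(
        sample_period_ms for f in freqs_mhz if f > freq_threshold_mhz
    )
    return time_above_ms > duration_threshold_ms


samples = [1500, 1700, 1800, 1600, 1900]   # assumed samples in MHz
by_average = meets_by_average(samples, 1600)
by_duration = meets_by_duration(samples, 100, 1600, 200)
```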

Optionally, the device further includes:

The third obtaining module is configured to obtain the current frame rate in response to the GPU reaching the performance bottleneck;

The first adjustment module is configured to increase the operating parameters of the GPU in response to that the current frame rate does not reach the target frame rate of the target application and the average operating frequency is less than the upper limit of the operating frequency of the GPU;

The second adjustment module is configured to adjust the image quality of the target application in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency reaching the upper limit of the GPU operating frequency.

It should be noted that, when the device provided in the above embodiment implements its functions, the division into the above functional modules is used only as an example. In practical applications, the above functions can be allocated to different functional modules as required; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.

The present application also provides a computer-readable medium on which program instructions are stored. When the program instructions are executed by a processor, the method for determining the GPU performance bottleneck provided by the foregoing method embodiments is implemented.

The present application also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the GPU performance bottleneck determination method described in each of the foregoing embodiments.

The serial numbers of the foregoing embodiments of the present application are only for description, and do not represent the advantages and disadvantages of the embodiments.

Those of ordinary skill in the art can understand that all or part of the steps in the frame rate control method of the foregoing embodiments can be implemented by hardware, or by a program instructing relevant hardware, and the program can be stored in a computer-readable storage medium. The aforementioned storage medium may be a read-only memory, a magnetic disk, or an optical disk. The above descriptions are only preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall be included within the protection scope of this application.

Claims

A method for determining the performance bottleneck of a graphics processor GPU, characterized in that the method includes:

During the operation of the target application, obtain the GPU operating frequency of the GPU within a predetermined period of time;

In response to the GPU running frequency meeting a preset condition, acquiring the GPU single frame rendering time length of the GPU within the predetermined time period, where the GPU single frame rendering time length is the running time length of the GPU in the single frame image rendering process;

Determine whether the GPU reaches a performance bottleneck according to the GPU single frame rendering time.
The method according to claim 1, wherein said acquiring the GPU single frame rendering duration of the GPU within the predetermined duration comprises:

Acquire the start time point and the end time point of the enqueue process within the predetermined time period, and the enqueue process refers to the process of putting the image rendering buffer back into the buffer queue BufferQueue;

Calculate the GPU single frame rendering time according to the start time point and the end time point.
The method according to claim 2, wherein the calculating the GPU single frame rendering time according to the start time point and the end time point comprises:

Calculating the first time interval between the start time point and the end time point;

Determining the average value of the first time intervals corresponding to the enqueue processes within the predetermined time period as the GPU single frame rendering time.
The method according to any one of claims 1 to 3, wherein the determining whether the GPU has reached a performance bottleneck according to the GPU single frame rendering time includes:

Acquiring a single frame rendering time length within the predetermined time period, where the single frame rendering time length is the time length for rendering a single frame image;

Determine whether the GPU reaches a performance bottleneck according to the GPU single frame rendering time length and the single frame rendering time length.
The method according to claim 4, wherein the obtaining the single frame rendering time within the predetermined time duration comprises:

Acquire the start time point and the end time point of the enqueue process within the predetermined time period, and the enqueue process refers to the process of putting the buffer after the image rendering back into the BufferQueue;

Calculating the second time interval between the end time points of two adjacent enqueue processes;

The average value of the second time interval within the predetermined time period is determined as the single frame rendering time length.
The method of claim 4, wherein the determining whether the GPU has reached a performance bottleneck according to the GPU single frame rendering time length and the single frame rendering time length comprises:

Calculate the ratio of the GPU single frame rendering time to the single frame rendering time, and the ratio is less than 1;

In response to the ratio being greater than a preset value, determining that the GPU has reached a performance bottleneck;

In response to the ratio being less than the preset value, it is determined that the GPU has not reached a performance bottleneck.
The method according to any one of claims 1 to 3, wherein after the obtaining the GPU operating frequency of the GPU within a predetermined period of time, the method further comprises:

Calculating the average operating frequency of the GPU within the predetermined time period according to the GPU operating frequency, and, in response to the average operating frequency being greater than a frequency threshold, determining that the GPU operating frequency satisfies the preset condition, where the frequency threshold is less than the upper limit of the GPU operating frequency;

or,

In response to the duration of the GPU operating frequency being greater than the frequency threshold being greater than the duration threshold, determining that the GPU operating frequency satisfies the preset condition.
The method according to claim 7, wherein after determining whether the GPU reaches a performance bottleneck according to the GPU single frame rendering time, the method further comprises:

In response to the GPU reaching a performance bottleneck, acquiring the current frame rate;

In response to the current frame rate not reaching the target frame rate of the target application, and the average operating frequency is less than the upper limit of the GPU operating frequency, increasing the operating parameters of the GPU;

In response to the current frame rate not reaching the target frame rate of the target application, and the average operating frequency reaching the upper limit of the GPU operating frequency, the image quality of the target application is adjusted.
A device for determining the performance bottleneck of a graphics processor GPU, characterized in that the device comprises:

The first acquiring module is used to acquire the GPU operating frequency of the GPU within a predetermined period of time during the running process of the target application;

The second acquiring module is configured to acquire the GPU single frame rendering duration of the GPU within the predetermined time period when the GPU operating frequency meets a preset condition, where the GPU single frame rendering duration is the running time of the GPU during the rendering of a single image frame;

The determining module is configured to determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time.
The device according to claim 9, wherein the second acquisition module comprises:

The first acquiring unit is configured to acquire the start time point and the end time point of the enqueue process within the predetermined time period, and the enqueue process refers to the process of putting the image rendering buffer back into the buffer queue BufferQueue;

The first calculation unit is configured to calculate the GPU single frame rendering time according to the start time point and the end time point.
The device according to claim 10, wherein the first calculation unit is configured to:

Calculating the first time interval between the start time point and the end time point;

Determining the average value of the first time intervals corresponding to the enqueue processes within the predetermined time period as the GPU single frame rendering time.
The device according to any one of claims 9 to 11, wherein the determining module comprises:

The second acquiring unit is configured to acquire a single frame rendering time length within the predetermined time period, where the single frame rendering time length is the time length for rendering a single frame image;

The determining unit is configured to determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time length and the single frame rendering time length.
The device according to claim 12, wherein the second acquiring unit is configured to:

Acquire the start time point and the end time point of the enqueue process within the predetermined time period, and the enqueue process refers to the process of putting the buffer after the image rendering back into the BufferQueue;

Calculating the second time interval between the end time points of two adjacent enqueue processes;

The average value of the second time interval within the predetermined time period is determined as the single frame rendering time length.
The device according to claim 12, wherein the determining unit is configured to:

Calculate the ratio of the GPU single frame rendering time to the single frame rendering time, and the ratio is less than 1;

In response to the ratio being greater than a preset value, determining that the GPU has reached a performance bottleneck;

In response to the ratio being less than the preset value, it is determined that the GPU has not reached a performance bottleneck.
The device according to any one of claims 9 to 11, wherein the device further comprises:

The first detection module is configured to calculate the average operating frequency of the GPU within the predetermined time period according to the GPU operating frequency, and, in response to the average operating frequency being greater than the frequency threshold, determine that the GPU operating frequency satisfies the preset condition, where the frequency threshold is less than the upper limit of the GPU operating frequency;

or,

The second detection module is configured to determine that the GPU operating frequency satisfies the preset condition in response to the duration of the GPU operating frequency being greater than the frequency threshold being greater than the duration threshold.
The device according to claim 15, wherein the device further comprises:

The third obtaining module is configured to obtain the current frame rate in response to the GPU reaching the performance bottleneck;

The first adjustment module is configured to adjust the operating parameters of the GPU in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency is less than the upper limit of the GPU operating frequency;

The second adjustment module is configured to adjust the image quality of the target application in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency reaching the upper limit of the GPU operating frequency.
A terminal, wherein the terminal includes a processor, a memory connected to the processor, and program instructions stored on the memory, and the processor, when executing the program instructions, implements the method for determining a GPU performance bottleneck according to any one of claims 1 to 8.
A computer-readable storage medium, characterized in that program instructions are stored thereon, and when the program instructions are executed by a processor, the method for determining the GPU performance bottleneck according to any one of claims 1 to 8 is realized.