WO2020156132A1 - Gpu performance bottleneck determining method and device, terminal, and storage medium - Google Patents

GPU performance bottleneck determining method and device, terminal, and storage medium

Info

Publication number
WO2020156132A1
WO2020156132A1 PCT/CN2020/071796 CN2020071796W WO2020156132A1 WO 2020156132 A1 WO2020156132 A1 WO 2020156132A1 CN 2020071796 W CN2020071796 W CN 2020071796W WO 2020156132 A1 WO2020156132 A1 WO 2020156132A1
Authority
WO
WIPO (PCT)
Prior art keywords
gpu
single frame
operating frequency
time
frame rendering
Prior art date
Application number
PCT/CN2020/071796
Other languages
French (fr)
Chinese (zh)
Inventor
陈岩
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2020156132A1 publication Critical patent/WO2020156132A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment

Definitions

  • the embodiments of the present application relate to the field of terminal technology, and in particular, to a method, device, terminal, and storage medium for determining GPU performance bottlenecks.
  • GPU graphics processor unit
  • a driver corresponding to the GPU needs to be installed in the terminal so that the real-time load of the GPU can be calculated through the driver, and it can then be determined whether the GPU has reached a performance bottleneck during the running of the application.
  • the embodiments of the present application provide a method, device, terminal, and storage medium for determining GPU performance bottlenecks.
  • the technical solutions are as follows:
  • a method for determining GPU performance bottlenecks includes:
  • a device for determining GPU performance bottlenecks includes:
  • the first acquiring module is used to acquire the GPU operating frequency of the GPU within a predetermined period of time during the running process of the target application;
  • the second obtaining module is configured to obtain the GPU single-frame rendering time of the GPU within the predetermined time period in response to the GPU operating frequency meeting a preset condition, where the GPU single-frame rendering time is the running time of the GPU during the rendering of a single frame image;
  • the determining module is configured to determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time.
  • in another aspect, a terminal is provided. The terminal includes a processor, a memory connected to the processor, and program instructions stored on the memory, and the processor, when executing the program instructions, implements the method for determining the GPU performance bottleneck as described above.
  • a computer-readable storage medium is provided, and program instructions are stored thereon, and when the program instructions are executed by a processor, the method for determining the GPU performance bottleneck as described in the foregoing aspect is implemented.
  • FIG. 1 is a schematic structural diagram of a terminal provided by an exemplary embodiment of the present application.
  • Figure 2 is a schematic diagram of the graphic display process in the Android system
  • Figure 3 is a state transition diagram of the four states of the buffer
  • FIG. 4 shows a method flowchart of a method for determining a GPU performance bottleneck provided by an exemplary embodiment of the present application
  • FIG. 5 shows a method flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application
  • Figure 6 is a schematic diagram of the distribution of the start time point and the end time point of the enqueue process
  • FIG. 7 shows a method flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application.
  • FIG. 8 shows a method flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application
  • FIG. 9 is a schematic structural diagram of an apparatus for determining a GPU performance bottleneck provided by an embodiment of the present application.
  • FIG. 1 shows a schematic structural diagram of a terminal provided by an exemplary embodiment of the present application.
  • the terminal 100 is an electronic device installed with a target application.
  • the target application can be a system program or a third-party application.
  • third-party applications are applications created by third parties other than the user and the operating system.
  • the target application can be a game application or a video playback application.
  • the terminal 100 includes: a processor 120 and a memory 140.
  • the processor 120 may include one or more processing cores.
  • the processor 120 uses various interfaces and lines to connect the various parts of the entire terminal 100, and performs the various functions of the terminal 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 140 and by calling data stored in the memory 140.
  • the processor 120 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA).
  • the processor 120 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), and a modem.
  • the CPU mainly processes the operating system, user interface, and application programs; the GPU is used to render and draw the content that the display needs to display; the modem is used to process wireless communication. It can be understood that the above-mentioned modem may not be integrated into the processor 120, but may be implemented by a chip alone.
  • the memory 140 may include random access memory (RAM) or read-only memory (Read-Only Memory).
  • the memory 140 includes a non-transitory computer-readable storage medium.
  • the memory 140 may be used to store instructions, programs, codes, code sets or instruction sets.
  • the memory 140 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions used to implement the following method embodiments, and so on; the data storage area can store the data involved in the following method embodiments, and so on.
  • the terminal 100 in the embodiment of the present application further includes a display screen 160.
  • the display screen 160 is a touch display screen that receives touch operations performed on or near it by the user with any suitable object such as a finger or a stylus, and that displays the user interfaces of various application programs.
  • the display screen 160 is usually arranged on the front panel of the terminal 100 or, at the same time, arranged on the front panel and the rear panel of the terminal 100.
  • the display screen 160 can be designed as a full screen, a curved screen or a special-shaped screen.
  • the display screen 160 can also be designed as a combination of a full screen and a curved screen, or a combination of a special-shaped screen and a curved screen, which is not limited in this embodiment.
  • the structure of the terminal 100 shown in the above drawings does not constitute a limitation on the terminal 100.
  • the terminal may include more or fewer components than those shown in the figure, a combination of certain components, or a different arrangement of components.
  • the terminal 100 also includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (Wireless Fidelity, WiFi) module, a power supply, a Bluetooth module, etc., which are not repeated here.
  • the content displayed on the display 21 is read from the hardware frame buffer, and the reading process is as follows: starting from the starting address of the hardware frame buffer, the buffer is scanned from top to bottom and from left to right, and the scanned content is mapped onto the display.
  • the terminal adopts a double buffering mechanism.
  • One of the double buffers is used for content reading and display, and the other buffer is used for background graphics synthesis and writing.
  • the front buffer 22 is a frame buffer for content to be displayed on the display screen
  • the back buffer 23 is a frame buffer for synthesizing the next frame of graphics.
  • the display screen 21 reads the content in the back buffer 23.
  • the next frame of graphics is then synthesized in the front buffer 22 (that is, the front and back buffers swap roles).
  • SurfaceFlinger as a graphics synthesizer, is used to synthesize multiple surfaces transferred from the upper layer and submit them to the hardware frame buffer of the display screen for the display screen 21 to read and display.
  • the content in the back buffer 23 is synthesized by SurfaceFlinger 24 from multiple surfaces 25.
  • each surface corresponds to a window (window) of the upper layer, such as a dialog box, a status bar, and an activity (Activity).
  • the transfer of graphics uses a buffer as a carrier, and the surface is a further encapsulation of the buffer.
  • a buffer queue (BufferQueue) is provided inside the surface, which forms a producer consumer model with the upper layer and SurfaceFlinger.
  • the upper layer is the producer (Producer)
  • SurfaceFlinger is the consumer (Consumer).
  • each buffer in the BufferQueue is in one of four states: the free state (Free), the dequeued state (Dequeued), the queued state (Queued), and the acquired state (Acquired).
  • in the Free state, the buffer can be used by the upper layer; in the Dequeued state, the buffer is being used by the upper layer; in the Queued state, the upper layer has finished using the buffer (drawing and rendering is complete) and the buffer is waiting to be synthesized by SurfaceFlinger; in the Acquired state, SurfaceFlinger is using the buffer for synthesis.
  • different states can be converted through the dequeueBuffer and queueBuffer operations. The conversion process is shown in Figure 3.
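  • The four buffer states and their transitions can be sketched as a small state machine. This is a minimal illustration, not the Android implementation: `dequeueBuffer` and `queueBuffer` are the operations named above, while `acquireBuffer` and `releaseBuffer` are assumed here as the consumer-side operations that close the cycle, and the names `BufferState`, `TRANSITIONS`, and `apply_operation` are hypothetical.

```python
from enum import Enum


class BufferState(Enum):
    FREE = "Free"
    DEQUEUED = "Dequeued"
    QUEUED = "Queued"
    ACQUIRED = "Acquired"


# Operation name -> (state before, state after).
# dequeueBuffer/queueBuffer come from the text; acquireBuffer/releaseBuffer
# are an assumption used to complete the Free -> ... -> Free cycle.
TRANSITIONS = {
    "dequeueBuffer": (BufferState.FREE, BufferState.DEQUEUED),
    "queueBuffer": (BufferState.DEQUEUED, BufferState.QUEUED),
    "acquireBuffer": (BufferState.QUEUED, BufferState.ACQUIRED),
    "releaseBuffer": (BufferState.ACQUIRED, BufferState.FREE),
}


def apply_operation(state, operation):
    """Return the next state, or raise ValueError if the operation is illegal."""
    before, after = TRANSITIONS[operation]
    if state is not before:
        raise ValueError(f"{operation} is not valid in state {state.value}")
    return after
```

Applying the four operations in order to a Free buffer returns it to the Free state; calling `queueBuffer` on a Free buffer raises an error because nothing has been dequeued yet.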
  • FIG. 4 shows a method flowchart of a method for determining a GPU performance bottleneck provided by an exemplary embodiment of the present application.
  • the method may include the following steps.
  • Step 401 During the running of the target application, obtain the GPU running frequency of the GPU within a predetermined period of time.
  • the target application has higher performance requirements for the GPU than other applications, and the target application is either determined by the terminal according to the application type of the installed applications or set manually by the user.
  • the target application is an application that needs to perform dynamic image rendering, and the application may be a video playback application or a game application.
  • the target applications include any one of virtual reality applications, three-dimensional map programs, military simulation programs, third-person shooting (TPS) games, first-person shooting (FPS) games, multiplayer online battle arena (MOBA) games, and multiplayer gun-battle survival games. This application does not limit the specific types of target applications.
  • the terminal obtains the GPU operating frequency once every predetermined time interval, thereby obtaining multiple GPU operating frequencies within the predetermined period of time.
  • the predetermined duration is 10s
  • the time interval is 100ms.
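  • A minimal sketch of this periodic sampling, assuming the frequency source is passed in as a callable. The names `sample_gpu_frequencies`, `read_freq_mhz`, and `sleep` are hypothetical; the defaults mirror the 10 s duration and 100 ms interval given above.

```python
import time


def sample_gpu_frequencies(read_freq_mhz, duration_s=10.0, interval_s=0.1,
                           sleep=time.sleep):
    """Collect one frequency sample per interval over the whole duration.

    read_freq_mhz: hypothetical callable returning the current GPU frequency.
    sleep: injectable so the loop can be exercised without real waiting.
    """
    count = round(duration_s / interval_s)  # 10 s / 100 ms -> 100 samples
    samples = []
    for _ in range(count):
        samples.append(read_freq_mhz())
        sleep(interval_s)
    return samples
```

With the default values the function returns 100 samples, one per 100 ms.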
  • the performance requirements for the GPU under different operating scenarios may be different.
  • the performance requirements for the GPU in the game startup scene, the game loading scene, and the game main interface scene are lower than the GPU performance requirements in the game itself. Therefore, in a possible implementation, the terminal executes the step of obtaining the GPU operating frequency only when the target application runs to the target running scene. The scene information of the target running scene is transmitted by the target application through a data channel with the terminal operating system.
  • Step 402 in response to the GPU operating frequency meeting the preset condition, obtain the GPU single frame rendering time of the GPU within the predetermined time period, where the GPU single frame rendering time is the GPU operating time in the single frame image rendering process.
  • the terminal detects whether the obtained GPU operating frequency meets the preset condition. If it does, the terminal determines that the GPU may have reached a performance bottleneck and executes the step of obtaining the GPU single frame rendering time of the GPU within the predetermined time period; if it does not, the terminal continues to obtain the GPU operating frequency.
  • the preset condition is that the GPU operating frequency is greater than the frequency threshold.
  • the frequency threshold is 80% × the upper limit of the GPU operating frequency.
  • Image rendering is completed by the Central Processing Unit (CPU) and GPU.
  • the time for the CPU and GPU to render one frame of image together is called single frame rendering time.
  • the GPU running time in the single frame image rendering process is the GPU single frame rendering time
  • the CPU running time during the single frame image rendering process is the CPU single frame rendering time. Since the subsequent steps need to determine whether the GPU has reached the performance bottleneck, the terminal only obtains the GPU single frame rendering time.
  • the terminal obtains the GPU single frame rendering time corresponding to each frame of image within the predetermined time period, or the terminal obtains the GPU single frame rendering time corresponding to specified image frames within the predetermined time period.
  • Step 403 Determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time.
  • the terminal detects whether the GPU single frame rendering time is greater than the duration threshold. If it is greater, it is determined that the GPU has reached the performance bottleneck; otherwise, it is determined that the GPU has not reached the performance bottleneck.
  • because the GPU operating frequency and the GPU single frame rendering time are easy to obtain, determining whether the GPU has reached the performance bottleneck based on these two values is less difficult than performing a complicated GPU load calculation through an additional driver. It also simplifies the GPU performance bottleneck detection process and provides data support for subsequent development and optimization.
  • when the GPU operating frequency meets the preset condition, the GPU single frame rendering time of the GPU within the predetermined period of time is further obtained, so that whether the GPU reaches the performance bottleneck during the running of the target application is determined according to the GPU single frame rendering time. In this embodiment, the terminal judges the GPU performance bottleneck based on the GPU operating frequency and the GPU single frame rendering time of the single frame image rendering process, without a complicated driver-based GPU load calculation; while ensuring the accuracy of GPU performance bottleneck detection, this reduces the complexity of the detection and improves its efficiency.
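  • Steps 401 to 403 amount to a two-stage check: the frame time is only examined once the frequency condition holds. The sketch below is one possible reading under stated assumptions: a single representative frequency value is compared with the threshold, the 80% fraction is taken from the example above, and `duration_threshold_ms` is a hypothetical parameter because the text gives no value for it. All function and parameter names are illustrative.

```python
def gpu_bottleneck_reached(gpu_freq_mhz, freq_upper_limit_mhz,
                           get_gpu_frame_time_ms, duration_threshold_ms,
                           freq_fraction=0.8):
    """Return True/False for the bottleneck decision, or None when the
    frequency condition is not met and sampling should simply continue."""
    freq_threshold = freq_fraction * freq_upper_limit_mhz
    if gpu_freq_mhz <= freq_threshold:
        return None  # condition not met: frame time is not even queried
    # Condition met: only now obtain the GPU single frame rendering time.
    return get_gpu_frame_time_ms() > duration_threshold_ms
```

Passing the frame time as a callable keeps the more expensive measurement out of the common case where the frequency is well below its limit.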
  • obtaining the GPU single frame rendering duration of the GPU within the predetermined duration includes:
  • the enqueue process refers to the process of putting the image-rendered buffer (buffer) back into the buffer queue (BufferQueue);
  • calculate the GPU single frame rendering time according to the start time point and the end time point including:
  • the average value of the first time interval corresponding to the enqueue process within the predetermined time period is determined as the GPU single frame rendering time.
  • determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time including:
  • the single frame rendering time is the time length of rendering a single frame image
  • obtaining a single frame rendering time within a predetermined time period includes:
  • the enqueue process refers to the process of putting the buffer after the image rendering back into the BufferQueue;
  • the average value of the second time interval within the predetermined time period is determined as the single frame rendering time.
  • determine whether the GPU has reached the performance bottleneck according to the GPU single frame rendering time and single frame rendering time including:
  • the method further includes:
  • the method further includes:
  • the image quality of the target application is adjusted.
  • the foregoing embodiment describes the principle of determining the GPU performance bottleneck.
  • the detailed process of determining the GPU performance bottleneck will be described below in conjunction with the producer-consumer model shown in FIG. 3.
  • FIG. 5 shows a method flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application.
  • the method may include the following steps.
  • Step 501 During the running of the target application, obtain the GPU running frequency of the GPU within a predetermined time period.
  • Step 502 Calculate the average operating frequency of the GPU within a predetermined period of time according to the operating frequency of the GPU.
  • the terminal obtains the GPU operating frequencies corresponding to different time points within the predetermined period of time, and calculates the average operating frequency of the GPU within the predetermined period of time from these multiple GPU operating frequencies.
  • the terminal collects the GPU operating frequency every 1s, thereby obtaining 10 GPU operating frequencies within 10s, which are 1550MHz, 1570MHz, 1625MHz, 1650MHz, 1655MHz, 1600MHz, 1650MHz, 1650MHz, 1600MHz, and 1650MHz, and further calculates the average operating frequency to be 1620MHz.
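  • The averaging in this example can be checked with a few lines. The sample values are the ten frequencies listed above; the function name `average_operating_frequency` is illustrative only.

```python
from statistics import mean


def average_operating_frequency(samples_mhz):
    """Arithmetic mean of the sampled operating frequencies, in MHz."""
    return mean(samples_mhz)


samples_mhz = [1550, 1570, 1625, 1650, 1655, 1600, 1650, 1650, 1600, 1650]
print(average_operating_frequency(samples_mhz))
```

The ten samples sum to 16200 MHz, which gives the 1620 MHz average stated in the text.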
  • the terminal can obtain the GPU operating frequency in real time through a preset interface (such as devfreq), and the embodiment of the present application does not limit the manner of obtaining the GPU operating frequency.
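  • As a rough sketch of reading the frequency through devfreq: on Linux, devfreq devices expose a `cur_freq` attribute under `/sys/class/devfreq/<device>/`. The device directory name is platform-specific and is not given in the text, so the path is left to the caller; the value is assumed here to be reported in Hz, and `read_gpu_freq_mhz` is a hypothetical helper name.

```python
def read_gpu_freq_mhz(cur_freq_path):
    """Read a devfreq-style cur_freq file and convert the value to MHz.

    cur_freq_path: path to the attribute file, for example a file named
    cur_freq under the GPU's devfreq directory (device name assumed unknown).
    Assumption: the file holds a single integer frequency in Hz.
    """
    with open(cur_freq_path, "r", encoding="ascii") as f:
        freq_hz = int(f.read().strip())
    return freq_hz / 1_000_000
```

Reading the file usually needs appropriate permissions on the device, so a production implementation would also handle `OSError`.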
  • Step 503 In response to the average operating frequency being greater than the frequency threshold, it is determined that the GPU operating frequency satisfies a preset condition, and the frequency threshold is less than the GPU operating frequency upper limit.
  • the terminal detects whether the average operating frequency of the GPU within the predetermined period of time is greater than the frequency threshold. If it is greater, the terminal determines that the GPU operating frequency meets the preset condition (that is, the GPU may have reached a performance bottleneck); if it is less, the terminal determines that the GPU operating frequency does not meet the preset condition.
  • the terminal is preset with an upper limit for the operating frequency ratio, and when the ratio of the average operating frequency of the GPU to the upper limit of the GPU operating frequency is greater than this upper limit, it is determined that the GPU operating frequency meets the preset condition.
  • the upper limit of the ratio is 0.8.
  • the terminal determines that the GPU operating frequency meets the preset condition.
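  • The ratio test can be sketched as follows. The 0.8 ratio limit is the value given above; the 1620 MHz average and the 2000 MHz upper limit are the example values that appear elsewhere in this document, and the function name is illustrative.

```python
def frequency_condition_met(avg_freq_mhz, freq_upper_limit_mhz, ratio_limit=0.8):
    """True when average frequency / upper frequency limit exceeds ratio_limit."""
    if freq_upper_limit_mhz <= 0:
        raise ValueError("upper frequency limit must be positive")
    return avg_freq_mhz / freq_upper_limit_mhz > ratio_limit


print(frequency_condition_met(1620, 2000))
```

With these example values the ratio is 1620 / 2000 = 0.81, which is above 0.8, so the condition is met.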
  • the terminal detects whether the length of time for which the GPU operating frequency stays above the frequency threshold is greater than the duration threshold. If it is greater, the terminal determines that the GPU operating frequency meets the preset condition; if it is less, the terminal determines that the GPU operating frequency does not meet the preset condition. That is, the terminal detects whether the operating frequency of the GPU stays close to the upper limit of the GPU operating frequency for a long time.
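  • A sketch of this duration-based variant, assuming evenly spaced samples and interpreting the duration as the longest consecutive run above the threshold (the text does not say whether the time must be consecutive or cumulative). The function name and all example values in the usage note are hypothetical.

```python
def longest_time_above(samples_mhz, freq_threshold_mhz, interval_s):
    """Longest consecutive time (seconds) the samples stay above the threshold.

    Assumption: each sample stands for one sampling interval of interval_s.
    """
    longest = current = 0
    for freq in samples_mhz:
        current = current + 1 if freq > freq_threshold_mhz else 0
        longest = max(longest, current)
    return longest * interval_s
```

The preset condition would then be met when `longest_time_above(...)` exceeds the duration threshold; for samples `[1500, 1700, 1700, 1700, 1500, 1700]` taken every second with a 1600 MHz threshold, the longest run covers three samples.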
  • Step 504 in response to the GPU operating frequency meeting the preset condition, obtain the start time point and the end time point of the enqueue process within a predetermined time period.
  • the enqueue process refers to the process of putting the buffer after the image rendering back into the BufferQueue.
  • the dequeue process refers to the process in which the upper layer applies for a free buffer from the BufferQueue for rendering (that is, the process in which the buffer changes from the Free state to the Dequeued state in Figure 3), and the enqueue process refers to the process of writing the rendered data into the buffer and returning it to the BufferQueue to wait for SurfaceFlinger to synthesize it (that is, the process in which the buffer changes from the Dequeued state to the Queued state in Figure 3).
  • the dequeue process is executed by the CPU. During the dequeue process, the CPU measures the width and height of the view (measure), sets the width, height, and position of the view (layout), creates the display list and draws (draw), generates polygons and textures, and sends the generated textures and polygons to the GPU;
  • the enqueue process is executed by the GPU. During the enqueue process, the GPU rasterizes and composites the textures and polygons generated by the CPU (that is, performs image rendering) and writes the rendered data into the buffer.
  • the terminal records the start time point and end time point of each enqueue process within the predetermined period of time, so that the duration of each enqueue process can subsequently be calculated from the start time point and the end time point.
  • the start time point of the enqueue process is the time point when the CPU sends polygons and textures to the GPU
  • the end time point of the enqueue process is the time point when the buffer that has undergone image rendering completes enqueue.
  • the diagonally hatched parts are the CPU running periods (in the first half the CPU performs measurement, layout, and texture and polygon generation; in the second half the CPU performs resource cleanup), and the black-filled part is the GPU running period (drawing according to the textures and polygons sent by the CPU).
  • Step 505 Calculate the GPU single frame rendering time according to the start time point and the end time point.
  • this step may include the following steps.
  • the terminal calculates a first time interval between the start time point and the end time point; the first time interval is the time consumed by the enqueue process.
  • the terminal repeats this step to obtain multiple first time intervals.
  • the terminal calculates the first time interval corresponding to 10 enqueue processes, which are: 6ms, 5ms, 6ms, 7ms, 6ms, 6ms, 5ms, 6ms, 7ms, 6ms.
  • the terminal calculates the average value of the first time intervals corresponding to the enqueue processes within the predetermined time period, and determines this average value as the GPU single frame rendering time of the GPU within the predetermined time period.
  • the terminal calculates an average value of 6ms according to the first time interval corresponding to the 10 enqueue processes, so that the GPU single frame rendering time is determined to be 6ms.
  • the terminal may also determine the GPU single frame rendering time from only a sampled subset of the first time intervals, which is not limited in this embodiment of the application.
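  • Step 505 can be sketched as follows. The ten interval values are the ones from the example above; the absolute timestamps are hypothetical and chosen only so that the intervals match, and the function name is illustrative.

```python
from statistics import mean


def gpu_single_frame_time_ms(enqueue_spans_ms):
    """Average enqueue duration.

    enqueue_spans_ms: list of (start, end) time points in milliseconds,
    one pair per enqueue process within the predetermined period.
    """
    first_intervals = [end - start for start, end in enqueue_spans_ms]
    return mean(first_intervals)


# Hypothetical timestamps: one frame every 16 ms, durations from the example.
durations_ms = [6, 5, 6, 7, 6, 6, 5, 6, 7, 6]
spans_ms = [(16 * i, 16 * i + d) for i, d in enumerate(durations_ms)]
print(gpu_single_frame_time_ms(spans_ms))
```

The ten intervals sum to 60 ms, so the average matches the 6 ms GPU single frame rendering time in the example.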
  • Step 506 Obtain a single frame rendering duration within a predetermined duration, where the single frame rendering duration is the duration of rendering a single frame image.
  • the terminal can determine whether the GPU is fully loaded (that is, has reached a performance bottleneck) by calculating the ratio of the GPU single frame rendering time to the total rendering time of a single frame image.
  • this step may also include the following steps.
  • Step 506A Obtain the start time point and the end time point of the enqueue process within a predetermined time period.
  • Step 506B Calculate the second time interval between the end time points of the two adjacent enqueue processes.
  • the terminal can determine the rendering time of a single frame image based on the time interval between the end time points of two adjacent enqueue processes, or based on the time interval between the start time points of two adjacent enqueue processes.
  • the terminal calculates one second time interval from the end time point corresponding to the second frame image and the end time point corresponding to the first frame image, calculates another second time interval from the end time point corresponding to the third frame image and the end time point corresponding to the second frame image, and so on.
  • the terminal calculates that 10 second time intervals are all 16 ms.
  • since each image rendering includes a dequeue process, the terminal may also determine the rendering time of a single frame image based on the time interval between the end time points of two adjacent dequeue processes, or based on the time interval between the start time points of two adjacent dequeue processes, which is not limited in this embodiment.
  • Step 506C Determine the average value of the second time interval within the predetermined time length as the single frame rendering time length.
  • the terminal calculates the average value of the second time intervals within the predetermined duration, and determines this average value as the single frame rendering time of the image frames within the predetermined duration.
  • the terminal determines that the single frame rendering time within the predetermined time length is 16 ms.
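  • Steps 506A to 506C can be sketched as below. Note that ten second time intervals require eleven end time points; the timestamps are hypothetical values spaced 16 ms apart to match the example, and the function name is illustrative.

```python
from statistics import mean


def single_frame_time_ms(enqueue_end_points_ms):
    """Average gap between the end time points of adjacent enqueue processes.

    Needs at least two end time points; with fewer, statistics.mean raises
    StatisticsError because there is no interval to average.
    """
    second_intervals = [
        later - earlier
        for earlier, later in zip(enqueue_end_points_ms, enqueue_end_points_ms[1:])
    ]
    return mean(second_intervals)


# Hypothetical end time points, one every 16 ms (eleven points, ten intervals).
end_points_ms = [6 + 16 * i for i in range(11)]
print(single_frame_time_ms(end_points_ms))
```

Every gap is 16 ms here, so the average matches the 16 ms single frame rendering time in the example.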
  • Step 507 Determine whether the GPU has reached a performance bottleneck according to the GPU single frame rendering time and the single frame rendering time.
  • this step may include the following steps.
  • Step 507A Calculate the ratio of the GPU single frame rendering time to the single frame rendering time.
  • the ratio = GPU single frame rendering time / single frame rendering time; that is, the ratio is less than 1.
  • the terminal detects whether the ratio is greater than a preset value, and if it is greater, it is determined that the GPU running time takes a larger proportion in the image rendering process, that is, the GPU has reached the performance bottleneck; if it is less, it is determined that the GPU has not reached the performance bottleneck.
  • Step 507B in response to the ratio being greater than the preset value, it is determined that the GPU has reached the performance bottleneck.
  • Step 507C in response to the ratio being less than the preset value, it is determined that the GPU has not reached the performance bottleneck.
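  • Steps 507A to 507C reduce to a single comparison. In this sketch the preset value is a required parameter because the text does not state one; the 0.5 used in the usage note is purely hypothetical. A ratio exactly equal to the preset value is treated as "not reached", which is an assumption, since the text only covers the greater-than and less-than cases.

```python
def gpu_reached_bottleneck(gpu_frame_time_ms, frame_time_ms, preset_ratio):
    """Compare GPU single frame time / single frame time with a preset value."""
    if frame_time_ms <= 0:
        raise ValueError("single frame rendering time must be positive")
    ratio = gpu_frame_time_ms / frame_time_ms
    return ratio > preset_ratio
```

With the example values of 6 ms and 16 ms the ratio is 0.375, so `gpu_reached_bottleneck(6, 16, 0.5)` reports that the bottleneck has not been reached under the hypothetical 0.5 preset.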
  • the terminal calculates the single frame rendering time and the GPU single frame rendering time of a single frame image according to the start time points and end time points of the enqueue processes in image rendering, and then determines whether the GPU has reached the performance bottleneck according to the ratio of the GPU single frame rendering time to the single frame rendering time, which reduces the computational complexity of the detection process while ensuring the accuracy of the detection result.
  • when it is determined in step 507 that the GPU has reached the performance bottleneck, in order to ensure the image display quality of the target application, the following steps are further included after step 507 on the basis of FIG. 5, as shown in FIG. 8.
  • Step 508 in response to the GPU reaching the performance bottleneck, obtain the current frame rate.
  • the terminal obtains the target frame rate of the target application and detects whether the current frame rate reaches the target frame rate. If it does, the terminal determines that the screen of the target application does not stutter; if it does not, the terminal determines that the screen of the target application stutters, and the following step 509 or 510 is executed.
  • the target frame rate of the target application is obtained by the terminal operating system through a data channel with the target application.
  • Step 509 In response to the current frame rate not reaching the target frame rate of the target application, and the average operating frequency being less than the upper limit of the GPU operating frequency, increase the operating parameters of the GPU.
  • when the current frame rate does not reach the target frame rate, the terminal obtains the average operating frequency (calculated in step 502) and detects whether the average operating frequency reaches the upper limit of the GPU operating frequency. If it does not, there is still room to improve the performance of the GPU, and the terminal increases the operating parameters of the GPU. For example, starting from the average operating frequency, the terminal gradually raises the GPU operating frequency toward the upper limit in predetermined increments to improve the rendering performance of the GPU.
  • the average operating frequency of the GPU is 1620 MHz
  • the upper limit of the operating frequency of the GPU is 2000 MHz
  • the terminal increases the operating frequency of the GPU gradually on the basis of 1620 MHz according to a predetermined increase of 50 MHz.
  • Step 510 In response to the current frame rate not reaching the target frame rate of the target application, and the average operating frequency reaching the upper limit of the GPU operating frequency, adjust the image quality of the target application.
  • the terminal adjusts (for example, lowers) the image quality of the target application, thereby reducing the difficulty of GPU image rendering.
  • after detecting that the GPU has reached the performance bottleneck, the terminal further determines whether the screen stutters, and when stuttering occurs, it adjusts either the GPU operating frequency or the picture quality of the target application based on the GPU's average operating frequency and the upper limit of the operating frequency, thereby improving the picture display quality of the target application.
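  • The branching in steps 508 to 510 can be sketched as a small decision function. The returned action labels and the function names are hypothetical; the 50 MHz step and the 1620 MHz / 2000 MHz values come from the example above, and capping the frequency at the upper limit is an assumption about how "gradually" ends.

```python
def choose_action(bottleneck_reached, current_fps, target_fps,
                  avg_freq_mhz, freq_upper_limit_mhz):
    """Pick the follow-up action once the bottleneck decision is known."""
    if not bottleneck_reached or current_fps >= target_fps:
        return "none"  # no bottleneck, or the target frame rate is reached
    if avg_freq_mhz < freq_upper_limit_mhz:
        return "raise_gpu_frequency"  # step 509: headroom is left
    return "adjust_image_quality"  # step 510: frequency already at the limit


def next_frequency_mhz(current_mhz, freq_upper_limit_mhz, step_mhz=50):
    """One increment of the gradual frequency increase, capped at the limit."""
    return min(current_mhz + step_mhz, freq_upper_limit_mhz)
```

For example, with a bottleneck detected, 50 fps against a 60 fps target, and a 1620 MHz average below the 2000 MHz limit, the sketch selects the frequency increase, and the first increment moves the frequency from 1620 MHz to 1670 MHz. The frame rate values here are illustrative only.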
  • FIG. 9 shows a schematic structural diagram of an apparatus for determining a GPU performance bottleneck provided by an embodiment of the present application.
  • the device can be implemented as all or part of the terminal in Figure 1 through a dedicated hardware circuit, or a combination of software and hardware.
  • the device includes:
  • the first obtaining module 910 is configured to obtain the GPU operating frequency of the GPU within a predetermined period of time during the running process of the target application;
  • the second obtaining module 920 is configured to obtain, in response to the GPU operating frequency meeting a preset condition, the GPU single frame rendering time length of the GPU within the predetermined time period, where the GPU single frame rendering time length is the running time of the GPU during the rendering of a single frame image;
  • the determining module 930 is configured to determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time.
  • the second obtaining module 920 includes:
  • the first acquiring unit is configured to acquire the start time point and the end time point of the enqueue process within the predetermined time period, where the enqueue process refers to the process of putting a buffer that has undergone image rendering back into the buffer queue (BufferQueue);
  • the first calculation unit is configured to calculate the GPU single frame rendering time according to the start time point and the end time point.
  • the first calculation unit is configured to:
  • calculate the first time interval between the start time point and the end time point;
  • determine the average value of the first time intervals of the enqueue processes within the predetermined time period as the GPU single frame rendering time.
  • the determining module 930 includes:
  • the second acquiring unit is configured to acquire a single frame rendering time length within the predetermined time period, where the single frame rendering time length is the time length for rendering a single frame image;
  • the determining unit is configured to determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time length and the single frame rendering time length.
  • the second acquiring unit is used to:
  • the enqueue process refers to the process of putting the buffer after the image rendering back into the BufferQueue;
  • the average value of the second time interval within the predetermined time period is determined as the single frame rendering time length.
  • the determination unit is used to:
  • the device further includes:
  • the first detection module is configured to calculate the average operating frequency of the GPU within the predetermined time period according to the GPU operating frequency and, in response to the average operating frequency being greater than a frequency threshold, determine that the GPU operating frequency satisfies the preset condition, where the frequency threshold is less than the upper limit of the operating frequency of the GPU;
  • the second detection module is configured to determine that the GPU operating frequency satisfies the preset condition in response to the length of time for which the GPU operating frequency is greater than the frequency threshold exceeding a duration threshold.
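The two detection modules describe two alternative ways of testing the preset condition. A minimal Python sketch of both follows. The function names are hypothetical, the samples are assumed to be taken at a fixed interval, and the second check counts the cumulative time above the threshold; the text does not say whether that time must be continuous.

```python
from typing import Sequence


def average_exceeds_threshold(samples_mhz: Sequence[float], threshold_mhz: float) -> bool:
    """First check: the average sampled frequency is above the frequency threshold."""
    if not samples_mhz:
        raise ValueError("at least one frequency sample is required")
    return sum(samples_mhz) / len(samples_mhz) > threshold_mhz


def time_above_exceeds_threshold(samples_mhz: Sequence[float], sample_interval_ms: int,
                                 threshold_mhz: float, duration_threshold_ms: int) -> bool:
    """Second check: the time spent above the frequency threshold is long enough."""
    # Assumption: each sample stands for one full sampling interval, and the
    # intervals above the threshold are summed whether or not they are adjacent.
    time_above_ms = sum(sample_interval_ms for f in samples_mhz if f > threshold_mhz)
    return time_above_ms > duration_threshold_ms
```

With samples of 1500, 1700, 1800 and 1900 MHz taken every 100 ms and a 1600 MHz threshold, the average is 1725 MHz and the time above the threshold is 300 ms, so both checks pass for a duration threshold of 250 ms.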
  • the device further includes:
  • the third obtaining module is configured to obtain the current frame rate in response to the GPU reaching the performance bottleneck;
  • the first adjustment module is configured to increase the operating parameters of the GPU in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency being less than the upper limit of the operating frequency of the GPU;
  • the second adjustment module is configured to adjust the image quality of the target application in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency reaching the upper limit of the GPU operating frequency.
  • In the embodiments of the present application, the GPU operating frequency of the GPU within a predetermined period of time is obtained and, when the GPU operating frequency meets a preset condition, the GPU single frame rendering time of the GPU within the predetermined period of time is further obtained, so that whether the GPU reaches the performance bottleneck while the target application is running is determined from the GPU single frame rendering time. The terminal thus judges the GPU performance bottleneck from the GPU operating frequency and the GPU single frame rendering time of the single frame image rendering process, without relying on a driver for complex GPU load calculation. This preserves the accuracy of GPU performance bottleneck detection while reducing its complexity and improving its efficiency.
  • The terminal calculates the single frame rendering time of a single frame image and the GPU single frame rendering time from the start time points and end time points of the enqueue process during image rendering, and then determines whether the GPU reaches the performance bottleneck from the ratio of the two durations, which reduces the computational complexity of detection while keeping the detection result accurate.
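The ratio test can be illustrated with a short Python sketch. Two things here are assumptions rather than statements of the application: the ratio is taken as GPU single frame rendering time divided by single frame rendering time, and a ratio above a threshold is read as a bottleneck. The function names and the threshold value are hypothetical.

```python
def gpu_share_of_frame(gpu_frame_ms: float, frame_ms: float) -> float:
    """Fraction of the single frame rendering time during which the GPU was running."""
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    return gpu_frame_ms / frame_ms


def ratio_indicates_bottleneck(gpu_frame_ms: float, frame_ms: float,
                               ratio_threshold: float) -> bool:
    """Assumed rule: a share above the threshold suggests the GPU is the limiting factor."""
    return gpu_share_of_frame(gpu_frame_ms, frame_ms) > ratio_threshold
```

For instance, a GPU single frame rendering time of 12 ms within a 16 ms frame gives a share of 0.75, which exceeds an illustrative threshold of 0.7.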
  • After detecting that the GPU has reached the performance bottleneck, the terminal further determines whether the picture is stuttering. If it is, the terminal adjusts either the GPU operating frequency or the picture quality of the target application, depending on the GPU's average operating frequency and the upper limit of the operating frequency, thereby improving the display quality of the target application.
  • The device provided in the above embodiment is described using the above division of functional modules only as an example. In practical applications, the functions can be assigned to different functional modules as required; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the apparatus and method embodiments provided in the above embodiments belong to the same concept, and the specific implementation process is detailed in the method embodiments, which will not be repeated here.
  • the present application also provides a computer-readable medium on which program instructions are stored; when the program instructions are executed by a processor, the method for determining the GPU performance bottleneck provided by the foregoing method embodiments is implemented.
  • the present application also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the GPU performance bottleneck determination method described in each of the foregoing embodiments.

Abstract

A GPU performance bottleneck determining method and device, a terminal, and a storage medium, relating to the technical field of terminals. The method comprises: obtaining the GPU running frequency of a GPU within a predetermined duration in the running process of a target application (401); in response to the GPU running frequency satisfying a preset condition, obtaining a GPU single frame rendering duration of the GPU within the predetermined duration, the GPU single frame rendering duration being the GPU running duration in a single image frame rendering process (402); and determining, according to the GPU single frame rendering duration, whether the GPU reaches a performance bottleneck (403). In the method, a terminal performs GPU performance bottleneck determination according to the GPU running frequency and the GPU single frame rendering duration in a single image frame rendering process, without performing complex GPU load calculation using a driver, thereby ensuring the accuracy of GPU performance bottleneck detection, reducing the complexity of performance bottleneck detection, and improving the efficiency of performance bottleneck detection.

Description

Method, device, terminal and storage medium for determining GPU performance bottleneck
This application claims priority to Chinese patent application No. 201910080514.5, filed on January 28, 2019 and entitled "Method, Device, Terminal and Storage Medium for Determining GPU Performance Bottleneck", the entire content of which is incorporated herein by reference.
Technical field
The embodiments of the present application relate to the field of terminal technology, and in particular, to a method, device, terminal, and storage medium for determining GPU performance bottlenecks.
Background
For applications that require dynamic image rendering, such as video applications and game applications, running quality is closely related to the performance of the graphics processing unit (GPU).
In the related art, to monitor GPU load, a driver corresponding to the GPU needs to be installed in the terminal. The driver calculates the real-time load of the GPU, which is then used to determine whether the GPU has reached a performance bottleneck while the application is running.
Summary of the invention
The embodiments of the present application provide a method, device, terminal, and storage medium for determining GPU performance bottlenecks. The technical solutions are as follows:
In one aspect, a method for determining a GPU performance bottleneck is provided, the method including:
during the running of a target application, obtaining the GPU operating frequency of the GPU within a predetermined period of time;
in response to the GPU operating frequency meeting a preset condition, obtaining the GPU single frame rendering time of the GPU within the predetermined period of time, where the GPU single frame rendering time is the running time of the GPU during the rendering of a single frame image;
determining, according to the GPU single frame rendering time, whether the GPU reaches a performance bottleneck.
In another aspect, a device for determining a GPU performance bottleneck is provided, the device including:
a first obtaining module, configured to obtain the GPU operating frequency of the GPU within a predetermined period of time during the running of a target application;
a second obtaining module, configured to obtain, in response to the GPU operating frequency meeting a preset condition, the GPU single frame rendering time of the GPU within the predetermined period of time, where the GPU single frame rendering time is the running time of the GPU during the rendering of a single frame image;
a determining module, configured to determine, according to the GPU single frame rendering time, whether the GPU reaches a performance bottleneck.
In another aspect, a terminal is provided. The terminal includes a processor, a memory connected to the processor, and program instructions stored in the memory; when the processor executes the program instructions, the method for determining the GPU performance bottleneck described in the above aspect is implemented.
In another aspect, a computer-readable storage medium is provided, on which program instructions are stored; when the program instructions are executed by a processor, the method for determining the GPU performance bottleneck described in the above aspect is implemented.
Description of the drawings
FIG. 1 is a schematic structural diagram of a terminal provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of the principle of the graphics display process in the Android system;
FIG. 3 is a state transition diagram of the four states of a buffer;
FIG. 4 shows a flowchart of a method for determining a GPU performance bottleneck provided by an exemplary embodiment of the present application;
FIG. 5 shows a flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of the distribution of the start time points and end time points of the enqueue process;
FIG. 7 shows a flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application;
FIG. 8 shows a flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for determining a GPU performance bottleneck provided by an embodiment of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.
Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.
In the description of this application, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. It should also be noted that, unless otherwise clearly specified and limited, the terms "connected" and "connection" are to be understood broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct or made indirectly through an intermediate medium. A person of ordinary skill in the art can understand the specific meanings of these terms in this application according to the specific circumstances. In addition, unless otherwise specified, "plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean that A exists alone, that A and B exist at the same time, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Before the embodiments of the present application are explained, their application scenario is described first. FIG. 1 shows a schematic structural diagram of a terminal provided by an exemplary embodiment of the present application.
The terminal 100 is an electronic device on which a target application is installed. The target application may be a system program or a third-party application, where a third-party application is an application produced by a party other than the user and the operating system. For example, the target application may be a game application or a video playback application.
Optionally, the terminal 100 includes a processor 120 and a memory 140.
The processor 120 may include one or more processing cores. The processor 120 connects the various parts of the terminal 100 through various interfaces and lines, and performs the functions of the terminal 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 140 and by calling the data stored in the memory 140. Optionally, the processor 120 may be implemented in at least one of the following hardware forms: digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 120 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, and applications; the GPU renders and draws the content to be shown on the display screen; the modem handles wireless communication. It can be understood that the modem may also be implemented as a separate chip rather than integrated into the processor 120.
The memory 140 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 140 includes a non-transitory computer-readable storage medium. The memory 140 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 140 may include a program storage area and a data storage area. The program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), and instructions for implementing the method embodiments described below; the data storage area may store the data involved in the method embodiments described below.
The terminal 100 in the embodiments of the present application further includes a display screen 160. Optionally, the display screen 160 is a touch display screen, which receives touch operations performed on or near it by the user with a finger, a stylus, or any other suitable object, and displays the user interface of each application. The display screen 160 is usually arranged on the front panel of the terminal 100, or on both the front panel and the rear panel of the terminal 100. The display screen 160 can be designed as a full screen, a curved screen, or a special-shaped screen. It can also be designed as a combination of a full screen and a curved screen, or a combination of a special-shaped screen and a curved screen, which is not limited in this embodiment.
In addition, those skilled in the art will understand that the structure of the terminal 100 shown in the above drawing does not limit the terminal 100; the terminal may include more or fewer components than shown, combine certain components, or use a different arrangement of components. For example, the terminal 100 also includes components such as a radio frequency circuit, an input unit, sensors, an audio circuit, a wireless fidelity (WiFi) module, a power supply, and a Bluetooth module, which are not described further here.
For ease of understanding, the graphics display system in the terminal is described first. The following embodiments use the Android graphics display system as an example for schematic description.
As shown in Figure 2, the content displayed on the display screen 21 is read from the hardware frame buffer. Reading starts at the start address of the hardware frame buffer and scans from top to bottom and from left to right, mapping the scanned content onto the display screen.
Because the content displayed on the display screen 21 must be updated continuously, performing reads and writes in the same hardware frame buffer would cause content from multiple frames to appear on the display screen 21 at the same time. The terminal therefore uses a double-buffering mechanism: one of the two buffers is used for reading content for display, and the other is used for background graphics composition and writing.
Schematically, as shown in FIG. 2, the front buffer 22 is the frame buffer holding the content to be displayed on the display screen, and the back buffer 23 is the frame buffer used to compose the next frame of graphics. When the previous frame has been displayed and the next frame has been written, the display screen 21 reads the content of the back buffer 23 and, correspondingly, the following frame of graphics is composed in the front buffer 22 (the front and back buffers swap roles).
As the graphics compositor, SurfaceFlinger composes the multiple surfaces passed down from the upper layer and submits the result to the hardware frame buffer of the display screen, where the display screen 21 reads it for display. As shown in Figure 2, the content of the back buffer 23 is composed by SurfaceFlinger 24 from multiple surfaces 25. Each surface corresponds to a window of the upper layer, such as a dialog box, a status bar, or an Activity.
Graphics are passed in buffers, and a surface is a further encapsulation of a buffer. To manage the multiple buffers in a surface, as shown in Figure 3, the surface provides a buffer queue (BufferQueue) internally, which forms a producer-consumer model with the upper layer and SurfaceFlinger: the upper layer is the producer, and SurfaceFlinger is the consumer.
Each buffer in the BufferQueue is in one of four states: free (Free), dequeued (Dequeued), queued (Queued), or acquired (Acquired). In the free state, the buffer is available for use by the upper layer; in the dequeued state, the buffer is being used by the upper layer; in the queued state, the upper layer has finished with the buffer (drawing and rendering are complete) and the buffer is waiting to be composed by SurfaceFlinger; in the acquired state, SurfaceFlinger is composing from the buffer. Buffers move between states through the buffer dequeue (dequeueBuffer) and buffer enqueue (queueBuffer) operations, as shown in Figure 3.
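The four buffer states can be modeled as a small state machine. The Python sketch below is a simplified illustration, not Android code: the text names only dequeueBuffer and queueBuffer, so the consumer-side operation names acquireBuffer and releaseBuffer are taken from Android's BufferQueue interface rather than from this application, and other transitions (such as cancelling a dequeued buffer) are omitted.

```python
from enum import Enum


class BufferState(Enum):
    FREE = "free"
    DEQUEUED = "dequeued"
    QUEUED = "queued"
    ACQUIRED = "acquired"


# (current state, operation) -> next state. The first two operations are
# producer-side (upper layer); the last two are consumer-side (SurfaceFlinger).
TRANSITIONS = {
    (BufferState.FREE, "dequeueBuffer"): BufferState.DEQUEUED,
    (BufferState.DEQUEUED, "queueBuffer"): BufferState.QUEUED,
    (BufferState.QUEUED, "acquireBuffer"): BufferState.ACQUIRED,
    (BufferState.ACQUIRED, "releaseBuffer"): BufferState.FREE,
}


def apply_operation(state: BufferState, operation: str) -> BufferState:
    """Return the next buffer state, or raise if the operation is not allowed."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"{operation} is not valid in state {state.name}") from None
```

Applying the four operations in order to a free buffer returns it to the free state, which corresponds to one full render-and-compose cycle.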
Please refer to FIG. 4, which shows a flowchart of a method for determining a GPU performance bottleneck provided by an exemplary embodiment of the present application. The method may include the following steps.
Step 401: during the running of the target application, obtain the GPU operating frequency of the GPU within a predetermined period of time.
In a possible implementation, the target application places higher performance demands on the GPU than other applications do, and the target application is either determined by the terminal according to the application type of the installed applications or set manually by the user.
Optionally, the target application is an application that requires dynamic image rendering, which may be a video playback application or a game application. For example, the target application may be any one of a virtual reality application, a three-dimensional map program, a military simulation program, a third-person shooting (TPS) game, a first-person shooting (FPS) game, a multiplayer online battle arena (MOBA) game, or a multiplayer gunfight survival game. This application does not limit the specific type of the target application.
Because the GPU operating frequency changes continuously while the target application is running, in a possible implementation the terminal obtains the GPU operating frequency once every predetermined time interval within the predetermined period of time, and thereby obtains multiple GPU operating frequencies within that period, which improves the accuracy of the subsequent GPU performance bottleneck detection. For example, the predetermined period of time is 10 s and the time interval is 100 ms.
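The periodic sampling can be sketched as a simple loop. The text does not say how the terminal reads the GPU operating frequency, so the sketch takes the reader as a caller-supplied function; `sample_frequencies`, `read_freq_mhz`, and the injectable `sleep` parameter are all hypothetical names used for illustration.

```python
import time
from typing import Callable, List


def sample_frequencies(read_freq_mhz: Callable[[], float], duration_ms: int,
                       interval_ms: int,
                       sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Collect one frequency reading per interval over the whole duration."""
    if interval_ms <= 0 or duration_ms < interval_ms:
        raise ValueError("interval must be positive and no longer than the duration")
    samples = []
    for _ in range(duration_ms // interval_ms):
        samples.append(read_freq_mhz())
        sleep(interval_ms / 1000.0)  # wait until the next sampling point
    return samples
```

With the values from the example (a 10 s period sampled every 100 ms), the loop collects 100 readings. Passing the sleep function as a parameter is a design choice that lets the loop be exercised without waiting in real time.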
For the same target application, the performance demands on the GPU may differ between running scenes. In a game application, for example, the game startup scene, the game loading scene, and the game main interface scene place lower demands on the GPU than the in-game scene does. Therefore, in a possible implementation, the terminal performs the step of obtaining the GPU operating frequency when the target application runs to a target running scene. The scene information of the target running scene is transmitted by the target application through a data channel between it and the terminal operating system.
Step 402: in response to the GPU operating frequency meeting a preset condition, obtain the GPU single frame rendering time of the GPU within the predetermined period of time, where the GPU single frame rendering time is the running time of the GPU during the rendering of a single frame image.
Further, the terminal checks whether the obtained GPU operating frequency meets the preset condition. If it does, the terminal determines that the GPU may have reached a performance bottleneck and performs the step of obtaining the GPU single frame rendering time of the GPU within the predetermined period of time; if it does not, the terminal continues to obtain the GPU operating frequency.
Optionally, the preset condition is that the GPU operating frequency is greater than a frequency threshold. For example, the frequency threshold is 80% × the upper limit of the GPU operating frequency.
Image rendering is performed jointly by the central processing unit (CPU) and the GPU. The time the CPU and GPU together take to render one frame of image is called the single frame rendering time; correspondingly, the running time of the GPU during the rendering of a single frame image is the GPU single frame rendering time, and the running time of the CPU during the rendering of a single frame image is the CPU single frame rendering time. Because the subsequent steps need to determine whether the GPU has reached a performance bottleneck, the terminal obtains only the GPU single frame rendering time.
Because different pictures take different amounts of time to render, in a possible implementation the terminal obtains the GPU single frame rendering time corresponding to each frame of image within the predetermined period of time, or obtains the GPU single frame rendering time corresponding to specified image frames within the predetermined period of time.
Step 403: determine, according to the GPU single frame rendering time, whether the GPU reaches the performance bottleneck.
When the same image frame is rendered, a longer GPU single frame rendering time indicates a slower rendering speed and correspondingly poorer GPU performance. In a possible implementation, the terminal checks whether the GPU single frame rendering time is greater than a duration threshold; if it is, the terminal determines that the GPU has reached the performance bottleneck; otherwise, it determines that the GPU has not reached the performance bottleneck.
Because the GPU operating frequency and the GPU single frame rendering time are easy to obtain, determining whether the GPU has reached a performance bottleneck from them is less difficult than performing complex GPU load calculation through an additional driver. It also simplifies the GPU performance bottleneck detection process and provides data to support subsequent development and optimization.
In summary, in the embodiments of the present application, the GPU operating frequency of the GPU within a predetermined period of time is obtained and, when the GPU operating frequency meets a preset condition, the GPU single frame rendering time of the GPU within the predetermined period of time is further obtained, so that whether the GPU reaches the performance bottleneck while the target application is running is determined from the GPU single frame rendering time. The terminal thus judges the GPU performance bottleneck from the GPU operating frequency and the GPU single frame rendering time of the single frame image rendering process, without relying on a driver for complex GPU load calculation. This preserves the accuracy of GPU performance bottleneck detection while reducing its complexity and improving its efficiency.
Optionally, obtaining the GPU single frame rendering time of the GPU within the predetermined period includes:
Obtain the start time point and the end time point of the enqueue process within the predetermined period, where the enqueue process is the process of putting a buffer that has undergone image rendering back into the buffer queue (BufferQueue);
Calculate the GPU single frame rendering time according to the start time point and the end time point.
Optionally, calculating the GPU single frame rendering time according to the start time point and the end time point includes:
Calculate the first time interval between the start time point and the end time point;
Determine the average of the first time intervals of the enqueue processes within the predetermined period as the GPU single frame rendering time.
Optionally, determining whether the GPU has reached a performance bottleneck according to the GPU single frame rendering time includes:
Obtain the single frame rendering time within the predetermined period, where the single frame rendering time is the time taken to render a single frame;
Determine whether the GPU has reached a performance bottleneck according to the GPU single frame rendering time and the single frame rendering time.
Optionally, obtaining the single frame rendering time within the predetermined period includes:
Obtain the start time point and the end time point of the enqueue process within the predetermined period, where the enqueue process is the process of putting a buffer that has undergone image rendering back into the BufferQueue;
Calculate the second time interval between the end time points of two adjacent enqueue processes;
Determine the average of the second time intervals within the predetermined period as the single frame rendering time.
Optionally, determining whether the GPU has reached a performance bottleneck according to the GPU single frame rendering time and the single frame rendering time includes:
Calculate the ratio of the GPU single frame rendering time to the single frame rendering time, where the ratio is less than 1;
In response to the ratio being greater than a preset value, determine that the GPU has reached a performance bottleneck;
In response to the ratio being less than the preset value, determine that the GPU has not reached a performance bottleneck.
Optionally, after the GPU operating frequency of the GPU within the predetermined period is obtained, the method further includes:
Calculate the average operating frequency of the GPU within the predetermined period according to the GPU operating frequency; in response to the average operating frequency being greater than a frequency threshold, determine that the GPU operating frequency satisfies the preset condition, where the frequency threshold is less than the upper limit of the GPU operating frequency;
or,
In response to the length of time for which the GPU operating frequency is greater than the frequency threshold being greater than a duration threshold, determine that the GPU operating frequency satisfies the preset condition.
Optionally, after it is determined according to the GPU single frame rendering time whether the GPU has reached a performance bottleneck, the method further includes:
In response to the GPU having reached a performance bottleneck, obtain the current frame rate;
In response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency being less than the upper limit of the GPU operating frequency, increase the operating parameters of the GPU;
In response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency having reached the upper limit of the GPU operating frequency, adjust the image quality of the target application.
The foregoing embodiment describes the principle of determining a GPU performance bottleneck. The detailed process is described below with reference to the producer-consumer model shown in FIG. 3.
Please refer to FIG. 5, which shows a flowchart of a method for determining a GPU performance bottleneck provided by another exemplary embodiment of the present application. The method may include the following steps.
Step 501: While the target application is running, obtain the GPU operating frequency of the GPU within a predetermined period.
For the implementation of this step, refer to step 401 above; it is not repeated here.
Step 502: Calculate the average operating frequency of the GPU within the predetermined period according to the GPU operating frequency.
Because the GPU operating frequency changes continuously, deciding whether the preset condition is satisfied from a single GPU operating frequency would make the result less accurate. In one possible implementation, the terminal obtains the GPU operating frequencies at different time points within the predetermined period and calculates the average operating frequency of the GPU within the predetermined period from these values.
In an illustrative example, the terminal samples the GPU operating frequency once every 1 s and thus obtains 10 GPU operating frequencies within 10 s: 1550 MHz, 1570 MHz, 1625 MHz, 1650 MHz, 1655 MHz, 1600 MHz, 1650 MHz, 1650 MHz, 1600 MHz, and 1650 MHz. From these it calculates an average operating frequency of 1620 MHz.
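The averaging in step 502 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation; the function name `average_frequency_mhz` is an assumption, and the sample values are the ten frequencies from the example above.

```python
# Ten GPU operating frequency samples in MHz, taken once per second
# (the values come from the illustrative example in the text).
samples_mhz = [1550, 1570, 1625, 1650, 1655, 1600, 1650, 1650, 1600, 1650]


def average_frequency_mhz(samples):
    """Return the arithmetic mean of the sampled GPU operating frequencies."""
    if not samples:
        raise ValueError("at least one frequency sample is required")
    return sum(samples) / len(samples)


print(average_frequency_mhz(samples_mhz))  # 1620.0
```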
It should be noted that the terminal (operating system) can obtain the GPU operating frequency in real time through a preset interface (such as devfreq). The embodiments of the present application do not limit how the GPU operating frequency is obtained.
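As one hedged illustration of such an interface, the Linux devfreq framework exposes a `cur_freq` attribute for each device under `/sys/class/devfreq/`. The sketch below only reads and parses such a file. The device directory name is platform-specific and is not given in the application, so the path is passed in by the caller, and the helper name `read_gpu_frequency` is an assumption.

```python
from pathlib import Path


def read_gpu_frequency(cur_freq_path):
    """Read one frequency sample from a devfreq-style `cur_freq` file.

    `cur_freq_path` is supplied by the caller, for example a path of the form
    /sys/class/devfreq/<device>/cur_freq, where <device> depends on the platform.
    The value is returned as an integer in whatever unit the file reports.
    """
    return int(Path(cur_freq_path).read_text().strip())
```

Calling this helper once per sampling interval would yield the series of frequencies that step 502 averages.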
Step 503: In response to the average operating frequency being greater than a frequency threshold, determine that the GPU operating frequency satisfies the preset condition, where the frequency threshold is less than the upper limit of the GPU operating frequency.
When the GPU runs at full load (heavy load), its operating frequency approaches the upper limit of the GPU operating frequency. Therefore, in this embodiment, the terminal checks whether the average operating frequency of the GPU within the predetermined period is greater than the frequency threshold. If it is, the terminal determines that the GPU operating frequency satisfies the preset condition (the GPU may have reached a performance bottleneck); if it is less, the terminal determines that the GPU operating frequency does not satisfy the preset condition.
In one possible implementation, an upper limit for the operating frequency ratio is preset in the terminal. When the average operating frequency of the GPU divided by the upper limit of the GPU operating frequency is greater than this ratio limit, the terminal determines that the GPU operating frequency satisfies the preset condition. For example, the ratio limit is 0.8.
Continuing the example from the preceding steps, when the frequency threshold is 0.8 × the upper limit of the GPU operating frequency and that upper limit is 2000 MHz, the average operating frequency of the GPU is 1620 MHz > 1600 MHz, so the terminal determines that the GPU operating frequency satisfies the preset condition.
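The check in step 503 reduces to a single comparison. The sketch below assumes the ratio limit of 0.8 and the 2000 MHz upper limit from the example; the function name `average_frequency_condition_met` is hypothetical.

```python
def average_frequency_condition_met(avg_mhz, max_mhz, ratio_limit=0.8):
    """Return True when the average frequency exceeds ratio_limit * max frequency."""
    frequency_threshold = ratio_limit * max_mhz  # 0.8 * 2000 MHz = 1600 MHz
    return avg_mhz > frequency_threshold


print(average_frequency_condition_met(1620, 2000))  # True
```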
In other possible implementations, for multiple consecutive GPU operating frequencies obtained by the terminal, the terminal checks whether the length of time for which the GPU operating frequency is greater than the frequency threshold exceeds a duration threshold. If it does, the terminal determines that the GPU operating frequency satisfies the preset condition; if it is less, the terminal determines that the GPU operating frequency does not satisfy the preset condition. In other words, the terminal checks whether the GPU operating frequency stays close to its upper limit for a long time.
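A minimal sketch of this alternative condition follows. The application does not say whether the time above the threshold is cumulative or continuous; this sketch assumes the longest continuous run of samples above the threshold, with samples taken at a fixed interval. The function names and the 2 s duration threshold are illustrative assumptions.

```python
def longest_time_above_s(samples_mhz, threshold_mhz, interval_s=1.0):
    """Longest continuous time (seconds) the sampled frequency stays above the threshold."""
    longest = current = 0
    for freq in samples_mhz:
        # Extend the current run while above the threshold, otherwise reset it.
        current = current + 1 if freq > threshold_mhz else 0
        longest = max(longest, current)
    return longest * interval_s


def sustained_high_frequency(samples_mhz, threshold_mhz, duration_threshold_s, interval_s=1.0):
    """Return True when the frequency stays above the threshold longer than the duration threshold."""
    return longest_time_above_s(samples_mhz, threshold_mhz, interval_s) > duration_threshold_s


samples_mhz = [1550, 1570, 1625, 1650, 1655, 1600, 1650, 1650, 1600, 1650]
print(longest_time_above_s(samples_mhz, 1600))           # 3.0
print(sustained_high_frequency(samples_mhz, 1600, 2.0))  # True
```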
Step 504: In response to the GPU operating frequency satisfying the preset condition, obtain the start time point and the end time point of the enqueue process within the predetermined period, where the enqueue process is the process of putting a buffer that has undergone image rendering back into the BufferQueue.
As shown in FIG. 3, rendering each frame involves a dequeue process and an enqueue process. The dequeue process is the process in which the upper layer requests a free buffer from the BufferQueue for rendering (the buffer changes from the Free state to the Dequeued state in FIG. 3). The enqueue process is the process of writing the rendered data into the buffer and putting it back into the BufferQueue, where it waits for SurfaceFlinger to perform composition (the buffer changes from the Dequeued state to the Queued state in FIG. 3).
In terms of the executing entity, the dequeue process is performed by the CPU. During the dequeue process, the CPU measures the width and height of the view (measure), sets the width, height, and position of the view (layout), creates and draws the display list (draw), generates polygons and textures, and sends the generated textures and polygons to the GPU. The enqueue process is performed by the GPU. During the enqueue process, the GPU rasterizes and composites the textures and polygons generated by the CPU (that is, performs image rendering) and writes the rendered data into the buffer.
Therefore, based on this frame rendering process, in one possible implementation the terminal records the start time point and the end time point of each enqueue process within the predetermined period, so that the duration of each enqueue process can later be calculated from them.
Optionally, the start time point of the enqueue process is the time point at which the CPU sends the polygons and textures to the GPU, and the end time point of the enqueue process is the time point at which the rendered buffer finishes being enqueued.
Illustratively, FIG. 6 shows the distribution of the start and end time points of the enqueue process during image rendering. For each frame, the hatched portions are the periods in which the CPU runs (in the first portion the CPU performs measurement, layout, and texture and polygon generation; in the second portion it cleans up resources), and the solid black portion is the period in which the GPU runs (drawing according to the textures and polygons sent by the CPU).
Step 505: Calculate the GPU single frame rendering time according to the start time point and the end time point.
Because the GPU's running time during image rendering is concentrated mainly in the enqueue process, the terminal can calculate the GPU single frame rendering time from the start time point and the end time point of each enqueue process. In one possible implementation, this step may include the following steps.
1. Calculate the first time interval between the start time point and the end time point.
Optionally, for each enqueue process within the predetermined period, the terminal calculates the first time interval between the start time point and the end time point. This first time interval is the time consumed by the enqueue process.
Because the predetermined period covers the rendering of multiple frames, and therefore multiple enqueue processes, the terminal repeats this step to obtain multiple first time intervals.
In an illustrative example, the terminal calculates the first time intervals of 10 enqueue processes: 6 ms, 5 ms, 6 ms, 7 ms, 6 ms, 6 ms, 5 ms, 6 ms, 7 ms, and 6 ms.
2. Determine the average of the first time intervals of the enqueue processes within the predetermined period as the GPU single frame rendering time.
To avoid the inaccuracy that would result from determining the GPU single frame rendering time from a single first time interval, in this embodiment the terminal calculates the average of the first time intervals of all enqueue processes within the predetermined period and takes this average as the GPU single frame rendering time of the GPU within the predetermined period.
Continuing the example from the preceding steps, the terminal calculates an average of 6 ms from the first time intervals of the 10 enqueue processes and therefore determines the GPU single frame rendering time to be 6 ms.
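The two sub-steps of step 505 can be sketched as below. The (start, end) timestamps are hypothetical values in milliseconds, constructed only so that they reproduce the ten first time intervals from the example; the function name `gpu_single_frame_time` is also an assumption.

```python
def gpu_single_frame_time(enqueue_spans):
    """Average of (end - start) over the recorded enqueue processes."""
    if not enqueue_spans:
        raise ValueError("at least one enqueue process is required")
    first_intervals = [end - start for start, end in enqueue_spans]
    return sum(first_intervals) / len(first_intervals)


# Hypothetical (start, end) time points in ms, one pair per frame, spaced 16 ms apart.
durations_ms = [6, 5, 6, 7, 6, 6, 5, 6, 7, 6]
spans = [(16 * i, 16 * i + d) for i, d in enumerate(durations_ms)]

print(gpu_single_frame_time(spans))  # 6.0
```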
It should be noted that, in other possible implementations, the terminal may determine the GPU single frame rendering time from a sample of several first time intervals. The embodiments of the present application do not limit this.
Step 506: Obtain the single frame rendering time within the predetermined period, where the single frame rendering time is the time taken to render a single frame.
During image rendering, another sign that the GPU is running at full load (heavy load) is that the GPU's running time accounts for a larger share of the total. The terminal can therefore determine whether the GPU is fully loaded (has reached a performance bottleneck) by calculating the proportion of the total rendering time of a single frame that is taken up by the GPU single frame rendering time.
On the basis of FIG. 5, as shown in FIG. 7, this step may further include the following steps.
Step 506A: Obtain the start time point and the end time point of the enqueue process within the predetermined period.
For the process of obtaining the start time point and the end time point of the enqueue process, refer to step 504 above; it is not repeated here.
Step 506B: Calculate the second time interval between the end time points of two adjacent enqueue processes.
Because every frame rendering includes an enqueue process, the terminal can determine the single frame rendering time from the time interval between the end time points of two adjacent enqueue processes, or from the time interval between the start time points of two adjacent enqueue processes.
In one possible implementation, second time interval = (i+1)-th end time point - i-th end time point, i ≥ 1; or, second time interval = (i+1)-th start time point - i-th start time point, i ≥ 1.
Illustratively, as shown in FIG. 6, the terminal calculates one second time interval from the end time point of the second frame and the end time point of the first frame, another second time interval from the end time point of the third frame and the end time point of the second frame, and so on.
In an illustrative example, the terminal calculates 10 second time intervals, all of which are 16 ms.
In other possible implementations, because every frame rendering also includes a dequeue process, the terminal may instead determine the single frame rendering time from the time interval between the end time points of two adjacent dequeue processes, or from the time interval between the start time points of two adjacent dequeue processes. This embodiment does not limit this.
Step 506C: Determine the average of the second time intervals within the predetermined period as the single frame rendering time.
To avoid the inaccuracy that would result from determining the single frame rendering time from a single second time interval, in this embodiment the terminal calculates the average of all second time intervals within the predetermined period and takes this average as the single frame rendering time of the image frames within the predetermined period.
Continuing the example from the preceding steps, because the calculated second time intervals are all 16 ms, the terminal determines that the single frame rendering time within the predetermined period is 16 ms.
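Steps 506B and 506C can be sketched as follows, using the end time points of adjacent enqueue processes. The timestamps are hypothetical values in milliseconds chosen to give ten 16 ms gaps, and the function name `single_frame_time` is an assumption.

```python
def single_frame_time(end_times):
    """Average gap between the end time points of adjacent enqueue processes."""
    if len(end_times) < 2:
        raise ValueError("at least two end time points are required")
    # second time interval = (i+1)-th end time point - i-th end time point
    second_intervals = [later - earlier for earlier, later in zip(end_times, end_times[1:])]
    return sum(second_intervals) / len(second_intervals)


# Hypothetical end time points in ms: 11 frames give 10 second time intervals.
end_times_ms = [6 + 16 * i for i in range(11)]

print(single_frame_time(end_times_ms))  # 16.0
```

The same function applies unchanged if start time points are passed instead of end time points, which corresponds to the alternative formula in step 506B.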
Step 507: Determine whether the GPU has reached a performance bottleneck according to the GPU single frame rendering time and the single frame rendering time.
The terminal then determines, from the calculated GPU single frame rendering time and single frame rendering time, whether the GPU reached a performance bottleneck within the predetermined period. As shown in FIG. 7, this step may include the following steps.
Step 507A: Calculate the ratio of the GPU single frame rendering time to the single frame rendering time.
Here, ratio = GPU single frame rendering time / single frame rendering time. Because the GPU single frame rendering time is only part of the total time spent on a frame, the ratio is less than 1.
Continuing the example from the preceding steps, when the GPU single frame rendering time is 6 ms and the single frame rendering time is 16 ms, the ratio is 6/16 = 0.375.
The terminal then checks whether the ratio is greater than a preset value. If it is, the terminal determines that the GPU's running time accounts for a large share of the image rendering process, that is, the GPU has reached a performance bottleneck; if it is less, the terminal determines that the GPU has not reached a performance bottleneck.
Step 507B: In response to the ratio being greater than the preset value, determine that the GPU has reached a performance bottleneck.
Continuing the example from the preceding steps, when the preset value is 0.3, because 0.375 > 0.3, the terminal determines that the GPU has reached a performance bottleneck.
Step 507C: In response to the ratio being less than the preset value, determine that the GPU has not reached a performance bottleneck.
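Steps 507A to 507C can be sketched as a single function. The preset value of 0.3 comes from the example; the application does not state what happens when the ratio equals the preset value, so this sketch assumes that case counts as no bottleneck. The function name `gpu_bottleneck` is hypothetical.

```python
def gpu_bottleneck(gpu_frame_ms, frame_ms, preset_value=0.3):
    """Return (reached_bottleneck, ratio) for one predetermined period."""
    if frame_ms <= 0:
        raise ValueError("single frame rendering time must be positive")
    ratio = gpu_frame_ms / frame_ms
    # Assumption: a ratio exactly equal to the preset value is not a bottleneck.
    return ratio > preset_value, ratio


print(gpu_bottleneck(6, 16))  # (True, 0.375)
```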
In this embodiment, the terminal calculates the single frame rendering time and the GPU single frame rendering time from the start and end time points of the enqueue process in image rendering, and then determines whether the GPU has reached a performance bottleneck from the ratio of the GPU single frame rendering time to the single frame rendering time. This reduces the computational complexity of the detection process while preserving the accuracy of the detection result.
In one possible implementation, when it is determined that the GPU has reached a performance bottleneck, to ensure the display quality of the target application, on the basis of FIG. 5 and as shown in FIG. 8, the following steps are further included after step 507.
Step 508: In response to the GPU having reached a performance bottleneck, obtain the current frame rate.
As for how the current frame rate is obtained, in one possible implementation the terminal calculates it from the single frame rendering time obtained in the preceding steps, where current frame rate = 1 s / single frame rendering time. For example, when the single frame rendering time is 16 ms, the terminal calculates a current frame rate of about 62 fps.
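The frame rate formula can be checked with a short sketch. The function name `current_frame_rate` is an assumption. Note that 1 s / 16 ms is exactly 62.5 fps, which the example above states as about 62 fps.

```python
def current_frame_rate(frame_ms):
    """Frames per second given the single frame rendering time in milliseconds."""
    if frame_ms <= 0:
        raise ValueError("single frame rendering time must be positive")
    return 1000.0 / frame_ms


print(current_frame_rate(16))  # 62.5
```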
The terminal then obtains the target frame rate of the target application and checks whether the current frame rate reaches it. If it does, the terminal determines that the target application's picture is not stuttering; if it does not, the terminal determines that the picture is stuttering and performs step 509 or 510 below.
Optionally, the terminal operating system obtains the target frame rate of the target application through a data channel between the operating system and the target application.
Step 509: In response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency being less than the upper limit of the GPU operating frequency, increase the operating parameters of the GPU.
In one possible implementation, when the current frame rate does not reach the target frame rate, the terminal obtains the average operating frequency (calculated in step 502) and checks whether it has reached the upper limit of the GPU operating frequency. If it has not, the GPU's performance still has room for improvement, and the terminal increases the operating parameters of the GPU. For example, starting from the average operating frequency, the terminal raises the frequency step by step by a predetermined increment toward the upper limit of the operating frequency, to improve the GPU's rendering performance.
In an illustrative example, the average operating frequency of the GPU is 1620 MHz and the upper limit of the GPU operating frequency is 2000 MHz. Starting from 1620 MHz, the terminal raises the GPU operating frequency step by step in predetermined increments of 50 MHz.
Step 510: In response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency having reached the upper limit of the GPU operating frequency, adjust the image quality of the target application.
When the current frame rate does not reach the target frame rate and the average operating frequency has reached the upper limit of the GPU operating frequency, the terminal adjusts (for example, lowers) the image quality of the target application to ensure the display quality of the picture, thereby making image rendering less demanding for the GPU.
In this embodiment, after detecting that the GPU has reached a performance bottleneck, the terminal further determines whether the picture is stuttering. When it is, the terminal adjusts the GPU operating frequency or the picture quality of the target application based on the GPU's average operating frequency and the upper limit of the operating frequency, thereby improving the display quality of the target application.
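The branching in steps 508 to 510 can be summarized in a short sketch. It only selects which adjustment applies and does not perform it; the `Action` enumeration, the function name `choose_action`, and the 90 fps target used in the usage lines are illustrative assumptions, not part of the application.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RAISE_GPU_FREQUENCY = "raise_gpu_frequency"  # step 509
    ADJUST_IMAGE_QUALITY = "adjust_image_quality"  # step 510


def choose_action(bottleneck, fps, target_fps, avg_mhz, max_mhz):
    """Pick the adjustment described in steps 508 to 510."""
    if not bottleneck or fps >= target_fps:
        # No bottleneck, or the target frame rate is already reached.
        return Action.NONE
    if avg_mhz < max_mhz:
        # The GPU still has headroom, so its operating parameters are raised.
        return Action.RAISE_GPU_FREQUENCY
    # The GPU is already at its frequency limit, so image quality is adjusted.
    return Action.ADJUST_IMAGE_QUALITY


print(choose_action(True, 62.5, 90, 1620, 2000))  # Action.RAISE_GPU_FREQUENCY
print(choose_action(True, 62.5, 90, 2000, 2000))  # Action.ADJUST_IMAGE_QUALITY
```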
下述为本申请装置实施例,可以用于执行本申请方法实施例。对于本申请装置实施例中未披露的细节,请参照本申请方法实施例。The following are device embodiments of this application, which can be used to implement the method embodiments of this application. For details not disclosed in the device embodiment of this application, please refer to the method embodiment of this application.
请参考图9,其示出了本申请一个实施例提供的GPU性能瓶颈的确定装置的结构示意图。该装置可以通过专用硬件电路,或者,软硬件的结合实现成为图1中的终端的全部或一部分,该装置包括:Please refer to FIG. 9, which shows a schematic structural diagram of an apparatus for determining a GPU performance bottleneck provided by an embodiment of the present application. The device can be implemented as all or part of the terminal in Figure 1 through a dedicated hardware circuit, or a combination of software and hardware. The device includes:
第一获取模块910,用于在目标应用程序运行过程中,获取预定时长内GPU的GPU运行频率;The first obtaining module 910 is configured to obtain the GPU operating frequency of the GPU within a predetermined period of time during the running process of the target application;
第二获取模块920,用于响应于所述GPU运行频率满足预设条件,获取所述预定时长内所述GPU的GPU单帧渲染时长,所述GPU单帧渲染时长为单帧图像渲染过程中所述GPU的运行时长;The second obtaining module 920 is configured to obtain the GPU single frame rendering time length of the GPU within the predetermined time period in response to the GPU operating frequency meeting a preset condition, and the GPU single frame rendering time length is during the single frame image rendering process The running time of the GPU;
确定模块930,用于根据所述GPU单帧渲染时长确定所述GPU是否达到性能瓶颈。The determining module 930 is configured to determine whether the GPU reaches the performance bottleneck according to the GPU single frame rendering time.
可选的,所述第二获取模块920,包括:Optionally, the second obtaining module 920 includes:
第一获取单元,用于获取所述预定时长内入列过程的开始时间点和结束时间点,所述入列过程是指将经过图像渲染的缓冲区buffer放回缓冲区队列BufferQueue的过程;The first acquiring unit is configured to acquire the start time point and the end time point of the enqueue process within the predetermined time period, and the enqueue process refers to the process of putting the image rendering buffer back into the buffer queue BufferQueue;
第一计算单元,用于根据所述开始时间点和所述结束时间点计算所述GPU单帧渲染时长。The first calculation unit is configured to calculate the GPU single frame rendering time according to the start time point and the end time point.
可选的,所述第一计算单元,用于:Optionally, the first calculation unit is configured to:
计算所述开始时间点和所述结束时间点之间的第一时间间隔;Calculating the first time interval between the start time point and the end time point;
将所述预定时长内入列过程对应所述第一时间间隔的平均值确定为所述GPU单帧渲染时长。The average value of the enqueue process corresponding to the first time interval within the predetermined time period is determined as the GPU single frame rendering time.
Optionally, the determining module 930 includes:
A second acquiring unit, configured to acquire a single-frame rendering time within the predetermined period, where the single-frame rendering time is the time taken to render a single frame image;
A determining unit, configured to determine, according to the GPU single-frame rendering time and the single-frame rendering time, whether the GPU has reached a performance bottleneck.
Optionally, the second acquiring unit is configured to:
Acquire the start time point and the end time point of the enqueue process within the predetermined period, where the enqueue process is the process of putting a buffer that has undergone image rendering back into the BufferQueue;
Calculate a second time interval between the end time points of two adjacent enqueue processes;
Determine the average of the second time intervals within the predetermined period as the single-frame rendering time.
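The second time interval can be illustrated in the same way. The sketch below assumes the same hypothetical `(start_ms, end_ms)` representation of an enqueue process, ordered by time; the function name `single_frame_render_time` is likewise an assumption made for the example.

```python
from statistics import mean


def single_frame_render_time(enqueue_events):
    """Average the interval between end time points of adjacent enqueue processes.

    enqueue_events: time-ordered list of (start_ms, end_ms) pairs
    (hypothetical representation of the enqueue processes in the period).
    """
    if len(enqueue_events) < 2:
        raise ValueError("at least two enqueue processes are required")
    end_times = [end for _, end in enqueue_events]
    # Second time interval: difference between adjacent end time points.
    second_intervals = [
        later - earlier for earlier, later in zip(end_times, end_times[1:])
    ]
    return mean(second_intervals)
```

Only the end time points are used here, so the result reflects the whole frame interval rather than the GPU's share of it.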
Optionally, the determining unit is configured to:
Calculate the ratio of the GPU single-frame rendering time to the single-frame rendering time, where the ratio is less than 1;
In response to the ratio being greater than a preset value, determine that the GPU has reached a performance bottleneck;
In response to the ratio being less than the preset value, determine that the GPU has not reached a performance bottleneck.
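The ratio test of the determining unit can be sketched as below. The function name and the default preset value of 0.9 are assumptions chosen for illustration. The text above does not say how a ratio exactly equal to the preset value is handled; the sketch treats that case as "not reached".

```python
def gpu_reached_bottleneck(gpu_frame_time_ms, frame_time_ms, preset_value=0.9):
    """Compare the GPU's share of the frame time against a preset value.

    preset_value=0.9 is a hypothetical default, not a value taken from the text.
    """
    if frame_time_ms <= 0:
        raise ValueError("single-frame rendering time must be positive")
    # Ratio of GPU single-frame rendering time to single-frame rendering time.
    ratio = gpu_frame_time_ms / frame_time_ms
    # A ratio equal to the preset value is treated as "not reached" (assumption).
    return ratio > preset_value
```

A ratio close to 1 means the GPU is busy for almost the entire frame interval, which is why a high ratio is read as a bottleneck.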
Optionally, the device further includes:
A first detection module, configured to calculate the average operating frequency of the GPU within the predetermined period according to the GPU operating frequency, and, in response to the average operating frequency being greater than a frequency threshold, determine that the GPU operating frequency meets the preset condition, where the frequency threshold is less than the upper limit of the operating frequency of the GPU;
Or,
A second detection module, configured to determine that the GPU operating frequency meets the preset condition in response to the length of time for which the GPU operating frequency is greater than the frequency threshold being greater than a duration threshold.
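The two alternative preset conditions can be sketched as follows. The sketch assumes that the GPU operating frequency is sampled at a fixed period, so that the time spent above the threshold can be estimated by counting samples; the function names, units and sampling model are hypothetical.

```python
def average_frequency_condition(samples_mhz, freq_threshold_mhz):
    """First alternative: the average operating frequency exceeds the threshold."""
    if not samples_mhz:
        raise ValueError("no frequency samples")
    average = sum(samples_mhz) / len(samples_mhz)
    return average > freq_threshold_mhz


def duration_condition(samples_mhz, sample_period_ms,
                       freq_threshold_mhz, duration_threshold_ms):
    """Second alternative: time spent above the threshold exceeds a duration.

    Assumes evenly spaced samples, each standing for sample_period_ms of time.
    """
    samples_above = sum(1 for f in samples_mhz if f > freq_threshold_mhz)
    time_above_ms = samples_above * sample_period_ms
    return time_above_ms > duration_threshold_ms
```

Either check acts only as a gate: the frame-time measurements are taken only once the GPU is already running at a high frequency.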
Optionally, the device further includes:
A third obtaining module, configured to obtain the current frame rate in response to the GPU reaching the performance bottleneck;
A first adjustment module, configured to raise the operating parameters of the GPU in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency being less than the upper limit of the operating frequency of the GPU;
A second adjustment module, configured to adjust the image quality of the target application in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency reaching the upper limit of the operating frequency of the GPU.
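The choice between the two adjustment modules, made once a bottleneck has been detected, can be sketched as a small decision function. The function name and the returned labels are hypothetical; the sketch covers only the branching described above and does not perform any actual adjustment.

```python
def choose_adjustment(current_fps, target_fps, avg_freq_mhz, max_freq_mhz):
    """Pick the adjustment to apply after the GPU has reached a bottleneck."""
    if current_fps >= target_fps:
        # Target frame rate is met, so no adjustment is needed.
        return "none"
    if avg_freq_mhz < max_freq_mhz:
        # Headroom remains: raise the GPU operating parameters.
        return "raise_gpu_operating_parameters"
    # Already at the frequency upper limit: adjust image quality instead.
    return "adjust_image_quality"
```

The frequency headroom is checked first because raising the GPU operating parameters leaves the application's image quality untouched.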
In summary, in the embodiments of the present application, the GPU operating frequency of the GPU within a predetermined period is obtained, and, when the GPU operating frequency meets a preset condition, the GPU single-frame rendering time of the GPU within the predetermined period is further obtained, so that whether the GPU has reached a performance bottleneck while the target application is running is determined according to the GPU single-frame rendering time. In the embodiments of the present application, the terminal judges the GPU performance bottleneck based on the GPU operating frequency and the GPU single-frame rendering time of the single frame image rendering process, without relying on a driver to perform complex GPU load calculation, which reduces the complexity of performance bottleneck detection and improves its efficiency while maintaining detection accuracy.
In this embodiment, the terminal calculates the single-frame rendering time of a single frame image and the GPU single-frame rendering time according to the start time point and the end time point of the enqueue process in image rendering, and further determines whether the GPU has reached a performance bottleneck according to the ratio of the GPU single-frame rendering time to the single-frame rendering time, which reduces the computational complexity of the detection process while ensuring the accuracy of the detection result.
In this embodiment, after detecting that the GPU has reached a performance bottleneck, the terminal further determines whether the picture is stuttering, and, when stuttering occurs, adjusts the GPU operating frequency or the picture quality of the target application based on the average operating frequency of the GPU and the upper limit of the operating frequency, thereby improving the picture display quality of the target application.
It should be noted that, when the device provided in the above embodiments implements its functions, the division into the functional modules described above is used only as an example. In practical applications, the functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, see the method embodiments, which is not repeated here.
The present application also provides a computer-readable medium on which program instructions are stored, where the program instructions, when executed by a processor, implement the method for determining a GPU performance bottleneck provided by the foregoing method embodiments.
The present application also provides a computer program product containing instructions which, when run on a computer, causes the computer to execute the method for determining a GPU performance bottleneck described in the foregoing embodiments.
The serial numbers of the foregoing embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art can understand that all or part of the steps in the frame rate control method of the foregoing embodiments may be completed by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. The above descriptions are only preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (18)

  1. A method for determining a performance bottleneck of a graphics processing unit (GPU), wherein the method comprises:
    obtaining, while a target application is running, the GPU operating frequency of the GPU within a predetermined period;
    in response to the GPU operating frequency meeting a preset condition, obtaining the GPU single-frame rendering time of the GPU within the predetermined period, where the GPU single-frame rendering time is the running time of the GPU during the rendering of a single frame image;
    determining, according to the GPU single-frame rendering time, whether the GPU has reached a performance bottleneck.
  2. The method according to claim 1, wherein the obtaining the GPU single-frame rendering time of the GPU within the predetermined period comprises:
    acquiring the start time point and the end time point of the enqueue process within the predetermined period, where the enqueue process is the process of putting a buffer that has undergone image rendering back into the buffer queue (BufferQueue);
    calculating the GPU single-frame rendering time according to the start time point and the end time point.
  3. The method according to claim 2, wherein the calculating the GPU single-frame rendering time according to the start time point and the end time point comprises:
    calculating a first time interval between the start time point and the end time point;
    determining the average of the first time intervals of the enqueue processes within the predetermined period as the GPU single-frame rendering time.
  4. The method according to any one of claims 1 to 3, wherein the determining, according to the GPU single-frame rendering time, whether the GPU has reached a performance bottleneck comprises:
    acquiring a single-frame rendering time within the predetermined period, where the single-frame rendering time is the time taken to render a single frame image;
    determining, according to the GPU single-frame rendering time and the single-frame rendering time, whether the GPU has reached a performance bottleneck.
  5. The method according to claim 4, wherein the acquiring the single-frame rendering time within the predetermined period comprises:
    acquiring the start time point and the end time point of the enqueue process within the predetermined period, where the enqueue process is the process of putting a buffer that has undergone image rendering back into the BufferQueue;
    calculating a second time interval between the end time points of two adjacent enqueue processes;
    determining the average of the second time intervals within the predetermined period as the single-frame rendering time.
  6. The method according to claim 4, wherein the determining, according to the GPU single-frame rendering time and the single-frame rendering time, whether the GPU has reached a performance bottleneck comprises:
    calculating the ratio of the GPU single-frame rendering time to the single-frame rendering time, where the ratio is less than 1;
    in response to the ratio being greater than a preset value, determining that the GPU has reached a performance bottleneck;
    in response to the ratio being less than the preset value, determining that the GPU has not reached a performance bottleneck.
  7. The method according to any one of claims 1 to 3, wherein after the obtaining the GPU operating frequency of the GPU within the predetermined period, the method further comprises:
    calculating the average operating frequency of the GPU within the predetermined period according to the GPU operating frequency, and, in response to the average operating frequency being greater than a frequency threshold, determining that the GPU operating frequency meets the preset condition, where the frequency threshold is less than an upper limit of the GPU operating frequency;
    or,
    in response to the length of time for which the GPU operating frequency is greater than the frequency threshold being greater than a duration threshold, determining that the GPU operating frequency meets the preset condition.
  8. The method according to claim 7, wherein after the determining, according to the GPU single-frame rendering time, whether the GPU has reached a performance bottleneck, the method further comprises:
    in response to the GPU reaching the performance bottleneck, obtaining the current frame rate;
    in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency being less than the upper limit of the GPU operating frequency, raising the operating parameters of the GPU;
    in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency reaching the upper limit of the GPU operating frequency, adjusting the image quality of the target application.
  9. A device for determining a performance bottleneck of a graphics processing unit (GPU), wherein the device comprises:
    a first obtaining module, configured to obtain, while a target application is running, the GPU operating frequency of the GPU within a predetermined period;
    a second obtaining module, configured to obtain, when the GPU operating frequency meets a preset condition, the GPU single-frame rendering time of the GPU within the predetermined period, where the GPU single-frame rendering time is the running time of the GPU during the rendering of a single frame image;
    a determining module, configured to determine, according to the GPU single-frame rendering time, whether the GPU has reached a performance bottleneck.
  10. The device according to claim 9, wherein the second obtaining module comprises:
    a first acquiring unit, configured to acquire the start time point and the end time point of the enqueue process within the predetermined period, where the enqueue process is the process of putting a buffer that has undergone image rendering back into the buffer queue (BufferQueue);
    a first calculation unit, configured to calculate the GPU single-frame rendering time according to the start time point and the end time point.
  11. The device according to claim 10, wherein the first calculation unit is configured to:
    calculate a first time interval between the start time point and the end time point;
    determine the average of the first time intervals of the enqueue processes within the predetermined period as the GPU single-frame rendering time.
  12. The device according to any one of claims 9 to 11, wherein the determining module comprises:
    a second acquiring unit, configured to acquire a single-frame rendering time within the predetermined period, where the single-frame rendering time is the time taken to render a single frame image;
    a determining unit, configured to determine, according to the GPU single-frame rendering time and the single-frame rendering time, whether the GPU has reached a performance bottleneck.
  13. The device according to claim 12, wherein the second acquiring unit is configured to:
    acquire the start time point and the end time point of the enqueue process within the predetermined period, where the enqueue process is the process of putting a buffer that has undergone image rendering back into the BufferQueue;
    calculate a second time interval between the end time points of two adjacent enqueue processes;
    determine the average of the second time intervals within the predetermined period as the single-frame rendering time.
  14. The device according to claim 12, wherein the determining unit is configured to:
    calculate the ratio of the GPU single-frame rendering time to the single-frame rendering time, where the ratio is less than 1;
    in response to the ratio being greater than a preset value, determine that the GPU has reached a performance bottleneck;
    in response to the ratio being less than the preset value, determine that the GPU has not reached a performance bottleneck.
  15. The device according to any one of claims 9 to 11, wherein the device further comprises:
    a first detection module, configured to calculate the average operating frequency of the GPU within the predetermined period according to the GPU operating frequency, and, in response to the average operating frequency being greater than a frequency threshold, determine that the GPU operating frequency meets the preset condition, where the frequency threshold is less than an upper limit of the GPU operating frequency;
    or,
    a second detection module, configured to determine that the GPU operating frequency meets the preset condition in response to the length of time for which the GPU operating frequency is greater than the frequency threshold being greater than a duration threshold.
  16. The device according to claim 15, wherein the device further comprises:
    a third obtaining module, configured to obtain the current frame rate in response to the GPU reaching the performance bottleneck;
    a first adjustment module, configured to raise the operating parameters of the GPU in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency being less than the upper limit of the GPU operating frequency;
    a second adjustment module, configured to adjust the image quality of the target application in response to the current frame rate not reaching the target frame rate of the target application and the average operating frequency reaching the upper limit of the GPU operating frequency.
  17. A terminal, wherein the terminal comprises a processor, a memory connected to the processor, and program instructions stored in the memory, and the processor, when executing the program instructions, implements the method for determining a GPU performance bottleneck according to any one of claims 1 to 8.
  18. A computer-readable storage medium, wherein program instructions are stored thereon, and the program instructions, when executed by a processor, implement the method for determining a GPU performance bottleneck according to any one of claims 1 to 8.
PCT/CN2020/071796 2019-01-28 2020-01-13 Gpu performance bottleneck determining method and device, terminal, and storage medium WO2020156132A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910080514.5A CN109800141B (en) 2019-01-28 2019-01-28 GPU performance bottleneck determining method, device, terminal and storage medium
CN201910080514.5 2019-01-28

Publications (1)

Publication Number Publication Date
WO2020156132A1 true WO2020156132A1 (en) 2020-08-06

Family

ID=66560585

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/071796 WO2020156132A1 (en) 2019-01-28 2020-01-13 Gpu performance bottleneck determining method and device, terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN109800141B (en)
WO (1) WO2020156132A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800141B (en) * 2019-01-28 2020-08-18 Oppo广东移动通信有限公司 GPU performance bottleneck determining method, device, terminal and storage medium
WO2021000226A1 (en) * 2019-07-01 2021-01-07 Qualcomm Incorporated Methods and apparatus for optimizing frame response
CN112516590A (en) * 2019-09-19 2021-03-19 华为技术有限公司 Frame rate identification method and electronic equipment
CN110795056B (en) * 2019-11-08 2023-08-15 Oppo广东移动通信有限公司 Method, device, terminal and storage medium for adjusting display parameters
CN111429333A (en) * 2020-03-25 2020-07-17 京东方科技集团股份有限公司 GPU dynamic frequency modulation method, device and system
CN112363842B (en) * 2020-11-27 2023-01-06 Oppo(重庆)智能科技有限公司 Frequency adjusting method and device for graphic processor, electronic equipment and storage medium
CN113138655B (en) * 2021-04-02 2023-11-28 Oppo广东移动通信有限公司 Processor frequency adjusting method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105573755A (en) * 2015-12-15 2016-05-11 北京奇虎科技有限公司 Method and device for acquiring application Activity rendering time
CN107102936A (en) * 2017-05-27 2017-08-29 腾讯科技(深圳)有限公司 The appraisal procedure and mobile terminal and storage medium of a kind of fluency
CN109104638A (en) * 2018-08-03 2018-12-28 Oppo广东移动通信有限公司 Frame per second optimization method, device, terminal and storage medium
CN109189543A (en) * 2018-10-16 2019-01-11 Oppo广东移动通信有限公司 terminal control method, device, storage medium and intelligent terminal
CN109800141A (en) * 2019-01-28 2019-05-24 Oppo广东移动通信有限公司 Determination method, apparatus, terminal and the storage medium of GPU performance bottleneck

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542914B2 (en) * 2008-12-31 2017-01-10 Apple Inc. Display system with improved graphics abilities while switching graphics processing units
CN108089958B (en) * 2017-12-29 2021-06-08 珠海市君天电子科技有限公司 GPU test method, terminal device and computer readable storage medium

Also Published As

Publication number Publication date
CN109800141B (en) 2020-08-18
CN109800141A (en) 2019-05-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20749560

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20749560

Country of ref document: EP

Kind code of ref document: A1