TW201802768A - Graphics processing method and graphics processing apparatus - Google Patents

Graphics processing method and graphics processing apparatus

Info

Publication number
TW201802768A
Authority
TW
Taiwan
Prior art keywords
image processing
frame
processing unit
event
performance
Prior art date
Application number
TW106123178A
Other languages
Chinese (zh)
Other versions
TWI633517B (en)
Inventor
林元淳
鄒雯姍
吳俊源
Original Assignee
聯發科技股份有限公司
Priority date
Filing date
Publication date
Priority to US201662361039P priority Critical
Priority to US62/361,039 priority
Priority to US15/606,132 priority patent/US20170262955A1/en
Priority to US15/606,132 priority
Application filed by 聯發科技股份有限公司 filed Critical 聯發科技股份有限公司
Publication of TW201802768A publication Critical patent/TW201802768A/en
Application granted granted Critical
Publication of TWI633517B publication Critical patent/TWI633517B/en


Abstract

The invention provides an image processing method and an image processing device. The image processing method includes: processing a first frame by providing a control setting to a first group of devices to achieve a first performance metric; after the first frame, receiving scene information about a second frame from a second group of devices; quantizing the change between the first frame and the second frame; adaptively adjusting the control setting according to a comparison between the quantized change and a predetermined threshold; and processing the second frame by providing the adjusted control setting to the first group of devices.

Description

Image processing method and image processing device

The present invention relates generally to video playback within an electronic device, and more particularly to power management of an image processing unit.

Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims of this application and are not admitted to be prior art by inclusion in this section.

A graphics processing unit (GPU), also known as a visual processing unit (VPU), is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and gaming consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing, and their highly parallel architecture makes them more efficient than general-purpose CPUs for processing large blocks of data in parallel.

In order to solve the power consumption problem in image processing, an image processing method and an image processing device are provided.

The present invention provides an image processing method including: processing a first frame by providing a control setting to a first group of devices to achieve a first performance metric; after the first frame, receiving scene information about a second frame from a second group of devices; quantizing the change between the first frame and the second frame; adaptively adjusting the control setting according to a comparison between the quantized change and a predetermined threshold; and processing the second frame by providing the adjusted control setting to the first group of devices.
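As a non-limiting illustration, the per-frame control loop of this method might be sketched as follows. The scalar "load" indicator, the threshold, and the boost amount are invented for the example; the patent does not fix any data types or values.

```python
THRESHOLD = 10   # predetermined threshold for the quantized change
BOOST = 20       # amount by which the control setting is raised

def quantize_change(prev_scene, scene):
    """Quantize the change between two frames as the absolute difference
    of a single load indicator (an invented metric for illustration)."""
    return abs(scene["load"] - prev_scene["load"])

def control_settings(scenes, initial_setting):
    """Return the control setting used for each frame: kept when the
    quantized change is small, raised when it exceeds THRESHOLD."""
    setting = initial_setting
    prev = None
    out = []
    for scene in scenes:
        if prev is not None and quantize_change(prev, scene) > THRESHOLD:
            setting += BOOST        # adaptively adjust the control setting
        out.append(setting)         # this setting processes the frame
        prev = scene
    return out
```

For example, `control_settings([{"load": 50}, {"load": 52}, {"load": 90}], 100)` keeps the setting at 100 across the small change from 50 to 52, and raises it to 120 for the large jump to 90.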

The present invention further provides an image processing method, comprising: processing a frame by providing a control setting to a set of devices to achieve a first performance metric; while an image processing unit processes the frame, detecting a particular event at the image processing unit; identifying a second performance metric based on the detected event; and adjusting the control setting of the set of devices to achieve the second performance metric.

The present invention further provides an image processing apparatus comprising: a set of processing units; an image processing unit; a display device; and a computer-readable storage medium storing a plurality of sets of instructions, wherein execution of the plurality of sets of instructions by the set of processing units configures the set of processing units to perform the following actions: while the image processing unit processes a frame to be displayed on the display device, providing control settings to the image processing unit and the display device to achieve a first performance metric; while the image processing unit processes the frame, detecting a particular event at the image processing unit; identifying a second performance metric based on the detected event; and adjusting the control settings of the image processing unit based on the second performance metric.

The image processing method and image processing apparatus of the present invention can optimize the performance level of the image processing system.

100‧‧‧image processing system

110‧‧‧power manager

120‧‧‧frame analyzer

130‧‧‧performance controller

131‧‧‧frequency setting

132‧‧‧voltage setting

140‧‧‧event reporter

150‧‧‧performance lookup table

211-214‧‧‧early indicators

310-370‧‧‧steps

410‧‧‧CPU

420‧‧‧GPU

430‧‧‧main memory

435‧‧‧memory controller

440‧‧‧GPU memory

450‧‧‧display device

455‧‧‧display controller

490‧‧‧other devices

701‧‧‧event identity

702‧‧‧actual time

710‧‧‧lookup table

711‧‧‧expected time

800‧‧‧electronic system

805‧‧‧bus

810‧‧‧processing unit

815‧‧‧image processing unit

820‧‧‧system memory

825‧‧‧network

830‧‧‧read-only memory

835‧‧‧permanent storage device

840‧‧‧input device

845‧‧‧output device

FIG. 1 shows an image processing system 100.

FIG. 2 is a diagram showing a setting in which the power manager 110 adjusts the performance of the image processing system 100.

Figure 3 conceptually illustrates a process 300 for managing power within an image processing system.

Figure 4 is a block diagram showing the structure of an image processing system 100 including a CPU, a GPU, a memory, and a display device.

Figures 5a-5b show the flow of data in the image processing system 100 using the performance lookup table 150 to find performance settings for different devices.

Figure 6 shows the fine-tuning of the performance of the processed frames based on the timestamp of the monitored events.

Figure 7 shows the data flow with which the image processing system 100 fine-tunes performance settings based on event timestamps.

FIG. 8 conceptually illustrates an electronic system 800 with which some embodiments of the present application are implemented.

Some embodiments disclose methods and apparatus for managing the power of an image processing system. Specifically, the method adaptively adjusts the performance of the image processing system based on the scene information of each frame. In some embodiments, the adjustment aims to optimize the performance level of the image processing system. In some embodiments, the optimal performance setting for a frame is one at which the image processing system just completes the frame's workload before the end of the frame, so that wasted power is minimized. (A frame is dropped when the performance level is below what is required to complete the workload on time; the circuit wastes power when it operates at a performance level higher than necessary to complete the workload on time.)

In some embodiments, an image processing system includes a graphics processing unit (GPU) and a scene-aware power manager coupled to the GPU. The scene-aware power manager receives scene information and adaptively controls the GPU according to the received scene information.

FIG. 1 shows an image processing system 100 that includes a scene-aware power manager 110 for managing the performance of the image processing system. The power manager 110 derives scene information from data collected from the devices of the image processing system 100 and adjusts the performance settings of the image processing system accordingly.

The image processing system 100 is an electronic device that includes components capable of processing data and producing data for image display. Such a device may be a general computing device, such as a desktop computer, a notebook computer, a tablet computer, or a smartphone, which includes a central processing unit (CPU), storage components, input/output devices, a network interface, a user interface, and so on. Such a device can also be equipped with hardware, such as a GPU, that is particularly suited to processing image data. The electronic device serving as the image processing system 100 can also perform functions unrelated to images.

The image processing system 100 includes a power manager 110, a frame analyzer 120, a performance controller 130, an event reporter 140, and a performance lookup table (LUT) 150. The power manager 110 receives data from the frame analyzer 120 as scene information and determines the performance settings of the performance controller 130 based on the received scene information. The power manager 110 detects a level of scene change from the received scene information, for example by comparing it with the scene information received for the previous frame, and uses the detected level of scene change to look up an estimated required performance in the performance lookup table 150. The power manager 110 uses the estimated required performance to generate performance settings for the performance controller 130.

In some embodiments, these performance settings include a frequency setting 131 and a voltage setting 132. The voltage setting 132 indicates the voltage required by the image processing system 100 to operate at the frequency indicated by the frequency setting 131. (A higher voltage allows the circuit to operate at a higher frequency, which yields a higher performance metric due to higher data throughput and/or lower latency, but also results in greater power consumption.)
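The pairing of the frequency setting 131 with the voltage setting 132 can be sketched as a table of operating points. The specific frequency/voltage pairs below are invented for illustration; real values depend on the silicon and are not specified by the disclosure.

```python
# Hypothetical DVFS operating-point table: each frequency (MHz) is paired
# with the minimum voltage (mV) required to sustain it, mirroring the
# relationship between frequency setting 131 and voltage setting 132.
OPERATING_POINTS = [
    (200, 600),   # low frequency, low voltage: least power
    (400, 700),
    (600, 800),
    (800, 900),   # high frequency, high voltage: most power
]

def settings_for(required_mhz):
    """Pick the lowest operating point that meets the required frequency."""
    for freq, volt in OPERATING_POINTS:
        if freq >= required_mhz:
            return freq, volt
    return OPERATING_POINTS[-1]   # cap at the highest operating point
```

For example, a required frequency of 500 MHz would select the 600 MHz / 800 mV point, the lowest point that still meets the requirement.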

The image processing system 100 adjusts the performance settings of the performance controller 130 based on events reported by the event reporter 140. In some embodiments, the power manager 110 receives from the event reporter 140 the identity of a reported event and a timestamp associated with the event, and determines whether the performance settings provided to the performance controller 130 are sufficient (neither too fast nor too slow). In some embodiments, the event reporter 140 is configurable; for example, the types of events to be monitored and/or reported can be configured by the user to suit the type of application running on the CPU or GPU.

The frame analyzer 120 collects status, data, or reports from various modules or circuits in the image processing system 100 and provides the collected data to the power manager. The modules or circuits whose status is collected may include one or more of the following components: the CPU, the GPU, storage devices, display devices, the bus structure, and other types of circuits that together form the image processing system 100. The collected status may include signals transmitted directly by the various circuits of the image processing system 100 and/or data stored in storage structures that can be read by the power manager 110. The data or status bits collected from these devices are provided to the power manager 110 as indicators from which scene information is obtained to determine the performance settings (in the performance controller 130).

The power manager 110 uses some status data as "early indicators" because they indicate the upcoming image processing load of the upcoming frame. An early indicator of a frame is status data that predicts the frame's image processing load and is available before or at the start of the frame's image processing (e.g., before the frame's starting or initial event, such as vertical synchronization, or VSYNC). Such early indicators may include status data associated with CPU and/or memory accesses for upcoming frames.

The performance controller 130 delivers a set of control data or signals to various modules or circuits in the image processing system 100. These modules or circuits may include one or more of the following components: the CPU, the GPU, storage devices, display devices, the bus structure, and other types of circuits that together form the image processing system 100. The control data may include signals that the power manager 110 sends directly to the various circuits of the image processing system 100 and/or control data that the power manager 110 stores in storage structures. The performance controller 130 handles the control data or signals that control the performance of the circuits and/or devices in the image processing system, including the settings that control the clock frequency (e.g., the frequency setting 131) and the operating voltage (e.g., the voltage setting 132). The performance controller 130 also handles settings that control the display frame rate, the display response time for user interaction, and other settings that may affect the performance or power usage of the image processing system 100. In some embodiments, a set of performance settings achieves a particular performance metric (e.g., a particular operating frequency or a particular data rate).

The event reporter 140 reports certain types of events or operational steps of the image processing system 100 to the power manager 110. The power manager 110 uses the reported events to determine whether the performance settings provided to the performance controller 130 are sufficient. To report an event, the event reporter 140 in some embodiments provides the identity of the event and a timestamp marking when the event occurred. The power manager 110 correspondingly re-evaluates the level of scene change at the event and determines whether the performance settings should be fine-tuned based on the re-evaluated level of scene change.

The performance lookup table 150 is a lookup table that maps scene information from the frame analyzer 120 to performance settings for the performance controller 130. The performance lookup table 150 can include entries that map scene information directly to performance settings. It may also include entries that map derived parameters to performance settings. For example, the power manager 110 calculates the level of scene change from the scene information as a parameter for looking up performance settings. (The level of scene change is a measure of the difference between the scene information of the current frame and the scene information of the previous frame or of a previous event.)
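A minimal sketch of such a lookup indexed by a derived parameter might look like the following. The levels and the settings they map to are invented for the example; the disclosure leaves the table contents unspecified.

```python
# Hypothetical performance lookup table: maps a quantized scene-change
# level to a set of performance settings (here, GPU frequency/voltage).
PERFORMANCE_LUT = {
    0: {"gpu_mhz": 200, "gpu_mv": 600},   # no/low change
    1: {"gpu_mhz": 400, "gpu_mv": 700},   # moderate change
    2: {"gpu_mhz": 800, "gpu_mv": 900},   # large change: elevated setting
}

def lookup_performance(change_level):
    """Clamp the derived parameter into the table's range and return a
    setting, mirroring how the power manager indexes the LUT 150."""
    return PERFORMANCE_LUT[max(0, min(change_level, 2))]
```

A change level beyond the table's range is simply clamped to the highest entry, so an unexpectedly large change still resolves to the elevated setting.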

The power manager 110 is a module that determines the performance settings for the performance controller 130 based on the information provided by the frame analyzer 120. In some embodiments, the power manager 110 is a software module run by a set of processing units in the image processing system 100. The set of processing units running the power manager can be a CPU, a GPU, or another processing unit of the image processing system.

In some embodiments, the scene information includes a set of early indicators. The power manager 110 compares the early indicators of the current frame with the early indicators of the previous frame to determine the level of scene change, and assigns a set of initial performance settings based on the determined level of scene change. The power manager sets the performance of the image processing system 100 with the assigned set of initial performance settings before the start of the frame. In some embodiments, if the level of scene change is sufficiently small (i.e., the amount of scene change between the current frame and the previous frame is less than a threshold), the power manager 110 increases or decreases the performance setting of the previous frame by a small amount (or keeps it unchanged) as the initial performance setting of the new current frame.

Conversely, if the level of scene change is large enough (i.e., the amount of scene change between the current frame and the previous frame is greater than a threshold), the power manager will provide performance settings that significantly raise the performance of the image processing system 100. This is because the greater the level of scene change, the greater the uncertainty in the amount of processing actually needed for the frame. By overestimating the required performance, the risk of failing to complete the current frame on time is minimized.
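The two-branch decision above can be sketched as follows. All constants (the uncertainty threshold, the elevated setting, the nudge size) are hypothetical values chosen for the example, not values from the disclosure.

```python
UNCERTAINTY_THRESHOLD = 25
ELEVATED_SETTING = 100   # "safe" overestimate that covers the uncertainty
NUDGE = 2                # small adjustment used when the change is minor

def initial_setting(prev_setting, change_level):
    """Choose a frame's initial performance setting from the previous
    frame's setting and the quantized level of scene change."""
    if change_level > UNCERTAINTY_THRESHOLD:
        # Large change: the true load is hard to predict, so overestimate.
        return max(ELEVATED_SETTING, prev_setting)
    # Small change: carry over the previous setting, at most nudged.
    return prev_setting + (NUDGE if change_level > 0 else 0)
```

For a previous setting of 60, a change level of 30 jumps to the elevated setting of 100, while a change level of 5 only nudges the setting to 62.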

FIG. 2 is a diagram showing a setting in which the power manager 110 adjusts the performance of the image processing system 100. This figure shows the setting of the performance level in the processing of four consecutive frames (frames 1 to 4). As shown, at the beginning of each frame, the power manager provides the performance of the image processing system based on the initial performance settings (shaded). The power manager provides fine-tuning of performance (non-shaded portions) as the image processing system runs as the frame is processed.

As previously mentioned, the initial performance setting is based on a comparison of the early indicators of the current frame and the previous frame. Figure 2 conceptually shows the early indicators of each frame (early indicators 211-214 for frames 1 through 4, respectively). In this example, the initial performance setting of frame 2 is based on a comparison between the early indicator of frame 2 and the early indicator of frame 1; the initial performance setting of frame 3 is based on a comparison between the early indicator of frame 3 and the early indicator of frame 2; the initial performance setting of frame 4 is based on a comparison between the early indicator of frame 4 and the early indicator of frame 3; and so on.

As shown, the power manager uses the performance setting of frame 1 as the initial performance setting for frame 2, because the early indicator 211 of frame 1 is very similar to the early indicator 212 of frame 2 (both contain CPU loads for tasks A, B, and C). The power manager likewise uses the performance setting of frame 3, with only a slight increase, as the performance setting of frame 4, because the early indicator 213 of frame 3 is very similar to the early indicator 214 of frame 4 (both contain CPU loads for tasks Y and Z, although the early indicator 214 contains the CPU load of task X2 instead of X1). For frames 2 and 4, the calculated level of scene change is small enough that the power manager can continue to use the performance settings of the previous frame as the initial performance settings of the current frame.

In contrast, the power manager assigns elevated performance as the initial performance setting for frame 3, because the early indicator 213 of frame 3 is significantly different from the early indicator 212 of frame 2 (the early indicator of frame 2 contains the CPU loads of tasks A, B, and C, while the early indicator of frame 3 contains the CPU loads of tasks X1, Y, and Z). For frame 3, the calculated level of scene change is too large for the power manager to continue using the performance settings of the previous frame as the initial performance setting of the current frame. In fact, the power manager cannot determine what the appropriate initial performance setting is, so it raises the initial performance to a level high enough to cover the uncertainty.

In some embodiments, the power manager quantizes the scene change. When the quantized change is greater than a particular threshold, the power manager raises the initial performance setting of the frame to a performance metric that is a certain amount higher than that of the previous frame. When the quantized change is less than the threshold, the power manager sets the initial performance setting of the current frame by adjusting the performance setting of the previous frame by a small amount.

In some embodiments, the boosted initial performance is a set of values assigned based on the level of scene change (i.e., the difference between the early indicators). In some embodiments, the boosted initial performance is a predetermined set of values that is completely independent of the level of scene change and the performance settings of the previous frame. In some embodiments, the boosted initial performance is a certain amount higher than the performance metric of the previous frame.

As shown, after the initial performance setting of each frame, the power manager further fine-tunes the performance settings. In some embodiments, the power manager re-evaluates the level of scene change at certain events or operational steps (e.g., those reported by the event reporter 140). When these steps or events occur, the power manager compares the current scene information with a set of previously recorded scene information (or early indicators) to re-evaluate the level of scene change. The previously recorded scene information can be the scene information of the previous frame, or of a previous event or operational step. The power manager then uses the re-evaluated level of scene change to determine new performance settings. In some embodiments, the re-evaluated level of scene change is used as an index to look up a set of performance settings in the performance lookup table 150.

In some embodiments, each reported event or operational step is associated with a timestamp that marks the actual time of occurrence of the event, and the power manager compares the actual time of occurrence with the expected time of occurrence to determine a fine tuning adjustment of the performance setting.

Figure 3 conceptually illustrates a process 300 for managing power within an image processing system. In some embodiments, power manager 110 performs flow 300 when controlling performance settings of the image processing system. In some embodiments, the power manager is a software module run by a CPU or GPU that executes the process 300.

Flow 300 begins by extracting early indicators of an upcoming frame/scene (step 310), which contains a scene that may or may not have changed significantly from the previous frame. These early indicators are part of the scene information received from the various components of the image processing system. The flow also receives (step 320) an indication of a frame start event, such as a VSYNC signal for the upcoming frame. When the VSYNC signal is received, the upcoming frame becomes the "current frame".

After receiving the frame start event indication, the flow calculates (step 330) the level of scene change by comparing the extracted early indicators of the current frame with the early indicators of the previous frame. In some embodiments, the process quantizes the scene change as a value or a set of values.
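One possible (assumed, not specified by the disclosure) way to quantize the comparison of step 330 is to count the early-indicator entries that appear in only one of the two frames. The task names follow the FIG. 2 example.

```python
def indicator_difference(prev_indicators, cur_indicators):
    """Quantize the scene change as the size of the symmetric difference
    between two sets of early-indicator entries (an invented metric)."""
    return len(set(prev_indicators) ^ set(cur_indicators))

# Frames 1 and 2 of FIG. 2 share tasks A, B, and C, so the change is 0;
# frame 3 replaces them with X1, Y, and Z, which quantizes as a large
# change; frame 4 differs from frame 3 only in X2 replacing X1.
small = indicator_difference({"A", "B", "C"}, {"A", "B", "C"})
large = indicator_difference({"A", "B", "C"}, {"X1", "Y", "Z"})
```

Under this metric, frames 2 and 4 of FIG. 2 would quantize as small changes and frame 3 as a large one, matching the narrative above.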

Based on the calculated level of scene change, the process determines (step 340) whether the scene change is significant or slight, for example, whether the quantized level of the scene change is greater than a particular threshold. In some embodiments, a significant scene change may be indicated by a different number of vertices, different draw calls, a different number of rendered layers, or additional events or processes being launched. If the scene change is significant, the flow proceeds to 345. If the quantized level of the scene change is not significant, for example, less than a certain threshold, the flow proceeds to 350.

At step 345, the process assigns performance settings based on a predetermined set of higher (i.e., elevated) performance settings. The power manager uses this elevated performance setting because a higher level of scene change makes it more difficult to predict the best performance settings from the existing ones (the performance settings actually required can be much higher than the existing settings). The power manager therefore raises the performance setting to a level that covers the uncertainty. In some embodiments, the elevated performance setting is a predetermined value independent of the current performance setting. In some embodiments, the power manager adds a predetermined boost value to the current performance setting to obtain the elevated performance setting. After assigning the elevated performance settings, the flow proceeds to 360.

At step 350, the process reuses the existing performance settings or fine-tunes them. In this operation, the process has determined that the scene change between the current frame and the previous frame is very small, so the existing performance settings (the settings used for the previous frame, or the settings used earlier in the current frame) are likely still the best settings. The power manager therefore reuses the current performance settings, or increases/decreases them by a small fine-tuning amount (the fine-tuning threshold is less than the uncertainty threshold of operation 345). The flow then proceeds to 360.

In some embodiments, the size of the fine-tuning of the performance settings is based on the level of scene change. In some embodiments, the fine-tuning is based on a check of whether the performance is sufficient. The process receives a reported event or operational step (from the event reporter 140) together with the timestamp of the event, then compares the timestamp with the expected time of the event to determine whether the performance setting is too high or too low, and accordingly decides whether to increase or decrease it. Fine-tuning the performance settings using the timestamps of reported events is further explained below with reference to Figures 6-7.
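This timestamp check can be sketched as follows. The event name, expected time, tolerance, and step size are invented for the example (cf. the lookup table 710 and expected time 711 of FIG. 7).

```python
EXPECTED_MS = {"drew_1000_primitives": 8.0}   # hypothetical expected time
TOLERANCE_MS = 0.5
STEP = 1

def fine_tune(setting, event_id, actual_ms):
    """Compare an event's actual timestamp with its expected time and
    nudge the performance setting up or down accordingly."""
    expected = EXPECTED_MS[event_id]
    if actual_ms > expected + TOLERANCE_MS:
        return setting + STEP    # running late: performance is too low
    if actual_ms < expected - TOLERANCE_MS:
        return setting - STEP    # running early: power is being wasted
    return setting               # on schedule: the setting is sufficient
```

An event reported 1 ms late raises the setting by one step; one reported 1 ms early lowers it, trading off deadline risk against power.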

At step 360, the process determines whether there is another reported event within the frame at which the power manager should decide whether to set or adjust the performance settings. The first event of a frame at which the power manager sets or adjusts the performance settings is the VSYNC event, which marks the beginning of the frame. However, the power manager can monitor other events or operational steps within the frame and perform re-evaluation and fine-tuning when those events occur. An example of such an event is the GPU finishing drawing N triangles or primitives. If there is another such monitored event within the frame, the flow proceeds to 370. If not, the process ends.

In step 370, the process re-evaluates the level of scene change when the monitored event occurs, by comparing the current scene information with a previous version of the scene information or with the scene information of the previous frame. (This operation is similar to operation 330, which compares the early indicators of the upcoming frame with those of the previous frame.) The flow then proceeds to 340.

As previously mentioned, in some embodiments, an image processing system includes a CPU, a GPU, a set of memory, and a display device. The power manager performs scene-aware power management by using the data generated by these devices as scene information and controlling the performance settings of these devices.

Figure 4 is a block diagram showing the structure of an image processing system 100 including a CPU, a GPU, a memory, and a display device. The image processing system performs scene-aware power management by using data of the CPU, GPU, memory, and display device as scene information.

As shown, the circuitry of the image processing system 100 includes a CPU 410, a GPU 420, a main memory 430, a GPU memory 440, and a display device 450. A set of memory controllers 435 controls the main memory 430 and the GPU memory 440. A display controller 455 controls the display device 450. The scene-aware power manager 110 is shown as a software or hardware module running on top of the GPU 420, but it can also be a software module run by the CPU 410. These components are interconnected by circuit components referred to as buses or bus structures (not shown).

In some embodiments, the CPU 410, the GPU 420, the main memory 430, the GPU memory 440, the memory controller 435, and the display controller 455 are implemented by hardware circuit modules in the integrated circuits of one or more electronic devices. For example, in some embodiments, the main memory 430 and the GPU memory 440 are implemented with physical memory devices, while the memory controller 435, the display controller 455, the CPU 410, the GPU 420, and the power manager 110 are implemented in an IC.

As shown, the CPU 410 and the GPU 420 communicate with one another directly or through the memories 430 and 440. The set of memory controllers 435 controls access to the main memory 430 and the GPU memory 440 and also performs direct memory access (DMA) operations on the memory structures. These DMA operations may include data transfers between the main memory 430 and the GPU memory 440, between the GPU 420 and the GPU memory 440, between the CPU and the main memory 430, and between the main memory 430 and the display 450 (which has a display buffer for storing the data to be displayed).

These devices of the image processing system 100 perform computations and operations to provide the information to be displayed on the display device 450. For example, the image processing system receives data from a mass storage device or a network via the I/O device 405 and stores the data in the main memory 430. Based on this stored data, the CPU 410 performs various computing tasks and/or loads and delivers the processed data to the GPU 420, which processes it into images to be displayed on the display device 450. The power manager 110 uses information about the loads and tasks performed by the CPU 410 and the GPU 420, and about the data generated and/or processed by them, as scene information.

Although not shown, in some embodiments, image processing system 100 is part of a camera system and image data produced by image processing system 100 is provided to an image or video encoding device.

The scene information is used to predict the optimal level of the power settings for the image processing system, as it indicates the size of the workload that must be executed to produce the data needed for display or camera recording. The scene information collected from different sources is aggregated and analyzed together (for example, summed) into the level of scene change. The following are examples of scene information collected by the power manager from different devices of the image processing system: CPU load of the previous scene for application/game engine/game physics calculations; GPU context number; vertex/primitive number; draw command number; vertex shading run-time and complexity; tessellation run-time and complexity; vertex distribution and number of covered tiles; render target layer number; rendering resolution and tile number of each layer; pixel shading run-time and complexity; texture size/type/layer/complexity; general GPU event counters, e.g., tiles, primitives, vertices, textures, instructions, and the like; API (application programming interface) type; chip temperature within the image processing system; CPU load (preprocessing before the next scene, API calls, etc.); frequency/DRAM latency/cache hit rate; VSYNC events (vertical synchronization separating video fields); and external user events.

As mentioned earlier, the power manager uses some of the collected scene information as early indicators to determine a set of initial performance settings for each frame. The following are examples of scene information used as early indicators: CPU load of application/game engine/game physics calculations; API traces of GPU rendering/compute standards (OpenGL, OpenCL, Vulkan, etc.), including the properties, states, and parameters of each API function call; vertex shading run-time and complexity; tessellation run-time and complexity; tile list (number of covered tiles); number of render target layers; resolution and total tile count of each layer; API type; pixel shading run-time and complexity; texture type, size, layers, run-time, and complexity; user interface events; and number of displays.

In some embodiments, the power manager provides performance settings to the various components of the image processing system to achieve specific performance metrics, such as a particular operating frequency (to achieve a particular data rate or latency). A set of performance settings can include performance settings for different components, modules, or circuits in the image processing system; for example, it can include the frequency and voltage settings of the CPU 410, the frequency and voltage settings of the GPU 420, the frequency and voltage settings of the bus structure, and so on. In some embodiments, the performance settings of a particular module or circuit of image processing system 100 include other settings that affect performance. For example, the performance settings of display 450 (or display controller 455) may include the frame rate and the user-interaction display response time (also called the "display deadline"), since these also affect the power consumption of the image processing system; the performance settings of CPU 410 may also include how many cores to use.
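The grouping of per-component settings described above might be modeled as follows. This is a hypothetical sketch; every field name (frequency/voltage per module, active core count, frame rate, display deadline) is an assumption for illustration, not an interface defined by this disclosure:

```python
from dataclasses import dataclass

@dataclass
class ModuleSetting:
    freq_mhz: int     # operating frequency for this module
    voltage_v: float  # supply voltage supporting that frequency

@dataclass
class PerformanceSettings:
    cpu: ModuleSetting
    gpu: ModuleSetting
    bus: ModuleSetting
    active_cpu_cores: int       # CPU settings may include core count
    frame_rate_fps: int         # display settings also affect power
    display_deadline_ms: float  # user-interaction display response time

# One hypothetical set of settings spanning several components.
settings = PerformanceSettings(
    cpu=ModuleSetting(1800, 0.9),
    gpu=ModuleSetting(525, 1.0),
    bus=ModuleSetting(800, 0.8),
    active_cpu_cores=4,
    frame_rate_fps=60,
    display_deadline_ms=16.6,
)
```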

The following are examples of performance settings controlled by the scene-aware power manager (as initial performance settings or during fine-tuning): switching power to the GPU or its sub-modules; slowing down/speeding up the frequency and voltage of the GPU/CPU and their sub-processes; early wake-up or early speed-up of devices (CPU, GPU, etc.); adjustments to memory bandwidth and arbitration strategies (e.g., for main memory 430 and/or GPU memory 440); and display frame-rate and deadline strategies.

In some embodiments, the fine-tuning of the performance settings includes budget and step correction. Such budget and step correction can be applied to some or all of the settings in the image processing system: switching the power of shaders/sub-modules/SRAM via PMIC/LDO/MTCMOS; slowing down/speeding up the frequency and even the voltage of shaders/sub-modules/SRAM; reducing performance degradation by predicting early wake-up or early speed-up; CPU load distribution among processes; DRAM bandwidth allocation; and display strategy and deadline policy.
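One plausible reading of "budget and step correction" is that each adjustment moves a setting toward its target in bounded steps rather than jumping directly, so no single correction exceeds a budget. The sketch below illustrates that idea for a frequency setting; the step size and values are assumptions, not figures from this disclosure:

```python
def step_correct(current_mhz, target_mhz, max_step_mhz=125):
    """Move the frequency toward the target, limited to one bounded step."""
    delta = target_mhz - current_mhz
    if abs(delta) <= max_step_mhz:
        return target_mhz  # within budget: reach the target directly
    # Otherwise take one capped step in the direction of the target.
    return current_mhz + max_step_mhz * (1 if delta > 0 else -1)

freq = step_correct(525, 900)  # capped correction: 525 -> 650
```

Repeated calls over successive evaluation points would then converge on the target while keeping each individual change small.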

As mentioned earlier, the power manager monitors other events or operational steps within the frame (i.e., after the start of the frame) and performs re-evaluation and fine-tuning when those events occur. The following are examples of events or operational steps that the power manager monitors (e.g., via event reporter 140) in order to adjust the performance settings. Events occurring at the CPU: the CPU load of the GPU application. Events occurring during the vertex shading phase (on the GPU): primitive-processing performance; the number of primitive phases kicked; shading instruction counters. Events occurring during the pixel shading phase (on the GPU): the number of layers to be rendered; the tile-processing performance of each layer; shading instruction counters; and the multi-sample anti-aliasing (MSAA) type.

As previously mentioned, in some embodiments, the power manager uses a lookup table to look up performance settings based on scene information (including early indicators). Figures 5a-b show the flow of data in image processing system 100 when the performance lookup table 150 is used to find performance settings for the different devices.

As shown in Figure 5a, the power manager 110 receives scene information and/or indicators (including early indicators) from the CPU 410, the memory controller 435, the display controller 455, the GPU 420, and other devices 490 that include bus structure elements. The power manager 110 uses the received scene information to look up performance settings in the lookup table 150. To generate an initial performance setting, the power manager compares the early indicators of the upcoming frame with the early indicators of the previous frame (stored in storage 510 as shown) and quantizes the difference into a "level of scene change".

As shown in Figure 5b, the power manager uses the level of scene change as an index into the lookup table 150 and retrieves a set of performance settings that includes frequency, voltage, and frame rate. In the example shown, the quantized level of scene change is "3", which the lookup table maps to a set of performance settings comprising a frequency of 400 MHz, a voltage of 2.4 V, and a frame rate of 27 frames per second. As mentioned previously, a set of performance settings can include many parameters, such as different frequency and voltage pairs for multiple circuits or modules, as well as parameters such as the display response deadline, the memory access arbitration strategy, and the like.
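The lookup described above can be sketched as a small table keyed by the quantized level of scene change. The level-3 entry mirrors the example in Figure 5b (400 MHz, 2.4 V, 27 fps); the other entries and the clamping rule are illustrative assumptions:

```python
# Hypothetical performance lookup table indexed by scene-change level.
PERF_LUT = {
    0: {"freq_mhz": 300, "voltage_v": 2.0, "fps": 24},
    1: {"freq_mhz": 350, "voltage_v": 2.2, "fps": 24},
    2: {"freq_mhz": 375, "voltage_v": 2.3, "fps": 25},
    3: {"freq_mhz": 400, "voltage_v": 2.4, "fps": 27},  # Figure 5b example
    4: {"freq_mhz": 525, "voltage_v": 2.6, "fps": 30},
}

def lookup_settings(scene_change_level):
    # Clamp out-of-range levels to the highest available entry.
    level = min(scene_change_level, max(PERF_LUT))
    return PERF_LUT[level]

initial = lookup_settings(3)  # {"freq_mhz": 400, "voltage_v": 2.4, "fps": 27}
```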

As previously mentioned, the power manager not only provides an initial performance setting for each frame based on the frame's early indicators, but also fine-tunes the performance settings after the processing of the frame begins. In some embodiments, these adjustments are made at specific events during the processing of the frame by the GPU. The power manager uses these events to evaluate whether the performance settings are sufficient and adjusts them accordingly. In some embodiments, the image processing system includes an event reporter, such as event reporter 140, that reports these events by, for example, reporting the identity of each event and the event's timestamp. The power manager 110 then uses the reported event and timestamp, together with the expected time of the event, to determine whether the performance setting is too high or too low. For example, in some embodiments the power manager 110 monitors the GPU to see how long it takes to complete the computation of 10,000 triangles for a frame. The power manager 110 uses the timestamp associated with the event to determine how quickly the GPU can finish the task, and whether to increase or decrease performance, based on a comparison of the event's timestamp to its expected execution time.
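A minimal sketch of this evaluation, assuming a simple tolerance band around the expected time, is shown below; the event name, the table, and the threshold are illustrative assumptions rather than values from this disclosure:

```python
# Hypothetical expected-time table (milliseconds after frame start).
EXPECTED_MS = {"triangles_10k_done": 4.1}

def evaluate_event(event_id, timestamp_ms, tolerance_ms=0.5):
    """Compare a reported event timestamp against its expected time to
    judge whether the current performance setting is too high or too low."""
    expected = EXPECTED_MS[event_id]
    if timestamp_ms < expected - tolerance_ms:
        return "too_high"  # finished early: the setting can be lowered
    if timestamp_ms > expected + tolerance_ms:
        return "too_low"   # finished late: the setting must be raised
    return "ok"

verdict = evaluate_event("triangles_10k_done", 2.5)  # -> "too_high"
```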

Figure 6 shows the fine-tuning of performance settings for processed frames based on the timestamps of monitored events. The figure shows the adjustment of the performance settings over two consecutive frames 601 and 602, in particular as the two frames are processed by GPU 420 for display or camera recording. In this embodiment, the period during which each of frame 601 and frame 602 is displayed on the screen is 16.6 ms.

As shown, when frame 601 is processed, the GPU operates at a frequency of 525 MHz. This frequency is inherited from the previous frame because frame 601's early indicators are the same as or similar to those of the previous frame. Alternatively, this frequency could be part of a revised set of performance settings if the level of scene change were considered too large.

The power manager monitors multiple GPU events, including events "X" and "Y" (each of which can correspond, for example, to the GPU completing the rendering of 10,000 triangles). GPU event "X" is expected to occur at the 4.1 ms mark (after the start of the frame) and GPU event "Y" at the 7.0 ms mark. These expected times are based on the GPU frequency of 525 MHz in frame 601 and on the load of the GPU. The actual time of occurrence of GPU event X is 4.1 ms and that of GPU event Y is 7.0 ms, the same as (or very close to) their expected times. The power manager therefore determines that the load is being processed at near-optimal speed and keeps the performance setting at 525 MHz. Tasks/loads 1-A, 1-B, and 1-C finish by the end of the frame, confirming that the GPU performance setting for frame 601 is nearly optimal.

The GPU begins the processing of frame 602 at a frequency of 525 MHz, inherited from frame 601 because frame 602's early indicators are the same as or similar to those of frame 601. The GPU handles multiple loads (tasks 2-A, 2-B, and 2-C). Based on these loads and the 525 MHz frequency, the power manager determines that the expected time for event X is 4.1 ms and the expected time for event Y is 7.0 ms.

As the GPU processes frame 602, the actual time of occurrence of event X is 2.5 ms, meaning that the GPU is running faster than needed (the "525 MHz corrected estimate" shows loads 2-A through 2-C finishing too early), so the speed can be reduced to save power. The power manager therefore reduces the GPU frequency to 400 MHz. The GPU then continues processing at 400 MHz until it hits event Y at 9.8 ms, which was originally expected to occur earlier, at 7.0 ms. In other words, at 400 MHz the GPU is processing too slowly to complete the load in time (the "400 MHz corrected estimate" predicts that loads 2-B and 2-C cannot be completed on schedule). The power manager therefore boosts the GPU frequency to 700 MHz to complete the tasks on time.
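The frame-602 adjustments can be sketched as scaling the frequency by the ratio of actual to expected event time. The tolerance band and the proportional scaling rule are assumptions, so the resulting frequencies differ from the 400 MHz and 700 MHz figures in the example; the sketch only shows the direction of each correction:

```python
def adjust_frequency(freq_mhz, expected_ms, actual_ms, tolerance_ms=0.5):
    """Scale the frequency by the ratio of actual to expected event time:
    ahead of schedule lowers it to save power, behind raises it."""
    if abs(actual_ms - expected_ms) <= tolerance_ms:
        return freq_mhz  # within tolerance: keep the current setting
    return int(freq_mhz * actual_ms / expected_ms)

# Event X arrives at 2.5 ms instead of the expected 4.1 ms -> slow down.
f1 = adjust_frequency(525, 4.1, 2.5)   # -> 320
# Event Y then arrives at 9.8 ms instead of 7.0 ms -> speed up.
f2 = adjust_frequency(f1, 7.0, 9.8)    # -> 448
```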

Figure 7 shows the data flow when image processing system 100 fine-tunes performance settings based on event timestamps. As shown, the event reporter 140 reports a detected event (e.g., the GPU completing 10,000 triangles) to the power manager 110 by sending an event identity (ID) 701 and a timestamp 702 indicating the event's actual time of occurrence. (In some embodiments, the power manager provides the timestamp when it receives a reported event.) The timestamp allows the power manager to identify the actual time of the event. The power manager 110 then uses the received event ID 701 to look up the expected time of the event (shown as expected time 711 retrieved from lookup table 710). The power manager 110 compares the expected time 711 with the actual time indicated by timestamp 702 to determine whether the event occurred within an acceptable range of the expected time. If not, the power manager sends adjusted performance settings to the various circuits of image processing system 100, including the CPU 410, the GPU 420, the memory controller 435, the display controller 455, and the other devices 490. In some embodiments, the amount of fine-tuning is found in the performance lookup table 150 based on the event ID 701 and the difference between the event's actual time 702 and its expected time 711.

In some embodiments, the content of each lookup table (including the performance lookup table 150 and the expected-time lookup table 710) is dynamically tunable based on scene information. The power manager 110 may update the content of the performance lookup table 150 with better performance settings, for example after fine-tuning for various combinations of scene information or early indicators has been performed. The power manager likewise updates the content of the expected-time lookup table 710 after learning the actual time needed to reach a particular event under a particular performance setting.
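One simple way such a dynamic update could work, purely as an assumption, is to blend each observed event time into the stored expectation with an exponential moving average. The table contents and the blending factor below are illustrative:

```python
# Hypothetical expected-time lookup table (milliseconds after frame start).
EXPECTED_TIME_LUT = {"event_X": 4.1, "event_Y": 7.0}

def update_expected_time(event_id, observed_ms, alpha=0.25):
    """Blend the newly observed event time into the stored expectation
    (exponential moving average; alpha weights the new observation)."""
    old = EXPECTED_TIME_LUT[event_id]
    EXPECTED_TIME_LUT[event_id] = (1 - alpha) * old + alpha * observed_ms

# After observing event X at 2.5 ms, the expectation drifts toward it:
update_expected_time("event_X", 2.5)  # 0.75*4.1 + 0.25*2.5 = 3.7
```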

Electronic system example

Many of the foregoing features and applications are implemented as software flows embodied in a set of instructions recorded on a computer-readable storage medium (also referred to as a computer-readable medium). When these instructions are executed by one or more computing or processing units (e.g., one or more processors, processor cores, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer-readable media include, but are not limited to, CD-ROMs, flash memory, random access memory (RAM) chips, hard disks, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. Wireless or wired carrier waves and electrical signals are not included in the computer-readable medium.

In this specification, the term "software" is meant to include firmware residing in read-only memory as well as applications stored in magnetic storage, which can be read into memory for processing by a processor. Moreover, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described herein is within the scope of the present application. In some embodiments, when a software program is installed to operate on one or more electronic systems, it defines one or more specific machine implementations that run and perform the operations of the software program.

FIG. 8 conceptually shows a schematic diagram of an electronic system 800 with which some embodiments of the present application are implemented. Electronic system 800 can be a computer (e.g., a desktop computer, personal computer, tablet, etc.), a telephone, a PDA, or another type of electronic device. Such an electronic system includes various computer-readable media and interfaces to other types of computer-readable media. Electronic system 800 includes a bus 805, a processing unit 810, a graphics processing unit (GPU) 815, a system memory 820, a network 825, a read-only memory 830, a permanent storage device 835, input devices 840, and output devices 845.

Bus 805 collectively represents all system, peripheral, and chipset buses that communicatively connect the internal devices of electronic system 800. For example, bus 805 communicatively couples processing unit 810 with GPU 815, read-only memory 830, system memory 820, and permanent storage device 835.

From these various memory units, processing unit 810 retrieves the instructions to execute and the data to process in order to carry out the flows of the present application. The processing unit may be a single processor or a multi-core processor in different embodiments. Some instructions are sent to GPU 815 and executed there. GPU 815 can offload various computations from processing unit 810 or complement the image processing provided by processing unit 810.

The read-only memory (ROM) 830 stores static data and instructions required by processing unit 810 and other modules of the electronic system. The permanent storage device 835, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when electronic system 800 is turned off. Some embodiments of the present application use a mass storage device (e.g., a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 835.

Other embodiments use a removable storage device (e.g., a floppy disk or a flash memory device, and its corresponding drive) as the permanent storage device. Like permanent storage device 835, system memory 820 is a read-and-write memory device. However, unlike storage device 835, system memory 820 is a volatile read-and-write memory, such as random access memory. System memory 820 stores some of the instructions and data that the processor needs at runtime. In some embodiments, the flows of the present application are stored in system memory 820, permanent storage device 835, and/or read-only memory 830. For example, in accordance with some embodiments, the various memory units include instructions for processing multimedia video. From these various memory units, processing unit 810 retrieves the instructions to execute and the data to process in order to carry out the flows of some embodiments.

Bus 805 is also coupled to input and output devices 840 and 845. The input devices 840 enable the user to convey information to the electronic system and select commands. Input devices 840 include alphanumeric keyboards and pointing devices (also called "cursor control devices"), cameras (e.g., webcams), microphones or similar devices for receiving voice commands, and the like. Output devices 845 display images generated by the electronic system or otherwise output data. Output devices 845 include printers and display devices, such as cathode ray tube (CRT) or liquid crystal (LCD) displays, as well as speakers or similar audio output devices. Some embodiments include devices, such as touch screens, that serve as both input and output devices.

Finally, as shown in FIG. 8, bus 805 also couples electronic system 800 to a network 825 through a network interface card (not shown). In this manner, the computer can be part of a network of computers (e.g., a local area network ("LAN"), a wide area network ("WAN"), or an intranet), or a network of networks such as the Internet. Any or all of the components of electronic system 800 may be used in conjunction with the present application.

Some embodiments include electronic components, such as microprocessors, that store computer program instructions in a machine-readable or computer-readable medium (also referred to as a computer-readable storage medium, machine-readable medium, or machine-readable storage medium). Some examples of such computer-readable media include RAM, ROM, CD-ROM, recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), various recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid-state hard drives, read-only and recordable Blu-Ray® discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable medium can store a computer program executable by at least one processing unit and including sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as code produced by a compiler, and files containing higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

The above discussion primarily refers to microprocessors or multi-core processors that execute software. Some embodiments, however, are performed by one or more integrated circuits, such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions stored on the circuit itself. In addition, some embodiments execute software stored in programmable logic devices (PLDs), ROM, or RAM devices.

As used in the specification and claims of this application, the terms "computer", "server", "processor", and "memory" all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of this specification, the terms "display" or "displaying" mean displaying on an electronic device. As used in the specification and claims of this application, the terms "computer-readable medium", "computer-readable media", and "machine-readable medium" are entirely restricted to tangible, physical objects that store information in a form readable by a computer. These terms exclude wireless signals, wired download signals, and any other ephemeral signals.

While the invention has been described with reference to specific embodiments, one of ordinary skill in the art will recognize that the invention can be practiced in other ways without departing from its spirit. In addition, a number of the figures (including Figure 3) conceptually illustrate flows. The specific operations of these flows may not be performed exactly as shown and described; the specific operations may not be performed in one continuous series of operations, and different embodiments may perform different specific operations. Moreover, a flow can be implemented as several sub-flows or as part of a larger flow. Therefore, those skilled in the art will appreciate that the invention is not limited by the foregoing details, but rather is defined by the appended claims.

The invention may be embodied in other specific forms without departing from its spirit and essential characteristics. The above-described embodiments are intended to be illustrative only and not to limit the invention; the scope of the invention is defined by the appended claims rather than by the foregoing description. Equivalent changes and modifications made within the scope of the claims shall fall within the scope of the present invention.

100‧‧‧Image Processing System

110‧‧‧Power Manager

120‧‧‧Frame Analyzer

130‧‧‧Performance Controller

131‧‧‧frequency setting

132‧‧‧Voltage setting

140‧‧‧Event Reporter

150‧‧‧Performance lookup table

Claims (25)

  1. An image processing method, comprising: processing a first frame by providing a control setting to a first group of devices to achieve a first performance metric; after the first frame, receiving scene information about a second frame from a second group of devices; quantizing the change between the first frame and the second frame; adaptively adjusting the control setting according to a comparison between the quantized change and a predetermined threshold; and processing the second frame by providing the adjusted control setting to the first group of devices.
  2. The image processing method of claim 1, wherein the step of quantizing the change between the first frame and the second frame comprises: comparing a set of early indicators of the first frame with the second A set of early indicators of a frame, where a set of early indicators of a frame contains state data that can be used to predict the image processing load of the frame before the start of the frame event.
  3. The image processing method of claim 1, wherein, when the quantized change is greater than a specific threshold, the control setting of the first group of devices is adjusted to achieve a second performance metric, wherein the second performance metric is greater than the first performance metric by a specific amount; and when the quantized change is less than the specific threshold, the control setting of the first group of devices is adjusted based on a third performance metric, wherein the third performance metric differs from the first performance metric by less than the specific amount.
  4. The image processing method of claim 1, wherein the control setting includes a frequency and a voltage to support operation of the first group of devices at the frequency.
  5. The image processing method of claim 1, wherein the first group of devices comprises an image processing unit.
  6. The image processing method of claim 1, wherein the second group of devices comprises a central processing unit, a memory controller, and an image processing unit, wherein the scene information is generated by the image processing unit and the central processing unit. a set of information.
  7. The image processing method of claim 1, wherein the scene information comprises at least one of: a central processing unit load of an application/game engine/game physics calculation; an application interface trace of an image processing unit rendering/computation standard; vertex shading execution time and complexity; tessellation execution time and complexity; a tile list (number of covered tiles); a number of render target layers; the resolution and total tile count of each layer; an application interface type; pixel shading execution time and complexity; texture type, size, layers, execution time, and complexity; user interface events; and a number of displays.
  8. The image processing method of claim 1, further comprising: when an image processing unit processes the second frame, detecting a specific event in the image processing unit; identifying a fourth performance metric based on the detected event; and adjusting the control setting of the first group of devices to achieve the fourth performance metric.
  9. The image processing method of claim 8, wherein the detected event is associated with a timestamp, and wherein the step of identifying the fourth performance metric comprises comparing the timestamp to an expected time of the specific event.
  10. The image processing method of claim 8, wherein the step of detecting the specific event comprises monitoring: an event occurring at the central processing unit load, an event occurring at the image processing unit during the vertex shading stage, and an event occurring at the image processing unit during the pixel shading stage.
  11. The image processing method of claim 1, wherein the step of processing the second frame further comprises: receiving scene information about the second frame from the second group of devices, and adjusting the control setting of the first group of devices based on the received scene information.
  12. The image processing method of claim 1, wherein the second performance metric is a predetermined value that is independent of the quantized change.
  13. The image processing method of claim 1, wherein the second performance metric is determined based on the quantized change rather than being a predetermined value.
  14. An image processing method includes: processing a frame by providing a control setting to a set of devices to achieve a first performance metric; and when the image processing unit processes the frame, detecting at the image processing unit a specific event; identifying a second performance metric based on the detected event; and adjusting the control setting of the set of devices to achieve the second performance metric.
  15. The image processing method of claim 14, wherein the control setting provided to achieve the first performance metric is based on a set of early indicators of the frame, the set of early indicators comprising state data, available before the frame's start event, that can be used to predict the image processing load of the frame.
  16. The image processing method of claim 14, wherein the detected event is related to a timestamp, wherein the step of identifying the second performance metric comprises comparing the timestamp to an expected time of the particular event.
  17. The image processing method of claim 14, wherein the control setting includes a frequency and a voltage to support operation of the group of devices at the frequency.
  18. The image processing method of claim 14, wherein the first performance metric is determined based on a set of scene information of the frame, the set of scene information including at least one of: a central processing unit load of an application/game engine/game physics calculation; an application interface trace of an image processing unit rendering/computation standard; vertex shading execution time and complexity; tessellation execution time and complexity; a tile list (number of covered tiles); a number of render target layers; the resolution and total tile count of each layer; an application interface type; pixel shading execution time and complexity; texture type, size, layers, execution time, and complexity; user interface events; and a number of displays.
  19. The image processing method of claim 14, wherein the control setting comprises at least one of: switching the power of the image processing unit or its sub-processes; slowing down/speeding up the frequency and voltage of the image processing unit/central processing unit and their sub-processes; early wake-up or early speed-up of devices including the central processing unit and the image processing unit; adjustment of memory bandwidth and arbitration strategies; and display frame rate and deadline strategies.
  20. The image processing method of claim 14, wherein the step of detecting the specific event comprises monitoring: an event occurring at the central processing unit load, an event occurring at the image processing unit during the vertex shading stage, and an event occurring at the image processing unit during the pixel shading stage.
  21. An image processing apparatus, comprising: a set of processing units; an image processing unit; a display device; and a computer-readable storage medium storing sets of instructions, wherein the sets of instructions, when executed by the set of processing units, configure the set of processing units to perform the following actions: when the image processing unit processes a frame to be displayed on the display device, providing a control setting to the image processing unit and the display device to achieve a first performance metric; when the image processing unit is processing the frame, detecting a specific event at the image processing unit; identifying a second performance metric based on the detected event; and adjusting the control setting of the image processing unit based on the second performance metric.
  22. The image processing device of claim 21, wherein the detected event is related to a timestamp, wherein the step of identifying the second performance metric comprises comparing the timestamp to an expected time of the particular event.
  23. The image processing device of claim 21, wherein the control setting includes a frequency and a voltage to support operation of the image processing unit at the frequency.
  24. The image processing device of claim 21, wherein the control setting includes frame rate control of the display device.
  25. The image processing device of claim 24, further comprising a central processing unit, wherein the set of scene information includes load information of the central processing unit.
TW106123178A 2016-07-12 2017-07-11 Graphics processing method and graphics processing apparatus TWI633517B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US201662361039P true 2016-07-12 2016-07-12
US62/361,039 2016-07-12
US15/606,132 US20170262955A1 (en) 2017-05-26 2017-05-26 Scene-Aware Power Manager For GPU
US15/606,132 2017-05-26

Publications (2)

Publication Number Publication Date
TW201802768A true TW201802768A (en) 2018-01-16
TWI633517B TWI633517B (en) 2018-08-21

Family

ID=61059734

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106123178A TWI633517B (en) 2016-07-12 2017-07-11 Graphics processing method and graphics processing apparatus

Country Status (2)

Country Link
CN (1) CN107610039A (en)
TW (1) TWI633517B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190250690A1 (en) * 2018-02-09 2019-08-15 Futurewei Technologies Inc Video playback energy consumption control
CN109165103A (en) * 2018-10-15 2019-01-08 Oppo广东移动通信有限公司 Frame rate control method, device, terminal and storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6820209B1 (en) * 1999-07-15 2004-11-16 Apple Computer, Inc. Power managed graphics controller
US7634668B2 (en) * 2002-08-22 2009-12-15 Nvidia Corporation Method and apparatus for adaptive power consumption
US7401242B2 (en) * 2005-09-27 2008-07-15 International Business Machines Corporation Dynamic power management in a processor design
TWI342478B (en) * 2005-11-16 2011-05-21 Micro Star Int Co Ltd
CN101281415A (en) * 2007-04-06 2008-10-08 上海摩飞电子科技有限公司 Method for regulating dynamic voltage frequency in power supply management technique
US8458497B2 (en) * 2007-10-11 2013-06-04 Qualcomm Incorporated Demand based power control in a graphics processing unit
US8271812B2 (en) * 2010-04-07 2012-09-18 Apple Inc. Hardware automatic performance state transitions in system on processor sleep and wake events
CN102520754B (en) * 2011-12-28 2013-10-23 东南大学 Dynamic voltage scaling system-oriented on-chip monitoring circuit
CN103019367B (en) * 2012-12-03 2015-07-08 福州瑞芯微电子有限公司 Embedded type GPU (Graphic Processing Unit) dynamic frequency modulating method and device based on Android system
US9030480B2 (en) * 2012-12-18 2015-05-12 Nvidia Corporation Triggering performance event capture via pipelined state bundles
US9424620B2 (en) * 2012-12-29 2016-08-23 Intel Corporation Identification of GPU phase to determine GPU scalability during runtime
US9606605B2 (en) * 2014-03-07 2017-03-28 Apple Inc. Dynamic voltage margin recovery
US9378536B2 (en) * 2014-04-30 2016-06-28 Qualcomm Incorporated CPU/GPU DCVS co-optimization for reducing power consumption in graphics frame processing
US10025367B2 (en) * 2014-08-19 2018-07-17 Intel Corporation Dynamic scaling of graphics processor execution resources
US9905199B2 (en) * 2014-09-17 2018-02-27 Mediatek Inc. Processor for use in dynamic refresh rate switching and related electronic device and method

Also Published As

Publication number Publication date
TWI633517B (en) 2018-08-21
CN107610039A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
US20170132834A1 (en) Adaptive shading in a graphics processing pipeline
US10007560B2 (en) Capacity and load analysis using storage attributes
US10551896B2 (en) Method and apparatus for dynamic clock and voltage scaling in a computer processor based on program phase
KR101615840B1 (en) Switching between direct rendering and binning in graphics processing
US8751699B1 (en) Systems and methods for indication of activity status of a storage device
US9519562B2 (en) Process demand prediction for distributed power and resource management
US9489763B2 (en) Techniques for setting up and executing draw calls
US8334857B1 (en) Method and system for dynamically controlling a display refresh rate
US7355606B2 (en) Methods and apparatuses for the automated display of visual effects
KR101286318B1 (en) Displaying a visual representation of performance metrics for rendered graphics elements
US8754904B2 (en) Virtualization method of vertical-synchronization in graphics systems
US8635475B2 (en) Application-specific power management
US8144149B2 (en) System and method for dynamically load balancing multiple shader stages in a shared pool of processing units
EP2048570B1 (en) Demand based power control in a graphics processing unit
US20150170409A1 (en) Adaptive shading in a graphics processing pipeline
US9317948B2 (en) Method of and apparatus for processing graphics
US8255709B2 (en) Power budgeting for a group of computer systems using utilization feedback for manageable components
US9899007B2 (en) Adaptive lossy framebuffer compression with controllable error rate
KR20160042003A (en) Intelligent multicore control for optimal performance per watt
TWI291831B (en) Method and apparatus for controlling display refresh
TWI452514B (en) Computer system and method for configuring the same
US8738333B1 (en) Capacity and load analysis in a datacenter
US9348594B2 (en) Core switching acceleration in asymmetric multiprocessor system
US9256975B2 (en) Graphics processing systems
CN102033728B (en) Graphic system