WO2023224843A1 - Technique to optimize power and performance of xr workload - Google Patents


Info

Publication number
WO2023224843A1
Authority
WO
WIPO (PCT)
Prior art keywords
ifpc
state
timer
indication
workloads
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2023/021587
Other languages
English (en)
French (fr)
Inventor
Rajesh Kemisetti
Puranam V G TEJASWI
Kamal Agrawal
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc
Priority to KR1020247035984A (KR20250010589A)
Priority to EP23729889.8A (EP4526756B1)
Priority to CN202380038196.0A (CN119156583A)
Priority to JP2024564579A (JP2025517294A)
Publication of WO2023224843A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3287 Power saving characterised by the action undertaken by switching off individual functional units in the computer system
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3228 Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates generally to processing systems, and more particularly, to one or more techniques for graphics processing.
  • Computing devices often perform graphics and/or display processing (e.g., utilizing a graphics processing unit (GPU), a central processing unit (CPU), a display processor, etc.) to render and display visual content.
  • GPUs are configured to execute a graphics processing pipeline that includes one or more processing stages, which operate together to execute graphics processing commands and output a frame.
  • Modern-day CPUs are typically capable of executing multiple applications concurrently, each of which may need to utilize the GPU during execution.
  • a display processor may be configured to convert digital information received from a CPU to analog values and may issue commands to a display panel for displaying the visual content.
  • a device that provides content for visual presentation on a display may utilize a CPU, a GPU, and/or a display processor.
  • a method, a computer-readable medium, and an apparatus may receive, from an application, an indication of a time period for a timer associated with exiting an inter-frame power collapse (IFPC) state.
  • the apparatus may process, upon triggering the timer associated with exiting the IFPC state, one or more predefined workloads.
  • the apparatus may initiate the IFPC state upon the one or more predefined workloads being finished processing.
  • the apparatus may exit the IFPC state upon detecting an expiration of the timer.
  • the one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims.
  • the following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
  • FIG. 1 is a block diagram that illustrates an example content generation system in accordance with one or more techniques of this disclosure.
  • FIG. 2 illustrates an example GPU in accordance with one or more techniques of this disclosure.
  • FIG. 3 is a block diagram illustrating an example environment in which aspects of the disclosure may be practiced.
  • FIG. 4 is a diagram illustrating an example GPU state timeline associated with IFPC according to one or more aspects.
  • FIG. 5 is a diagram illustrating an example GPU state timeline associated with IFPC according to one or more aspects.
  • FIG. 6 is a call flow diagram illustrating example communications between an application, a first component, and a GPU in accordance with one or more techniques of this disclosure.
  • FIG. 7 is a flowchart of an example method of graphics processing in accordance with one or more techniques of this disclosure.
  • FIG. 8 is a flowchart of an example method of graphics processing in accordance with one or more techniques of this disclosure.
  • processors include microprocessors, microcontrollers, graphics processing units (GPUs), general purpose GPUs (GPGPUs), central processing units (CPUs), application processors, digital signal processors (DSPs), reduced instruction set computing (RISC) processors, systems-on-chip (SOCs), baseband processors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure.
  • One or more processors in the processing system may execute software.
  • Software can be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
  • the term application may refer to software.
  • one or more techniques may refer to an application (e.g., software) being configured to perform one or more functions.
  • the application may be stored in a memory (e.g., on-chip memory of a processor, system memory, or any other memory).
  • Hardware described herein, such as a processor, may be configured to execute the application.
  • the application may be described as including code that, when executed by the hardware, causes the hardware to perform one or more techniques described herein.
  • the hardware may access the code from a memory and execute the code accessed from the memory to perform one or more techniques described herein.
  • components are identified in this disclosure. In such examples, the components may be hardware, software, or a combination thereof. The components may be separate components or subcomponents of a single component.
  • Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer.
  • such computer-readable media can comprise a random access memory (RAM), a read-only memory (ROM), an electrically erasable programmable ROM (EEPROM), optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
  • instances of the term “content” may refer to “graphical content,” an “image,” etc., regardless of whether the terms are used as an adjective, noun, or other parts of speech.
  • the term “graphical content,” as used herein may refer to a content produced by one or more processes of a graphics processing pipeline.
  • the term “graphical content,” as used herein may refer to a content produced by a processing unit configured to perform graphics processing.
  • the term “graphical content” may refer to a content produced by a graphics processing unit.
  • IFPC e.g., power collapsing the GPU between command submissions when the GPU is idle
  • the IFPC exit latency may cause an unnecessary performance penalty when the GPU acts as a fixed function block and processes fixed periodical workloads.
  • the hysteresis timeout associated with the IFPC may be superfluous when the GPU processes such fixed periodical workloads.
  • the superfluous hysteresis timeout may be associated with unnecessary power consumption.
  • a hint relating to a timer value may be provided to the graphics management unit (GMU) firmware. As a result, hysteresis timeout that is unnecessary for fixed periodical workloads may be avoided.
  • the timeline associated with the waking up of the GPU may be advanced based on a timer such that the delay between the receipt of an inter-processor communication controller (IPCC) interrupt and the time the GPU becomes awake and ready to process a command may be eliminated.
  • FIG. 1 is a block diagram that illustrates an example content generation system 100 configured to implement one or more techniques of this disclosure.
  • the content generation system 100 includes a device 104.
  • the device 104 may include one or more components or circuits for performing various functions described herein.
  • one or more components of the device 104 may be components of a SOC.
  • the device 104 may include one or more components configured to perform one or more techniques of this disclosure.
  • the device 104 may include a processing unit 120, a content encoder/decoder 122, and a system memory 124.
  • the device 104 may include a number of components (e.g., a communication interface 126, a transceiver 132, a receiver 128, a transmitter 130, a display processor 127, and one or more displays 131).
  • Display(s) 131 may refer to one or more displays 131.
  • the display 131 may include a single display or multiple displays, which may include a first display and a second display.
  • the first display may be a left-eye display and the second display may be a right-eye display.
  • the first display and the second display may receive different frames for presentment thereon. In other examples, the first and second display may receive the same frames for presentment thereon.
  • the results of the graphics processing may not be displayed on the device, e.g., the first display and the second display may not receive any frames for presentment thereon. Instead, the frames or graphics processing results may be transferred to another device. In some aspects, this may be referred to as split-rendering.
  • the processing unit 120 may include an internal memory 121.
  • the processing unit 120 may be configured to perform graphics processing using a graphics processing pipeline 107.
  • the content encoder/decoder 122 may include an internal memory 123.
  • the device 104 may include a processor, which may be configured to perform one or more display processing techniques on one or more frames generated by the processing unit 120 before the frames are displayed by the one or more displays 131. While the processor in the example content generation system 100 is configured as a display processor 127, it should be understood that the display processor 127 is one example of the processor and that other types of processors, controllers, etc., may be used as a substitute for the display processor 127.
  • the display processor 127 may be configured to perform display processing.
  • the display processor 127 may be configured to perform one or more display processing techniques on one or more frames generated by the processing unit 120.
  • the one or more displays 131 may be configured to display or otherwise present frames processed by the display processor 127.
  • the one or more displays 131 may include one or more of a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, a projection display device, an augmented reality display device, a virtual reality display device, a head-mounted display, or any other type of display device.
  • Memory external to the processing unit 120 and the content encoder/decoder 122 may be accessible to the processing unit 120 and the content encoder/decoder 122.
  • the processing unit 120 and the content encoder/decoder 122 may be configured to read from and/or write to external memory, such as the system memory 124.
  • the processing unit 120 may be communicatively coupled to the system memory 124 over a bus.
  • the processing unit 120 and the content encoder/decoder 122 may be communicatively coupled to the internal memory 121 over the bus or via a different connection.
  • the content encoder/decoder 122 may be configured to receive graphical content from any source, such as the system memory 124 and/or the communication interface 126.
  • the system memory 124 may be configured to store received encoded or decoded graphical content.
  • the content encoder/decoder 122 may be configured to receive encoded or decoded graphical content, e.g., from the system memory 124 and/or the communication interface 126, in the form of encoded pixel data.
  • the content encoder/decoder 122 may be configured to encode or decode any graphical content.
  • the internal memory 121 or the system memory 124 may include one or more volatile or non-volatile memories or storage devices.
  • internal memory 121 or the system memory 124 may include RAM, static random access memory (SRAM), dynamic random access memory (DRAM), erasable programmable ROM (EPROM), EEPROM, flash memory, a magnetic data media or an optical storage media, or any other type of memory.
  • the internal memory 121 or the system memory 124 may be a non-transitory storage medium according to some examples.
  • the term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal.
  • non-transitory should not be interpreted to mean that internal memory 121 or the system memory 124 is non-movable or that its contents are static.
  • the system memory 124 may be removed from the device 104 and moved to another device.
  • the system memory 124 may not be removable from the device 104.
  • the processing unit 120 may be a CPU, a GPU, GPGPU, or any other processing unit that may be configured to perform graphics processing.
  • the processing unit 120 may be integrated into a motherboard of the device 104.
  • the processing unit 120 may be present on a graphics card that is installed in a port of the motherboard of the device 104, or may be otherwise incorporated within a peripheral device configured to interoperate with the device 104.
  • the processing unit 120 may include one or more processors, such as one or more microprocessors, GPUs, ASICs, FPGAs, arithmetic logic units (ALUs), DSPs, discrete logic, software, hardware, firmware, other equivalent integrated or discrete logic circuitry, or any combinations thereof.
  • the processing unit 120 may store instructions for the software in a suitable, non-transitory computer-readable storage medium, e.g., internal memory 121, and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered to be one or more processors.
  • the content encoder/decoder 122 may be any processing unit configured to perform content decoding. In some examples, the content encoder/decoder 122 may be integrated into a motherboard of the device 104.
  • the content encoder/decoder 122 may include one or more processors, such as one or more microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), arithmetic logic units (ALUs), digital signal processors (DSPs), video processors, discrete logic, software, hardware, firmware, other equivalent integrated or discrete logic circuitry, or any combinations thereof.
  • the content encoder/decoder 122 may store instructions for the software in a suitable, non-transitory computer-readable storage medium, e.g., internal memory 123, and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered to be one or more processors.
  • the content generation system 100 may include a communication interface 126.
  • the communication interface 126 may include a receiver 128 and a transmitter 130.
  • the receiver 128 may be configured to perform any receiving function described herein with respect to the device 104. Additionally, the receiver 128 may be configured to receive information, e.g., eye or head position information, rendering commands, and/or location information, from another device.
  • the transmitter 130 may be configured to perform any transmitting function described herein with respect to the device 104. For example, the transmitter 130 may be configured to transmit information to another device, which may include a request for content.
  • the receiver 128 and the transmitter 130 may be combined into a transceiver 132. In such examples, the transceiver 132 may be configured to perform any receiving function and/or transmitting function described herein with respect to the device 104.
  • the processing unit 120 may include a power collapse scheduler 198 configured to receive, from an application, an indication of a time period for a timer associated with exiting an IFPC state.
  • the power collapse scheduler 198 may be configured to process, upon triggering the timer associated with exiting the IFPC state, one or more predefined workloads.
  • the power collapse scheduler 198 may be configured to initiate the IFPC state upon the one or more predefined workloads being finished processing.
  • the power collapse scheduler 198 may be configured to exit the IFPC state upon detecting an expiration of the timer.
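  • The four scheduler steps above can be sketched as a small Python model. This is only an illustration under stated assumptions: the class name `PowerCollapseScheduler`, its method names, the microsecond unit, and the log strings are all hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PowerCollapseScheduler:
    """Hypothetical model of the four scheduler steps; not the disclosed implementation."""

    timer_period_us: int = 0
    in_ifpc: bool = False
    log: List[str] = field(default_factory=list)

    def receive_indication(self, timer_period_us: int) -> None:
        # Step 1: the application indicates the time period for the IFPC exit timer.
        if timer_period_us <= 0:
            raise ValueError("timer period must be positive")
        self.timer_period_us = timer_period_us
        self.log.append(f"indication:{timer_period_us}")

    def on_timer_triggered(self, workloads: List[Callable[[], None]]) -> None:
        # Step 2: process the predefined workloads once the timer has been triggered.
        for run_workload in workloads:
            run_workload()
        self.log.append(f"processed:{len(workloads)}")
        # Step 3: initiate the IFPC state as soon as the workloads are finished.
        self.in_ifpc = True
        self.log.append("ifpc_enter")

    def on_timer_expired(self) -> None:
        # Step 4: exit the IFPC state when the timer expiration is detected.
        if self.in_ifpc:
            self.in_ifpc = False
            self.log.append("ifpc_exit")
```

  • In this sketch a caller would invoke `receive_indication` once during setup, then alternate `on_timer_triggered` and `on_timer_expired` for each periodic workload.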
  • a device such as the device 104, may refer to any device, apparatus, or system configured to perform one or more techniques described herein.
  • a device may be a server, a base station, a user equipment, a client device, a station, an access point, a computer such as a personal computer, a desktop computer, a laptop computer, a tablet computer, a computer workstation, or a mainframe computer, an end product, an apparatus, a phone, a smart phone, a server, a video game platform or console, a handheld device such as a portable video game device or a personal digital assistant (PDA), a wearable computing device such as a smart watch, an augmented reality device, or a virtual reality device, a non-wearable device, a display or display device, a television, a television set-top box, an intermediate network device, a digital media player, a video streaming device, a content streaming device, an in-vehicle computer, any mobile device, any device configured to generate graphical content
  • GPUs can process multiple types of data or data packets in a GPU pipeline.
  • a GPU can process two types of data or data packets, e.g., context register packets and draw call data.
  • a context register packet can be a set of global state information, e.g., information regarding a global register, shading program, or constant data, which can regulate how a graphics context will be processed.
  • context register packets can include information regarding a color format.
  • Context states can be utilized to determine how an individual processing unit functions, e.g., a vertex fetcher (VFD), a vertex shader (VS), a shader processor, or a geometry processor, and/or in what mode the processing unit functions.
  • GPUs can use context registers and programming data.
  • a GPU can generate a workload, e.g., a vertex or pixel workload, in the pipeline based on the context register definition of a mode or state.
  • Certain processing units, e.g., a VFD, can use these states to determine certain functions, e.g., how a vertex is assembled. As these modes or states can change, GPUs may need to change the corresponding context. Additionally, the workload that corresponds to the mode or state may follow the changing mode or state.
  • FIG. 2 illustrates an example GPU 200 in accordance with one or more techniques of this disclosure.
  • GPU 200 includes command processor (CP) 210, draw call packets 212, VFD 220, VS 222, vertex cache (VPC) 224, triangle setup engine (TSE) 226, rasterizer (RAS) 228, Z process engine (ZPE) 230, pixel interpolator (PI) 232, fragment shader (FS) 234, render backend (RB) 236, L2 cache (UCHE) 238, and system memory 240.
  • Although FIG. 2 displays that GPU 200 includes processing units 220-238, GPU 200 can include a number of additional processing units. Additionally, processing units 220-238 are merely an example and any combination or order of processing units can be used by GPUs according to the present disclosure.
  • GPU 200 also includes command buffer 250, context register packets 260, and context states 261.
  • a GPU can utilize a CP, e.g., CP 210, or hardware accelerator to parse a command buffer into context register packets, e.g., context register packets 260, and/or draw call data packets, e.g., draw call packets 212.
  • the CP 210 can then send the context register packets 260 or draw call packets 212 through separate paths to the processing units or blocks in the GPU.
  • the command buffer 250 can alternate different states of context registers and draw calls.
  • a command buffer can be structured in the following manner: context register of context N, draw call(s) of context N, context register of context N+1, and draw call(s) of context N+1.
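  • The alternating command buffer layout, and the CP routing the two packet types along separate paths, can be illustrated with a short Python sketch. The tuple encoding `(kind, context_id, payload)` and the function names are hypothetical choices made for this example, not a real packet format.

```python
from typing import List, Tuple

# Hypothetical packet encoding: (kind, context_id, payload).
Packet = Tuple[str, int, str]


def build_command_buffer(first_context: int, draws_per_context: List[int]) -> List[Packet]:
    """Lay out packets as: context register of N, draw call(s) of N, then N+1, and so on."""
    buffer: List[Packet] = []
    for offset, draw_count in enumerate(draws_per_context):
        ctx = first_context + offset
        buffer.append(("context_register", ctx, f"state{ctx}"))
        for i in range(draw_count):
            buffer.append(("draw_call", ctx, f"draw{i}"))
    return buffer


def split_paths(buffer: List[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Mimic a command processor sending the two packet types down separate paths."""
    context_path = [p for p in buffer if p[0] == "context_register"]
    draw_path = [p for p in buffer if p[0] == "draw_call"]
    return context_path, draw_path
```

  • For instance, `build_command_buffer(7, [2, 1])` yields one context register packet and two draw calls for context 7, followed by one context register packet and one draw call for context 8.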
  • the full data path may include two SoCs associated with two devices.
  • a companion device may generate the visual content, and may send the visual content to an XR device.
  • the XR device may then perform such operations as a late stage reprojection (LSR) for a final display based on the user’s latest head pose.
  • the LSR may be a feature that may ensure the responsiveness of an XR headset to user motion.
  • the LSR may help to reduce the perceived input lag and enhance user experience.
  • a previously rendered frame may be reprojected or warped into a prediction of what a normally rendered frame would look like using newer motion information from the headset sensors.
  • a GPU in the XR device may be used to generate a motion vector (MV) grid using one or more of the depth, the render pose, or the latest head pose details.
  • the XR pipeline may be used to process the head motion (e.g., translation and rotation) or to perform optical correction.
  • a reference to XR may also include a reference to augmented reality (AR) or virtual reality (VR).
  • FIG. 3 is a block diagram 300 illustrating an example environment in which aspects of the disclosure may be practiced.
  • an XR pipeline is illustrated in FIG. 3.
  • an XR application 302 may use a graphics application programming interface (API) 304 to generate commands associated with the MV grid generation.
  • the graphics driver 310 e.g., a graphics kernel driver or a kernel graphics support layer (KGSL)
  • EVA enhanced visual analytics
  • the EVA firmware 308 may provide depth buffer details to the GPU 312 (e.g., via the host firmware interface (HFI) queues 316), and may trigger inter-processor communication controller (IPCC) interrupts (the IPCC may be a centralized block for managing inter-processor interrupts at the SoC level) at the GPU 312 via the IPCC 318 at regular intervals when the LSR workload is ready for processing by the GPU 312.
  • the GPU 312 may be reserved, and may act as a fixed function block. Moreover, in the LSR context, the graphics management unit (GMU) 314 within the GPU 312 may always be active, and may monitor for the IPCC interrupts from the EVA firmware 308 (in other words, the GMU 314 and the EVA firmware 308 may communicate using the IPCC interrupts).
  • the motion-to-render-to-photon (“photon” may refer to a corresponding change on the display such as a head-mounted display (HMD)) latency (i.e., a latency from the companion device to the XR device) may be approximately 50-55 ms. Further, the motion-to-photon latency may be less than 9 ms. Therefore, it may be important to meet the performance goals and at the same time reduce power consumption.
  • the graphics driver 310 may not disable the clock/regulator of the GMU 314 to bring the GMU 314 into a slumber state because the GMU 314 may always monitor for the IPCC interrupts from the EVA firmware 308.
  • the GMU 314 may power collapse the GPU 312 between command submissions (workload submissions) when the GPU 312 is idle. This may be referred to as IFPC.
  • the IFPC may be a power saving feature where the GPU may be switched off between frames.
  • the IFPC may be controlled by the GMU 314 firmware. Based on the IFPC, the GMU 314 firmware may switch off the GPU even if the GPU is idle for short durations.
  • FIG. 4 is a diagram 400 illustrating an example GPU state timeline associated with IFPC according to one or more aspects.
  • the GPU may be in one of five possible states at any given time: an active state (also referred to as the A state), a hysteresis timeout state (also referred to as the B state), an IFPC entry state (also referred to as the C state), an IFPC state (also referred to as the D state), and an IFPC exit state (also referred to as the E state). When there is no workload for the GPU, the GMU 314 may switch off the clocks and the regulators of the GPU, and the GPU may be completely off when in the IFPC state. When a new workload is submitted while the GPU is in the IFPC state, the GMU 314 may switch on the clocks and the regulators of the GPU; the IFPC exit state may be a transition state corresponding to the transition from the IFPC state to the active state.
  • the GPU may process the command submission corresponding to the present sample.
  • the hysteresis timeout (B) state may be a timeout period before starting the IFPC entry (C) state after the GPU becomes idle.
  • the IFPC entry (C) state may correspond to the time it may take for the GMU to switch off the clocks and the regulator of the GPU.
  • the IFPC exit (E) state may correspond to the time it takes for the GMU to turn on the clocks and the regulator of the GPU. In other words, if IFPC is enabled, there may be latencies associated with the entry into and the exit from the IFPC (D) state.
  • the GMU firmware may place the GPU into the IFPC exit (E) state in order to wake the GPU up from the IFPC (D) state. Therefore, the IFPC exit (E) state may represent a delay between the receipt of the IPCC interrupt 402 and the time the GPU becomes awake and ready to process a command.
  • the GPU may process the command associated with the current sample. Once the GPU completes the processing of the command, the GPU may provide a command completion interrupt to the GMU.
  • the GMU may inform the EVA firmware that the MV grid for the current sample is ready by triggering a reverse IPCC interrupt at the EVA firmware.
  • the hysteresis timeout (B) state may start at the same time that the GPU completes the processing of the command. Once the hysteresis timeout (B) state expires, the GMU may power collapse the GPU by first placing the GPU into the IFPC entry (C) state and then the IFPC (D) state.
  • the hysteresis timeout (B) state may help to avoid unnecessary IFPC entry and exit sequences if there is any immediate additional workload after the GPU completes the processing of a command. This may be useful, for example, when the GPU receives unpredictable workloads from the CPU.
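  • The FIG. 4 cycle through the five states can be modelled as a small transition table. This is a minimal sketch: the state letters follow the text, but the event names (`command_done`, `timeout`, `rails_off`, `ipcc_interrupt`, `rails_on`, `new_workload`) are hypothetical labels introduced for illustration.

```python
from enum import Enum
from typing import Dict, List, Tuple


class GpuState(Enum):
    ACTIVE = "A"
    HYSTERESIS = "B"
    IFPC_ENTRY = "C"
    IFPC = "D"
    IFPC_EXIT = "E"


# Hypothetical event names; only the state ordering is taken from the text.
TRANSITIONS: Dict[Tuple[GpuState, str], GpuState] = {
    (GpuState.ACTIVE, "command_done"): GpuState.HYSTERESIS,
    # A workload arriving during the timeout avoids an IFPC entry/exit sequence.
    (GpuState.HYSTERESIS, "new_workload"): GpuState.ACTIVE,
    (GpuState.HYSTERESIS, "timeout"): GpuState.IFPC_ENTRY,
    (GpuState.IFPC_ENTRY, "rails_off"): GpuState.IFPC,
    (GpuState.IFPC, "ipcc_interrupt"): GpuState.IFPC_EXIT,
    (GpuState.IFPC_EXIT, "rails_on"): GpuState.ACTIVE,
}


def step(state: GpuState, event: str) -> GpuState:
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: List[str], start: GpuState = GpuState.IFPC) -> str:
    """Replay events and return the visited states as a string of state letters."""
    state = start
    trace = [state.value]
    for event in events:
        state = step(state, event)
        trace.append(state.value)
    return "".join(trace)
```

  • Starting from the IFPC (D) state, one full interval between interrupts visits D, E, A, B, C and returns to D.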
  • given that the total duration between two adjacent IPCC interrupts 402 may be equal to the sum of the durations associated with all five GPU states, as shown in FIG. 4, and that 1) the duration of the hysteresis timeout (B) state may be approximately 0.3 ms, 2) the duration of the IFPC entry (C) state may be approximately 0.1 ms, and 3) the duration of the IFPC exit (E) state may be approximately 0.08 ms, it may be calculated that the duration of each instance of the IFPC (D) state in this example may be approximately 1.38 ms. Stated differently, the total GPU rail active duration may be approximately 0.7 ms for each interval between two adjacent IPCC interrupts 402.
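  • The figures quoted for FIG. 4 can be cross-checked with a few lines of Python. The sketch assumes that the rail active duration is the sum of the A, B, C, and E states; under that assumption the active (A) duration and the interrupt interval, which are not stated in the passage, follow from the quoted values. Durations are kept in integer microseconds to avoid rounding noise.

```python
# Durations quoted for the FIG. 4 example, in microseconds.
HYSTERESIS_US = 300    # B state, approximately 0.3 ms
IFPC_ENTRY_US = 100    # C state, approximately 0.1 ms
IFPC_EXIT_US = 80      # E state, approximately 0.08 ms
IFPC_US = 1380         # D state, approximately 1.38 ms
RAIL_ACTIVE_US = 700   # total rail active time, approximately 0.7 ms


def derive_timeline() -> dict:
    """Derive values the passage does not state, assuming rail active = A + B + C + E."""
    active_us = RAIL_ACTIVE_US - (HYSTERESIS_US + IFPC_ENTRY_US + IFPC_EXIT_US)
    interval_us = RAIL_ACTIVE_US + IFPC_US
    return {
        "active_us": active_us,
        "interval_us": interval_us,
        "rail_duty": RAIL_ACTIVE_US / interval_us,
    }
```

  • Under this assumption the active (A) state lasts about 0.22 ms and the interval between interrupts is about 2.08 ms, so the rail is powered for roughly a third of each interval.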
  • FIG. 5 is a diagram 500 illustrating an example GPU state timeline associated with IFPC according to one or more aspects.
  • Because the XR workload may be of a persistent type that takes place at fixed intervals throughout the LSR context, additional adaptations as described in further detail below may be adopted to further save power while the XR pipeline performance goals may continue to be met.
  • the XR application 302 may provide a hint corresponding to a timer value (e.g., T1) to the GMU 314 firmware.
  • the hint may be provided by the EVA firmware 308 to the GMU 314 firmware during the LSR context setup.
  • the graphics driver 310 or the GMU 314 firmware may derive the hint based on a machine learning technique.
  • the timer value T1 may relate to the controlling of the flow between the EVA and the GMU, and may correspond to the interval between two adjacent IPCC interrupts 502 sent by the EVA firmware to the GMU firmware. Therefore, in one or more configurations, based on the latency associated with the IFPC exit (E) state, the GMU firmware may trigger or reset a timer (e.g., Tg) immediately upon receiving an IPCC interrupt 502 from the EVA firmware.
  • the GMU firmware may start to wake up the GPU upon the expiration of the timer Tg instead of at the receipt of the subsequent IPCC interrupt 502’, such that the timeline for waking up the GPU may be advanced and the GPU may be ready in the active (A) state for processing a command approximately at the time the GMU receives the subsequent IPCC interrupt 502’. Therefore, the delay between the receipt of the IPCC interrupt 502’ and the time the GPU becomes awake and ready to process a command may be eliminated or at least greatly reduced, and the GPU may start to retrieve and process the command for the current sample immediately after receiving the corresponding IPCC interrupt 502’.
  • the GMU may also remove the hysteresis timeout (B) state (i.e., set the hysteresis timeout duration to 0) because it may be known that in the LSR context, there may not be any further immediate GPU workload until the timer Tg expires and the next IPCC interrupt is received.
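The early wake-up described above can be sketched as a small calculation. This is a minimal illustration, not the GMU firmware. It assumes that the timer Tg is set to the hinted interval T1 minus the IFPC exit latency, which is one plausible reading of a timer "based on the latency associated with the IFPC exit (E) state". The function name `prewake_timer_ms` is hypothetical, and the 2.08 ms interval is an assumption obtained by adding the 0.7 ms rail-active and 1.38 ms IFPC durations quoted for FIG. 4.

```python
def prewake_timer_ms(t1_ms: float, ifpc_exit_latency_ms: float) -> float:
    """Return a hypothetical Tg: the delay after an IPCC interrupt at which wake-up starts."""
    if ifpc_exit_latency_ms < 0 or t1_ms <= ifpc_exit_latency_ms:
        raise ValueError("T1 must be larger than the IFPC exit latency")
    # Assumption: start waking early by exactly the exit latency.
    return t1_ms - ifpc_exit_latency_ms


T1_MS = 2.08            # assumed interval between IPCC interrupts (0.7 ms + 1.38 ms)
EXIT_LATENCY_MS = 0.08  # IFPC exit (E) duration quoted in the text

tg_ms = prewake_timer_ms(T1_MS, EXIT_LATENCY_MS)
gpu_ready_at_ms = tg_ms + EXIT_LATENCY_MS  # wake-up completes as the next interrupt is due

print(f"Tg = {tg_ms:.2f} ms, GPU ready at {gpu_ready_at_ms:.2f} ms")
```

With these example numbers the GPU would finish waking at about the moment the next IPCC interrupt is expected, which is the behavior the text describes.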
  • Because the total duration between two adjacent IPCC interrupts 502 may be equal to the sum of the durations associated with all five GPU states, as shown in FIG. 5, and because it may be known that 1) the duration of each hysteresis timeout (B) state may be 0 ms, 2) the duration of each IFPC entry (C) state may be approximately 0.1 ms, and 3) the duration of each IFPC exit (E) state may be approximately 0.08 ms, it may be calculated that the duration of the IFPC (D) state in this example may be approximately 1.68 ms. Stated differently, the total GPU rail active duration may be approximately 0.4 ms for each interval between two adjacent IPCC interrupts 502. Therefore, compared to the timeline shown in FIG. 4, the total GPU rail active duration in FIG. 5 may be reduced by approximately 42%, which may be associated with a corresponding power saving.
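The comparison between the two timelines can be checked with a short script. This is a sketch under stated assumptions: the `Timeline` class and its field names are hypothetical, the rail is assumed to be powered in every state except IFPC (D), and the 0.22 ms processing time is inferred from the quoted 0.7 ms rail-active figure rather than stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeline:
    """Per-interval state durations in milliseconds (example values from the text)."""
    active: float      # (A) command processing, inferred value
    hysteresis: float  # (B) hysteresis timeout
    ifpc_entry: float  # (C) IFPC entry
    ifpc: float        # (D) power collapsed
    ifpc_exit: float   # (E) IFPC exit

    def rail_active(self) -> float:
        # Assumption: the GPU rail is on in every state except IFPC (D).
        return self.active + self.hysteresis + self.ifpc_entry + self.ifpc_exit

    def interval(self) -> float:
        return self.rail_active() + self.ifpc


ACTIVE_MS = 0.22  # inferred: 0.7 ms rail-active minus (0.3 + 0.1 + 0.08) ms
fig4 = Timeline(ACTIVE_MS, 0.3, 0.1, 1.38, 0.08)
fig5 = Timeline(ACTIVE_MS, 0.0, 0.1, 1.68, 0.08)

reduction = 1 - fig5.rail_active() / fig4.rail_active()
print(f"FIG. 4 rail active: {fig4.rail_active():.2f} ms of {fig4.interval():.2f} ms")
print(f"FIG. 5 rail active: {fig5.rail_active():.2f} ms of {fig5.interval():.2f} ms")
print(f"Reduction in rail-active time: {reduction:.1%}")
```

Both timelines cover the same interval. The only differences are the removed hysteresis timeout and the correspondingly longer IFPC (D) state, which is where the saving comes from.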
  • At least one of the XR application, the EVA driver, or the graphics driver may provide a hint relating to the timer value T1 to the GMU firmware.
  • the GPU may enter the IFPC (D) state immediately after completing the processing of a command.
  • the avoidance of the hysteresis timeout may save power.
  • the waking up of the GPU may start before the IPCC interrupt and the corresponding workload are actually received. Accordingly, the delay in processing commands associated with the delay between the receipt of the IPCC interrupt and the time the GPU becomes awake and ready to process a command may be eliminated. The elimination of the delay may bring about performance benefits.
  • the hint relating to the timer value T1 may be implemented as an extension in the graphics API so that applications (e.g., XR/AR/VR applications) may pass in the timer value T1 (e.g., the interval between workload submissions to the GPU) to the graphics driver (e.g., a graphics kernel driver).
  • FIG. 6 is a call flow diagram 600 illustrating example communications between an application 602 (e.g., an XR application 302), a first component 604 (e.g., the EVA firmware 308), and a GPU 606 (including a GMU within the GPU 606) in accordance with one or more techniques of this disclosure.
  • the GPU 606 may receive, from an application 602, an indication of a time period for a timer associated with exiting an IFPC state.
  • the time period for the timer may be further based at least in part on an IFPC exit latency.
  • the GPU 606 may receive a first indication to start processing the one or more predefined workloads.
  • the user space may submit the one or more predefined workloads once to the GPU scheduler (GMU).
  • the GPU scheduler (GMU) may submit the one or more predefined workloads repeatedly to the GPU at regular intervals upon such an event as the IPCC interrupt.
  • the one or more predefined workloads may be one or more LSR workloads (an LSR workload may be a predefined workload to generate an MV grid based on the depth buffer and the head pose).
  • the one or more predefined workloads may be any workload that may be submitted repeatedly to the GPU.
  • the first indication may be an IPCC interrupt.
  • the first indication may be received from at least one of a scheduler, the application, or a service layer.
  • the one or more predefined workloads may be associated with at least one of an XR application, an AR application, or a VR application.
  • the GPU 606 may trigger, upon receiving the first indication, the timer.
  • the GPU 606 may process, upon triggering the timer associated with exiting the IFPC state, one or more predefined workloads.
  • the GPU 606 may initiate the IFPC state upon the one or more predefined workloads being finished processing.
  • the GPU 606 may detect the expiration of the timer.
  • the GPU 606 may exit the IFPC state upon detecting an expiration of the timer.
  • a hysteresis timeout within a first period associated with the timer is zero.
  • the GPU 606 may receive a second indication to start processing the one or more predefined workloads.
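The sequence in FIG. 6 can be sketched as one period of GPU states between the first and second indications. This is an illustrative model only. The function `one_period` is hypothetical, the timer period is assumed to be the hinted interval minus the IFPC exit latency, and the 2.08 ms interval and 0.22 ms processing time are values inferred from the example durations quoted for FIGs. 4 and 5.

```python
def one_period(t1_ms, active_ms, entry_ms, exit_ms):
    """Return (state, start_ms, end_ms) segments for one interval between indications."""
    # Assumption: the timer expires early by exactly the IFPC exit latency.
    timer_ms = t1_ms - exit_ms
    collapsed_at = active_ms + entry_ms
    if collapsed_at > timer_ms:
        raise ValueError("workload and IFPC entry do not fit before the timer expires")
    return [
        # The first indication arrives at t = 0 and triggers the timer.
        ("A: process workload", 0.0, active_ms),
        # Hysteresis timeout is zero, so IFPC entry starts as soon as processing ends.
        ("C: IFPC entry", active_ms, collapsed_at),
        # The GPU stays power collapsed until the timer expires.
        ("D: IFPC", collapsed_at, timer_ms),
        # Exit completes at about the time the second indication is due.
        ("E: IFPC exit", timer_ms, t1_ms),
    ]


segments = one_period(t1_ms=2.08, active_ms=0.22, entry_ms=0.1, exit_ms=0.08)
for state, start, end in segments:
    print(f"{start:5.2f} -> {end:5.2f} ms  {state}")
```

The guard on `collapsed_at` reflects a design constraint implied by the text: the hinted period has to leave room for the workload and the IFPC entry sequence, otherwise the timer would expire before the GPU is collapsed.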
  • FIG. 7 is a flowchart 700 of an example method of graphics processing in accordance with one or more techniques of this disclosure.
  • the method may be performed by an apparatus, such as an apparatus for graphics processing, a GPU, a CPU, a wireless communication device, and the like, as used in connection with the aspects of FIGs. 1-6.
  • the apparatus may receive, from an application, an indication of a time period for a timer associated with exiting an IFPC state.
  • the GPU 606 may receive, from an application 602, an indication of a time period for a timer associated with exiting an IFPC state. Further, 702 may be performed by the processing unit 120.
  • the apparatus may process, upon triggering the timer associated with exiting the IFPC state, one or more predefined workloads.
  • the GPU 606 may process, upon triggering the timer associated with exiting the IFPC state, one or more predefined workloads. Further, 704 may be performed by the processing unit 120.
  • the apparatus may initiate the IFPC state upon the one or more predefined workloads being finished processing.
  • the GPU 606 may initiate the IFPC state upon the one or more predefined workloads being finished processing. Further, 706 may be performed by the processing unit 120.
  • FIG. 8 is a flowchart 800 of an example method of graphics processing in accordance with one or more techniques of this disclosure. The method may be performed by an apparatus, such as an apparatus for graphics processing, a GPU, a CPU, a wireless communication device, and the like, as used in connection with the aspects of FIGs. 1-6.
  • the apparatus may receive, from an application, an indication of a time period for a timer associated with exiting an IFPC state.
  • the GPU 606 may receive, from an application 602, an indication of a time period for a timer associated with exiting an IFPC state. Further, 802 may be performed by the processing unit 120.
  • the apparatus may process, upon triggering the timer associated with exiting the IFPC state, one or more predefined workloads.
  • the GPU 606 may process, upon triggering the timer associated with exiting the IFPC state, one or more predefined workloads.
  • 808 may be performed by the processing unit 120.
  • the apparatus may initiate the IFPC state upon the one or more predefined workloads being finished processing.
  • the GPU 606 may initiate the IFPC state upon the one or more predefined workloads being finished processing.
  • 810 may be performed by the processing unit 120.
  • the apparatus may exit the IFPC state upon detecting an expiration of the timer.
  • the GPU 606 may exit the IFPC state upon detecting an expiration of the timer. Further, 814 may be performed by the processing unit 120.
  • the apparatus may receive a first indication to start processing the one or more predefined workloads.
  • the GPU 606 may receive a first indication to start processing the one or more predefined workloads.
  • 804 may be performed by the processing unit 120.
  • the apparatus may trigger, upon receiving the first indication, the timer.
  • the GPU 606 may trigger, upon receiving the first indication, the timer.
  • 806 may be performed by the processing unit 120.
  • the apparatus may detect the expiration of the timer.
  • the GPU 606 may detect the expiration of the timer. Further, 812 may be performed by the processing unit 120.
  • the one or more predefined workloads may be one or more LSR workloads.
  • the first indication may be an IPCC interrupt.
  • the first indication may be received from at least one of a scheduler, the application, or a service layer.
  • the one or more predefined workloads may be associated with at least one of an XR application, an AR application, or a VR application.
  • the apparatus may receive a second indication to start processing the one or more predefined workloads.
  • the GPU 606 may receive a second indication to start processing the one or more predefined workloads.
  • 816 may be performed by the processing unit 120.
  • exiting the IFPC state upon detecting an expiration of the timer may include exiting the IFPC state at the GPU 606.
  • the time period for the timer may be further based at least in part on an IFPC exit latency.
  • a hysteresis timeout within a first period associated with the timer may be zero.
  • the apparatus may be a GPU, a CPU, or some other processor that may perform graphics processing.
  • the apparatus may be the processing unit 120 within the device 104, or may be some other hardware within the device 104 or another device.
  • the apparatus may include means for receiving, from an application, an indication of a time period for a timer associated with exiting an IFPC state.
  • the apparatus may further include means for processing, upon triggering the timer associated with exiting the IFPC state, one or more predefined workloads.
  • the apparatus may further include means for initiating the IFPC state upon the one or more predefined workloads being finished processing.
  • the apparatus may further include means for exiting the IFPC state upon detecting an expiration of the timer.
  • the apparatus may further include means for receiving a first indication to start processing the one or more predefined workloads.
  • the apparatus may further include means for triggering, upon receiving the first indication, the timer.
  • the apparatus may further include means for detecting the expiration of the timer.
  • the one or more predefined workloads may be one or more LSR workloads.
  • the first indication may be an IPCC interrupt.
  • the first indication may be received from at least one of a scheduler, the application, or a service layer.
  • the one or more predefined workloads may be associated with at least one of an XR application, an AR application, or a VR application.
  • the apparatus may further include means for receiving a second indication to start processing the one or more predefined workloads.
  • exiting the IFPC state upon detecting an expiration of the timer may include exiting the IFPC state at the GPU.
  • the time period for the timer may be further based at least in part on an IFPC exit latency.
  • a hysteresis timeout within a first period associated with the timer may be zero.
  • the term “some” refers to one or more and the term “or” may be interpreted as “and/or” where context does not dictate otherwise.
  • Combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C.
  • combinations such as “at least one of A, B, or C,” “one or more of A, B, or C,” “at least one of A, B, and C,” “one or more of A, B, and C,” and “A, B, C, or any combination thereof” may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C.
  • the functions described herein may be implemented in hardware, software, firmware, or any combination thereof.
  • Although the term “processing unit” has been used throughout this disclosure, such processing units may be implemented in hardware, software, firmware, or any combination thereof. If any function, processing unit, technique described herein, or other module is implemented in software, the function, processing unit, technique described herein, or other module may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
  • Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another.
  • computer-readable media generally may correspond to: (1) tangible computer-readable storage media, which is non-transitory; or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure.
  • such computer-readable media may comprise RAM, ROM, EEPROM, compact disc-read only memory (CD-ROM), or other optical disk storage, magnetic disk storage, or other magnetic storage devices.
  • Disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • a computer program product may include a computer-readable medium.
  • the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs, e.g., a chip set.
  • Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily need realization by different hardware units. Rather, as described above, various units may be combined in any hardware unit or provided by a collection of inter-operative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Also, the techniques may be fully implemented in one or more circuits or logic elements.
  • Aspect 1 is a method of graphics processing, comprising: receiving, from an application, an indication of a time period for a timer associated with exiting an IFPC state; processing, upon triggering the timer associated with exiting the IFPC state, one or more predefined workloads; initiating the IFPC state upon the one or more predefined workloads being finished processing; and exiting the IFPC state upon detecting an expiration of the timer.
  • Aspect 2 may be combined with aspect 1 and further includes receiving a first indication to start processing the one or more predefined workloads; triggering, upon receiving the first indication, the timer; and detecting the expiration of the timer.
  • Aspect 3 may be combined with aspect 2 and includes that the one or more predefined workloads are one or more LSR workloads.
  • Aspect 4 may be combined with any of aspects 2 and 3 and includes that the first indication is an IPCC interrupt.
  • Aspect 5 may be combined with any of aspects 2-4 and includes that the first indication is received from at least one of a scheduler, the application, or a service layer.
  • Aspect 6 may be combined with any of aspects 2-5 and includes that the one or more predefined workloads are associated with at least one of an XR application, an AR application, or a VR application.
  • Aspect 7 may be combined with any of aspects 2-6 and further includes receiving a second indication to start processing the one or more predefined workloads.
  • Aspect 8 may be combined with any of aspects 1-7 and includes that exiting the IFPC state upon detecting the expiration of the timer includes exiting the IFPC state at a GPU.
  • Aspect 9 may be combined with any of aspects 1-8 and includes that the time period for the timer is further based at least in part on an IFPC exit latency.
  • Aspect 10 may be combined with any of aspects 1-9 and includes that a hysteresis timeout within a first period associated with the timer is zero.
  • Aspect 11 is an apparatus for graphics processing including at least one processor coupled to a memory and configured to implement a method as in any of aspects 1-10.
  • Aspect 12 may be combined with aspect 11 and includes that the apparatus is a wireless communication device.
  • Aspect 13 is an apparatus for graphics processing including means for implementing a method as in any of aspects 1-10.
  • Aspect 14 is a non-transitory computer-readable medium storing computer executable code, the code when executed by at least one processor causes the at least one processor to implement a method as in any of aspects 1-10.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Power Sources (AREA)
  • Mobile Radio Communication Systems (AREA)
PCT/US2023/021587 2022-05-16 2023-05-09 Technique to optimize power and performance of xr workload Ceased WO2023224843A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020247035984A KR20250010589A (ko) 2022-05-16 2023-05-09 Xr 워크로드의 전력 및 성능을 최적화하는 기술
EP23729889.8A EP4526756B1 (en) 2022-05-16 2023-05-09 Technique to optimize power and performance of xr workload
CN202380038196.0A CN119156583A (zh) 2022-05-16 2023-05-09 优化xr工作负载的功率和性能的技术
JP2024564579A JP2025517294A (ja) 2022-05-16 2023-05-09 Xrワークロードの電力及び性能を最適化する技術

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17/663,637 US12045910B2 (en) 2022-05-16 2022-05-16 Technique to optimize power and performance of XR workload
US17/663,637 2022-05-16

Publications (1)

Publication Number Publication Date
WO2023224843A1 (en) 2023-11-23

Family

ID=86732935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/021587 Ceased WO2023224843A1 (en) 2022-05-16 2023-05-09 Technique to optimize power and performance of xr workload

Country Status (7)

Country Link
US (1) US12045910B2
EP (1) EP4526756B1
JP (1) JP2025517294A
KR (1) KR20250010589A
CN (1) CN119156583A
TW (1) TW202347246A
WO (1) WO2023224843A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20250259259A1 (en) * 2024-02-12 2025-08-14 Qualcomm Incorporated Dynamic graphics processor timeouts

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160077575A1 (en) * 2014-09-17 2016-03-17 Advanced Micro Devices, Inc. Interface to expose interrupt times to hardware
EP3401757A1 (en) * 2016-05-31 2018-11-14 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Method for managing central processing unit and related products

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9304813B2 (en) * 2012-07-18 2016-04-05 Intel Corporation CPU independent graphics scheduler for performing scheduling operations for graphics hardware
US20170199558A1 (en) * 2016-01-11 2017-07-13 Qualcomm Incorporated Flexible and scalable energy model for estimating energy consumption
US10255106B2 (en) * 2016-01-27 2019-04-09 Qualcomm Incorporated Prediction-based power management strategy for GPU compute workloads
US10769747B2 (en) * 2017-03-31 2020-09-08 Intel Corporation Intermediate frame generation
US10178619B1 (en) * 2017-09-29 2019-01-08 Intel Corporation Advanced graphics power state management
US11127106B2 (en) * 2019-06-28 2021-09-21 Intel Corporation Runtime flip stability characterization
US20210200255A1 (en) * 2019-12-30 2021-07-01 Qualcomm Incorporated Higher graphics processing unit clocks for low power consuming operations

Also Published As

Publication number Publication date
US12045910B2 (en) 2024-07-23
JP2025517294A (ja) 2025-06-05
TW202347246A (zh) 2023-12-01
EP4526756A1 (en) 2025-03-26
US20230368325A1 (en) 2023-11-16
KR20250010589A (ko) 2025-01-21
EP4526756B1 (en) 2026-04-22
CN119156583A (zh) 2024-12-17

Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 23729889; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 12024552155; Country of ref document: PH)
WWE Wipo information: entry into national phase (Ref document number: 202447070932; Country of ref document: IN)
WWE Wipo information: entry into national phase (Ref document number: 2024564579; Country of ref document: JP)
WWE Wipo information: entry into national phase (Ref document number: 202380038196.0; Country of ref document: CN)
WWE Wipo information: entry into national phase (Ref document number: 2401007317; Country of ref document: TH)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: 112024023277; Country of ref document: BR)
WWE Wipo information: entry into national phase (Ref document number: 2023729889; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2023729889; Country of ref document: EP; Effective date: 20241216)
ENP Entry into the national phase (Ref document number: 112024023277; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20241107)
WWP Wipo information: published in national office (Ref document number: 202447070932; Country of ref document: IN)
WWG Wipo information: grant in national office (Ref document number: 2023729889; Country of ref document: EP)