WO2021158468A1 - System and method for efficient multi-gpu rendering of geometry by geometry analysis while rendering - Google Patents


Info

Publication number
WO2021158468A1
Authority
WO
WIPO (PCT)
Prior art keywords
rendering
geometry
gpus
gpu
pieces
Application number
PCT/US2021/016022
Other languages
English (en)
French (fr)
Inventor
Mark E. Cerny
Tobias BERGHOFF
David Simpson
Original Assignee
Sony Interactive Entertainment Inc.
Priority claimed from US16/780,776 (US11170461B2)
Priority claimed from US16/780,798 (US11508110B2)
Priority claimed from US16/780,864 (US11120522B2)
Application filed by Sony Interactive Entertainment Inc.
Priority to EP21708462.3A (EP4100923A1)
Priority to CN202180020414.9A (CN115335866A)
Priority to JP2022546704A (JP7254252B2)
Publication of WO2021158468A1
Priority to JP2023052155A (JP7355960B2)
Priority to JP2023155680A (JP7481560B2)


Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50: Controlling the output signals based on the game progress
    • A63F13/52: Controlling the output signals based on the game progress involving aspects of the displayed game scene
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G06T11/40: Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/005: General purpose rendering architectures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/50: Indexing scheme relating to G06F9/50
    • G06F2209/5017: Task decomposition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/50: Indexing scheme relating to G06F9/50
    • G06F2209/509: Offload
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G: ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00: Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36: Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/363: Graphics controllers

Definitions

  • The present disclosure is related to graphics processing, and more specifically to multi-GPU collaboration when rendering an image for an application.
  • The cloud gaming server may be configured to provide resources to one or more clients and/or applications. That is, the cloud gaming server may be configured with resources capable of high throughput. However, there are limits to the performance that an individual graphics processing unit (GPU) can attain. To render even more complex scenes or use even more complex algorithms (e.g. materials, lighting, etc.) when generating a scene, it may be desirable to use multiple GPUs to render a single image. In practice, using those GPUs equally is difficult to achieve.
  • GPU: graphics processing unit
  • Embodiments of the present disclosure relate to using multiple GPUs (graphics processing units) in collaboration to render a single image, such as multi-GPU rendering of geometry for an application by performing geometry analysis while rendering to generate information used for the dynamic assignment of screen regions to GPUs for rendering of the image frame, and/or by performing geometry analysis prior to rendering, and/or by performing a timing analysis during a rendering phase for purposes of redistributing the assignment of GPU responsibilities during the rendering phase.
  • GPUs: graphics processing units
  • Embodiments of the present disclosure disclose a method for graphics processing.
  • The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • The method includes using the plurality of GPUs in collaboration to render an image frame including a plurality of pieces of geometry.
  • The method includes, during a pre-pass phase of rendering, generating information at the GPUs regarding the plurality of pieces of geometry and their relation to a plurality of screen regions.
  • The method includes assigning the plurality of screen regions to the plurality of GPUs based on the information for purposes of rendering the plurality of pieces of geometry in a subsequent phase of rendering.
  • Other embodiments of the present disclosure disclose a computer system including a processor, and memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method for graphics processing.
  • The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • The method includes using the plurality of GPUs in collaboration to render an image frame including a plurality of pieces of geometry.
  • The method includes, during a pre-pass phase of rendering, generating information at the GPUs regarding the plurality of pieces of geometry and their relation to a plurality of screen regions.
  • The method includes assigning the plurality of screen regions to the plurality of GPUs based on the information for purposes of rendering the plurality of pieces of geometry in a subsequent phase of rendering.
  • Still other embodiments of the present disclosure disclose a non-transitory computer-readable medium storing a computer program for graphics processing.
  • The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • The computer-readable medium includes program instructions for using the plurality of GPUs in collaboration to render an image frame including a plurality of pieces of geometry.
  • The computer-readable medium includes program instructions for generating, during a pre-pass phase of rendering, information at the GPUs regarding the plurality of pieces of geometry and their relation to a plurality of screen regions.
  • The computer-readable medium includes program instructions for assigning the plurality of screen regions to the plurality of GPUs based on the information for purposes of rendering the plurality of pieces of geometry in a subsequent phase of rendering.
  • Embodiments of the present disclosure disclose a method for graphics processing.
  • The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • The method includes dividing responsibility for processing a plurality of pieces of geometry of an image frame between the plurality of GPUs during an analysis pre-pass phase of rendering, wherein each of the plurality of pieces of geometry is assigned to a corresponding GPU.
  • The method includes determining, in the analysis pre-pass phase, the overlap of each of the plurality of pieces of geometry with each of a plurality of screen regions.
  • The method includes generating information at the plurality of GPUs regarding the plurality of pieces of geometry and their relations to the plurality of screen regions based on the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions.
  • The method includes assigning the plurality of screen regions to the plurality of GPUs based on the information for purposes of rendering the plurality of pieces of geometry during a subsequent phase of rendering.
  • Other embodiments of the present disclosure disclose a computer system including a processor, and memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method for graphics processing.
  • The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • The method includes dividing responsibility for processing a plurality of pieces of geometry of an image frame between the plurality of GPUs during an analysis pre-pass phase of rendering, wherein each of the plurality of pieces of geometry is assigned to a corresponding GPU.
  • The method includes determining, in the analysis pre-pass phase, the overlap of each of the plurality of pieces of geometry with each of a plurality of screen regions.
  • The method includes generating information at the plurality of GPUs regarding the plurality of pieces of geometry and their relations to the plurality of screen regions based on the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions.
  • The method includes assigning the plurality of screen regions to the plurality of GPUs based on the information for purposes of rendering the plurality of pieces of geometry during a subsequent phase of rendering.
  • Still other embodiments of the present disclosure disclose a non-transitory computer-readable medium storing a computer program for graphics processing.
  • The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • The computer-readable medium includes program instructions for dividing responsibility for processing a plurality of pieces of geometry of an image frame between the plurality of GPUs during an analysis pre-pass phase of rendering, wherein each of the plurality of pieces of geometry is assigned to a corresponding GPU.
  • The computer-readable medium includes program instructions for determining, in the analysis pre-pass phase, the overlap of each of the plurality of pieces of geometry with each of a plurality of screen regions.
  • The computer-readable medium includes program instructions for generating information at the plurality of GPUs regarding the plurality of pieces of geometry and their relations to the plurality of screen regions based on the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions.
  • The computer-readable medium includes program instructions for assigning the plurality of screen regions to the plurality of GPUs based on the information for purposes of rendering the plurality of pieces of geometry during a subsequent phase of rendering.
  • Embodiments of the present disclosure disclose a method for graphics processing.
  • The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • The method includes using the plurality of GPUs in collaboration to render an image frame including a plurality of pieces of geometry.
  • The method includes, during the rendering of the image frame, subdividing one or more of the plurality of pieces of geometry into smaller pieces, and dividing the responsibility for rendering these smaller portions of geometry among the plurality of GPUs, wherein each of the smaller portions of geometry is processed by a corresponding GPU.
  • For those pieces of geometry that are not subdivided, the method includes dividing the responsibility for rendering the pieces of geometry among the plurality of GPUs, wherein each of these pieces of geometry is processed by a corresponding GPU.
  • FIG. 1 A block diagram illustrating an exemplary computing environment in accordance with the present disclosure.
  • Still other embodiments of the present disclosure disclose a non-transitory computer-readable medium storing a computer program for graphics processing.
  • The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • The computer-readable medium includes program instructions for using the plurality of GPUs in collaboration to render an image frame including a plurality of pieces of geometry.
  • The computer-readable medium includes program instructions for subdividing, during the rendering of the image frame, one or more of the plurality of pieces of geometry into smaller pieces, and dividing the responsibility for rendering these smaller portions of geometry among the plurality of GPUs, wherein each of the smaller portions of geometry is processed by a corresponding GPU.
  • The computer-readable medium includes program instructions for dividing, for those pieces of geometry that are not subdivided, the responsibility for rendering the pieces of geometry among the plurality of GPUs, wherein each of these pieces of geometry is processed by a corresponding GPU.
  • FIG. 1 is a diagram of a system for providing gaming over a network between one or more cloud gaming servers configured for implementing multiple GPUs in collaboration to render a single image, including multi-GPU (graphics processing unit) rendering of geometry for an application by performing geometry analysis while rendering to generate information used for the dynamic assignment of screen regions to GPUs for further rendering passes of the image frame, and/or by performing geometry analysis prior to a rendering phase, and/or by subdividing pieces of geometry and assigning the resulting smaller portions of geometry to multiple GPUs, in accordance with embodiments of the present disclosure.
  • FIG. 2 is a diagram of a multi-GPU architecture wherein multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure.
  • FIG. 3 is a diagram of multiple graphics processing unit resources configured for multi-GPU rendering of geometry for an application by performing geometry analysis while rendering, and/or by performing geometry analysis prior to rendering, and/or by subdividing pieces of geometry and assigning the resulting smaller portions of geometry to multiple GPUs, in accordance with embodiments of the present disclosure.
  • FIG. 4 is a diagram of a rendering architecture implementing a graphics pipeline that is configured for multi-GPU processing, such that multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure.
  • FIG. 5A is a diagram of a screen that is subdivided into quadrants when performing multi-GPU rendering, in accordance with one embodiment of the present disclosure.
  • FIG. 5B is a diagram of a screen that is subdivided into a plurality of interleaved regions when performing multi-GPU rendering, in accordance with one embodiment of the present disclosure.
  • FIG. 6A illustrates object testing against screen regions when multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure.
  • FIG. 6B illustrates testing of portions of an object against screen regions when multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure.
  • FIG. 7 is a flow diagram illustrating a method for graphics processing including multi-GPU rendering of geometry for an application by performing geometry analysis while rendering, in accordance with one embodiment of the present disclosure.
  • FIG. 8 is a diagram of a screen illustrating the dynamic assignment of screen regions to GPUs for geometry rendering based on an analysis of geometry of a current image frame performed while rendering the current image frame, in accordance with one embodiment of the present disclosure.
  • FIGS. 9A-9C are diagrams illustrating the rendering of an image frame that includes four objects, covering a Z pre-pass phase and a geometry phase of rendering the image frame, wherein the Z pre-pass phase is performed to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • FIG. 10 illustrates rendering an image frame using the dynamic assignment of screen regions to GPUs for geometry rendering, based on whole objects or portions of objects, wherein the assignment is based on an analysis of geometry of a current image frame performed during a Z pre-pass phase of rendering the image frame, in accordance with one embodiment of the present disclosure.
  • FIG. 11 is a diagram illustrating the interleaving of GPU assignments to pieces of geometry of an image frame for purposes of performing a Z pre-pass phase of rendering to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • FIG. 12A is a flow diagram illustrating a method for graphics processing including multi-GPU rendering of geometry for an application by performing geometry analysis prior to rendering, in accordance with one embodiment of the present disclosure.
  • FIG. 12B is a diagram illustrating an analysis pre-pass performed before a rendering phase for an image frame, the analysis pre-pass generating information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • FIG. 13A is a diagram illustrating the calculation of an accurate overlap between a piece of geometry and a screen region when performing an analysis pre-pass to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • FIG. 13B is a pair of diagrams illustrating the calculations of approximate overlap between a piece of geometry and a screen region when performing an analysis pre-pass to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • FIG. 14A is a flow diagram illustrating a method for graphics processing including multi-GPU rendering of geometry for an application by performing a timing analysis during a rendering or analysis phase for purposes of redistributing the assignment of GPU responsibilities during the rendering or analysis phase, such as when performing a Z pre-pass phase for pieces of geometry to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • FIG. 14B is a diagram illustrating various distributions of GPU assignments for performing a Z pre-pass phase of rendering to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • FIG. 15A is a diagram illustrating the use of multiple GPUs to render pieces of geometry in a screen region, in accordance with one embodiment of the present disclosure.
  • FIG. 15B is a diagram illustrating the rendering of pieces of geometry out of order of their corresponding draw calls, in accordance with one embodiment of the present disclosure.
  • FIG. 16 illustrates components of an example device that can be used to perform aspects of the various embodiments of the present disclosure.
  • Various embodiments of the present disclosure provide for analyzing the geometry of an image frame and dynamically and flexibly assigning responsibilities between the GPUs for rendering the image frame, such that each GPU ends up being responsible for a set of screen regions that is unique to that image frame (i.e., the next image frame may have a different association of GPUs to screen regions).
  • Embodiments of the present disclosure support an increase in pixel count (i.e. resolution) and complexity, and/or an increase in geometric complexity, and/or an increase in the amount of processing per vertex and/or primitive.
  • Various embodiments of the present disclosure describe methods and systems configured for performing multi-GPU rendering of geometry for an application by performing geometry analysis while rendering to dynamically assign screen regions to GPUs for geometry rendering of the image frame, wherein the geometry analysis is based on information defining relationships between the geometry to be rendered for the image frame and screen regions.
  • The information for the geometry analysis is generated while rendering, such as during a Z pre-pass before the geometry rendering.
  • Hardware is configured so that a pre-pass generates the information used to assist in the intelligent assignment of screen regions to the GPUs when rendering geometry in a subsequent phase of rendering.
  • Other embodiments of the present disclosure describe methods and systems configured for performing multi-GPU rendering of geometry for an application by performing geometry analysis prior to a phase of rendering, in order to dynamically assign screen regions to GPUs for that phase of rendering of the image frame, wherein the geometry analysis is based on information defining relationships between the geometry to be rendered for the image frame and screen regions. For example, the information is generated in a pre-pass performed before rendering, such as by using shaders (e.g., software). The information is used for the intelligent assignment of screen regions to the GPUs when performing the geometry rendering. Still other embodiments of the present disclosure describe methods and systems configured for subdividing pieces of geometry, e.g.
  • An interactive application or “game” or “video game” or “gaming application” is meant to represent any type of interactive application that is directed through execution of input commands.
  • An interactive application includes applications for gaming, word processing, video processing, video game processing, etc. Further, the terms introduced above are interchangeable.
  • Multiple GPUs may collaborate when rendering geometry for an application.
  • FIG. 1 is a diagram of a system for performing multi-GPU processing when rendering an image (e.g. image frame) for an application, in accordance with one embodiment of the present disclosure.
  • The system is configured to provide gaming over a network between one or more cloud gaming servers, and more specifically is configured for the collaboration of multiple GPUs to render a single image of an application, such as when performing geometry analysis of pieces of geometry of an image frame while rendering or prior to rendering in order to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or when subdividing pieces of geometry, e.g.
  • Cloud gaming includes the execution of a video game at the server to generate game rendered video frames, which are then sent to a client for display.
  • System 100 is configured for efficient multi-GPU rendering of geometry for an application by pretesting against interleaved screen regions before rendering.
  • FIG. 1 illustrates the implementation of multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system.
  • Multi-GPU rendering of geometry may be performed using physical GPUs, virtual GPUs, or a combination of both, in various embodiments (e.g. in a cloud gaming environment or within a stand-alone system).
  • Virtual machines may be created using a hypervisor of host hardware (e.g. located at a data center) utilizing one or more components of a hardware layer, such as multiple CPUs, memory modules, GPUs, network interfaces, communication components, etc.
  • These physical resources may be arranged in racks, such as racks of CPUs, racks of GPUs, racks of memory, etc., wherein the physical resources in the racks may be accessed using top of rack switches facilitating a fabric for assembling and accessing of components used for an instance (e.g. when building the virtualized components of the instance).
  • A hypervisor can present multiple guest operating systems of multiple instances that are configured with virtual resources.
  • Each of the operating systems may be configured with a corresponding set of virtualized resources supported by one or more hardware resources (e.g. located at a corresponding data center).
  • Each operating system may be supported with a virtual CPU, multiple virtual GPUs, virtual memory, virtualized communication components, etc.
  • A configuration of an instance may be transferred from one data center to another data center to reduce latency.
  • The GPU utilization defined for the user or game can be utilized when saving a user’s gaming session.
  • The GPU utilization can include any number of configurations described herein to optimize the rendering speed of video frames for a gaming session.
  • The GPU utilization defined for the game or the user can be transferred between data centers as a configurable setting. The ability to transfer the GPU utilization setting enables efficient migration of game play from data center to data center in case the user connects to play games from different geographic locations.
  • System 100 provides gaming via a cloud game network 190, wherein the game is being executed remote from client device 110 (e.g. thin client) of a corresponding user that is playing the game, in accordance with one embodiment of the present disclosure.
  • System 100 may provide gaming control to one or more users playing one or more games through the cloud game network 190 via network 150 in either single-player or multi-player modes.
  • The cloud game network 190 may include a plurality of virtual machines (VMs) running on a hypervisor of a host machine, with one or more virtual machines configured to execute a game processor module utilizing the hardware resources available to the hypervisor of the host.
  • Network 150 may include one or more communication technologies.
  • Network 150 may include 5th Generation (5G) network technology having advanced wireless communication systems.
  • 5G: 5th Generation
  • Communication may be facilitated using wireless technologies.
  • Such technologies may include, for example, 5G wireless communication technologies.
  • 5G is the fifth generation of cellular network technology.
  • 5G networks are digital cellular networks, in which the service area covered by providers is divided into small geographical areas called cells. Analog signals representing sounds and images are digitized in the telephone, converted by an analog to digital converter and transmitted as a stream of bits. All the 5G wireless devices in a cell communicate by radio waves with a local antenna array and low power automated transceiver (transmitter and receiver) in the cell, over frequency channels assigned by the transceiver from a pool of frequencies that are reused in other cells.
  • The local antennas are connected with the telephone network and the Internet by a high bandwidth optical fiber or wireless backhaul connection.
  • 5G networks are just an example type of communication network, and embodiments of the disclosure may utilize earlier generation wireless or wired communication, as well as later generation wired or wireless technologies that come after 5G.
  • The cloud game network 190 includes a game server 160 that provides access to a plurality of video games.
  • Game server 160 may be any type of server computing device available in the cloud, and may be configured as one or more virtual machines executing on one or more hosts.
  • Game server 160 may manage a virtual machine supporting a game processor that instantiates an instance of a game for a user.
  • A plurality of game processors of game server 160, associated with a plurality of virtual machines, is configured to execute multiple instances of one or more games associated with gameplays of a plurality of users.
  • Back-end server support provides streaming of media (e.g. video, audio, etc.) of gameplays of a plurality of gaming applications to a plurality of corresponding users.
  • Game server 160 is configured to stream data (e.g. rendered images and/or frames of a corresponding gameplay) back to a corresponding client device 110 through network 150.
  • a computationally complex gaming application may be executing at the back-end server in response to controller inputs received and forwarded by client device 110.
  • Each server is able to render images and/or frames that are then encoded (e.g. compressed) and streamed to the corresponding client device for display.
  • a plurality of users may access cloud game network 190 via communication network 150 using corresponding client devices 110 configured for receiving streaming media.
  • client device 110 may be configured as a thin client providing interfacing with a back end server (e.g. cloud game network 190) configured for providing computational functionality (e.g. including game title processing engine 111).
  • client device 110 may be configured with a game title processing engine and game logic for at least some local processing of a video game, and may be further utilized for receiving streaming content as generated by the video game executing at a back-end server, or for other content provided by back-end server support.
  • the game title processing engine includes basic processor based functions for executing a video game and services associated with the video game.
  • the game logic may be stored on the local client device 110 and is used for executing the video game.
  • Each of the client devices 110 may be requesting access to different games from the cloud game network.
  • cloud game network 190 may be executing one or more game logics that are built upon a game title processing engine 111, as executed using the CPU resources 163 and GPU resources 365 of the game server 160.
  • game logic 115a in cooperation with game title processing engine 111 may be executing on game server 160 for a first client.
  • game logic 115b in cooperation with game title processing engine 111 may be executing on game server 160 for a second client.
  • game logic 115n in cooperation with game title processing engine 111 may be executing on game server 160 for an Nth client.
  • client device 110 of a corresponding user is configured for requesting access to games over a communication network 150, such as the internet, and for rendering for display images (e.g. image frame) generated by a video game executed by the game server 160, wherein encoded images are delivered to the client device 110 for display in association with the corresponding user.
  • for example, the user may be interacting through client device 110 with an instance of a video game executing on a game processor of game server 160.
  • an instance of the video game is executed by the game title processing engine 111 using corresponding game logic (e.g. executable code), which may be held in a data store (not shown).
  • Game title processing engine 111 is able to support a plurality of video games using a plurality of game logics (e.g. gaming application), each of which is selectable by the user.
  • client device 110 is configured to interact with the game title processing engine 111 in association with the gameplay of a corresponding user, such as through input commands that are used to drive gameplay.
  • client device 110 may receive input from various types of input devices, such as game controllers, tablet computers, keyboards, gestures captured by video cameras, mice, touch pads, etc.
  • Client device 110 can be any type of computing device having at least a memory and a processor module that is capable of connecting to the game server 160 over network 150.
  • the back-end game title processing engine 111 is configured for generating rendered images, which are delivered over network 150 for display at a corresponding display in association with client device 110.
  • the game rendered images may be delivered by an instance of a corresponding game
  • client device 110 is configured for receiving encoded images (e.g. encoded from game rendered images generated through execution of a video game), and for displaying the images that are rendered on display 11.
  • display 11 includes an HMD (e.g. displaying VR content).
  • the rendered images may be streamed to a smartphone or tablet, wirelessly or wired, directly from the cloud-based services or via the client device 110 (e.g. PlayStation® Remote Play).
  • game server 160 and/or the game title processing engine 111 includes basic processor based functions for executing the game and services associated with the gaming application.
  • game server 160 includes central processing unit (CPU) resources 163 and graphics processing unit (GPU) resources 365 that are configured for performing processor-based functions, including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, etc.
  • the CPU and GPU group may implement services for the gaming application, including, in part, memory management, multi-thread management, quality of service (QoS), bandwidth testing, social networking, management of social friends, communication with social networks of friends, communication channels, texting, instant messaging, chat support, etc.
  • one or more applications share a particular GPU resource.
  • multiple GPU devices may be combined to perform graphics processing for a single application that is executing on a corresponding CPU.
  • cloud game network 190 is a distributed game server system and/or architecture.
  • a distributed game engine executing game logic is configured as a corresponding instance of a corresponding game.
  • the distributed game engine takes each of the functions of a game engine and distributes those functions for execution by a multitude of processing entities. Individual functions can be further distributed across one or more processing entities.
  • the processing entities may be configured in different configurations, including physical hardware, and/or as virtual components or virtual machines, and/or as virtual containers, wherein a container is different from a virtual machine as it virtualizes an instance of the gaming application running on a virtualized operating system.
  • the processing entities may utilize and/or rely on servers and their underlying hardware on one or more servers (compute nodes) of the cloud game network 190, wherein the servers may be located on one or more racks.
  • the coordination, assignment, and management of the execution of those functions to the various processing entities are performed by a distribution synchronization layer.
  • the distribution synchronization layer is able to efficiently execute (e.g. through load balancing) those functions across the distributed processing entities, such that critical game engine components/functions are distributed and reassembled for more efficient processing.
  • FIG. 2 is a diagram of an exemplary multi-GPU architecture 200 wherein multiple GPUs collaborate to render a single image of a corresponding application, in accordance with one embodiment of the present disclosure.
  • the multi-GPU architecture 200 is configured to perform geometry analysis of pieces of geometry of an image frame, while rendering or prior to rendering, in order to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or to subdivide pieces of geometry (e.g. as processed or produced by draw calls) into smaller portions of geometry and assign those smaller portions to multiple GPUs for rendering, wherein each smaller portion of geometry is assigned to a GPU, in accordance with various embodiments of the present disclosure.
  • multi-GPU rendering of geometry for an application by performing region testing while rendering may be implemented between one or more cloud gaming servers of a cloud gaming system, or may be implemented within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs, etc.
  • the multi-GPU architecture 200 includes a CPU 163 and multiple GPUs configured for multi-GPU rendering of a single image (also referred to as “image frame”) for an application, and/or each image in a sequence of images for the application.
  • CPU 163 and GPU resources 365 are configured for performing processor-based functions, including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, etc., as previously described.
  • GPU-A is connected to memory 210A (e.g., RAM) via bus 220
  • GPU-B is connected to memory 210B (e.g., RAM) via bus 220
  • GPU-C is connected to memory 210C (e.g., RAM) via bus 220
  • GPU-D is connected to memory 210D (e.g., RAM) via bus 220.
  • the GPUs are connected to each other via bus 240, which, depending on the architecture, may be approximately equal in speed to, or slower than, bus 220 used for communication between a corresponding GPU and its corresponding memory.
  • GPU-A is connected to each of GPU-B, GPU-C, and GPU-D via bus 240.
  • GPU-B is connected to each of GPU-A, GPU-C, and GPU-D via bus 240.
  • GPU-C is connected to each of GPU-A, GPU-B, and GPU-D via bus 240.
  • GPU-D is connected to each of GPU-A, GPU-B, and GPU-C via bus 240.
  • CPU 163 connects to each of the GPUs via a lower-speed bus 230 (e.g., bus 230 is slower than bus 220 used for communication between a corresponding GPU and its corresponding memory).
  • CPU 163 is connected to each of GPU-A, GPU-B, GPU-C, and GPU-D.
  • the four GPUs are discrete GPUs, each on their own silicon die. In other embodiments, the four GPUs may share a die in order to take advantage of high speed interconnects and other units on the die.
  • there is one physical GPU 250 that can be configured to be used either as a single more powerful GPU or as four less powerful “virtual” GPUs (GPU-A, GPU-B, GPU-C and GPU-D). That is to say, there is sufficient functionality for GPU-A, GPU-B, GPU-C and GPU-D each to operate a graphics pipeline (as shown in FIG. 4), and the chip as a whole can operate a graphics pipeline (as shown in FIG. 4), and the configuration can be flexibly switched (e.g. between rendering passes) between the two configurations.
  • FIG. 3 is a diagram of graphics processing unit resources 365 configured for multi-GPU rendering of geometry for an image frame generated by an application, by performing geometry analysis of pieces of geometry of an image frame while rendering or prior to rendering in order to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or by subdividing pieces of geometry (e.g. as processed or produced by draw calls) into smaller portions of geometry and assigning those smaller portions of geometry to multiple GPUs for rendering, wherein each smaller portion of geometry is assigned to a GPU, in accordance with various embodiments of the present disclosure.
  • game server 160 may be configured to include GPU resources 365 in the cloud game network 190 of FIG. 1.
  • GPU resources 365 include multiple GPUs, such as GPU 365a, GPU 365b ... GPU 365n.
  • various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as implementing multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, or implementing multi-GPU rendering of geometry within a stand-alone system such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs, etc.
  • game server 160 may include a CPU and GPU group that is configured to perform multi-GPU rendering of each of one or more images in a sequence of images of the application, wherein one CPU and GPU group could be implementing graphics and/or rendering pipelines for the application, in one embodiment.
  • CPU and GPU group could be configured as one or more processing devices.
  • the CPU and GPU group may include CPU 163 and GPU resources 365, which are configured for performing processor-based functions, including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, etc.
  • GPU resources 365 are responsible and/or configured for rendering of objects (e.g. writing color or normal vector values for a pixel of the object to multiple render targets - MRTs) and for execution of synchronous compute kernels (e.g. full screen effects on the resulting MRTs).
  • GPU resources 365 are configured to render objects and perform synchronous compute (e.g. during the execution of synchronous compute kernels) when executing commands from the rendering command buffers 325, wherein commands and/or operations may be dependent on other operations such that they are performed in sequence.
  • GPU resources 365 are configured to perform synchronous compute and/or rendering of objects using one or more rendering command buffers 325 (e.g. rendering command buffer 325a, rendering command buffer 325b ... rendering command buffer 325n).
  • Each GPU in the GPU resources 365 may have its own command buffer, in one embodiment.
  • the GPUs in GPU resources 365 may use the same command buffer or the same set of command buffers.
  • each of the GPUs in GPU resources 365 may support the ability for a command to be executed by one GPU, but not by another.
  • rendering command buffer 325a may support flags 330a
  • rendering command buffer 325b may support flags 330b
  • rendering command buffer 325n may support flags 330n.
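The per-command flags above can be pictured with a minimal sketch, assuming each command in a shared buffer carries a bitmask with one bit per GPU, and each GPU executes only the commands whose mask includes its bit. The `Command` class, the bit assignments, and `commands_for_gpu` are hypothetical illustrations, not an actual GPU command format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical flag bits: one bit per GPU (GPU-A = bit 0 ... GPU-D = bit 3).
GPU_A, GPU_B, GPU_C, GPU_D = 1, 2, 4, 8
ALL_GPUS = GPU_A | GPU_B | GPU_C | GPU_D


@dataclass
class Command:
    name: str
    gpu_mask: int = ALL_GPUS  # GPUs that should execute this command


def commands_for_gpu(buffer: List[Command], gpu_bit: int) -> List[str]:
    """Return, in buffer order, the commands executed by the GPU owning gpu_bit."""
    return [cmd.name for cmd in buffer if cmd.gpu_mask & gpu_bit]


# A single command buffer shared by all GPUs (hypothetical contents).
buffer = [
    Command("set_render_target"),             # configuration: executed by every GPU
    Command("draw_object_1", GPU_A),          # executed by GPU-A only
    Command("draw_object_2", GPU_B | GPU_C),  # executed by GPU-B and GPU-C
    Command("fullscreen_effect"),             # executed by every GPU
]

print(commands_for_gpu(buffer, GPU_B))
# ['set_render_target', 'draw_object_2', 'fullscreen_effect']
```

Because each GPU walks the same buffer in order and merely skips unflagged commands, ordering dependencies between the commands it does execute are preserved.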
  • Performance of synchronous compute (e.g. execution of synchronous compute kernels) and rendering of objects are part of the overall rendering. For example, if the video game is running at 60 Hz (e.g. 60 frames per second), then all object rendering and execution of synchronous compute kernels for an image frame typically must complete within approximately 16.67 ms (e.g. one frame at 60 Hz).
  • operations performed when rendering objects and/or executing synchronous compute kernels are ordered, such that operations may be dependent on other operations (e.g. commands in a rendering command buffer may need to complete execution before other commands in that rendering command buffer can execute).
  • each of the rendering command buffers 325 contains commands of various types, including commands that affect a corresponding GPU configuration (e.g. commands that specify the location and format of a render target), as well as commands to render objects and/or execute synchronous compute kernels.
  • synchronous compute performed when executing synchronous compute kernels may include performing full screen effects when the objects have all been rendered to one or more corresponding multiple render targets (MRTs).
  • when GPU resources 365 render objects for an image frame, and/or execute synchronous compute kernels when generating the image frame, the GPU resources 365 are configured via the registers of each GPU 365a, 365b ... 365n.
  • GPU 365a is configured via its registers 340 (e.g. register 340a, register 340b ... register 340n) to perform that rendering or compute kernel execution in a certain way. That is, the values stored in registers 340 define the hardware context (e.g. GPU configuration or GPU state) for GPU 365a when executing commands in rendering command buffers 325 used for rendering objects and/or executing synchronous compute kernels for an image frame.
  • Each of the GPUs in GPU resources 365 may be similarly configured, such that GPU 365b is configured via its registers 350 (e.g., register 350a, register 350b ... register 350n) to perform that rendering or compute kernel execution in a certain way; ... and GPU 365n is configured via its registers 370 (e.g., register 370a, register 370b ... register 370n) to perform that rendering or compute kernel execution in a certain way.
  • GPU configuration includes the location and format of render targets (e.g. MRTs).
  • other examples of GPU configuration include operating procedures. For instance, when rendering an object, the Z-value of each pixel of the object can be compared to the Z-buffer in various ways. For example, the object pixel is written only if the object Z-value matches the value in the Z-buffer. Alternatively, the object pixel could be written only if the object Z-value is the same as or less than the value in the Z-buffer. The type of test being performed is defined within the GPU configuration.
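The two depth tests just described can be sketched as a comparison selected by configuration state. This is a minimal illustration, assuming a per-pixel Z-buffer and color buffer held as nested lists; the mode names in `DEPTH_TESTS` and the function `depth_test_and_write` are hypothetical stand-ins for the register values that define the test.

```python
import operator
from typing import List, Optional

# Hypothetical configuration values selecting the depth comparison, standing in
# for the register state that defines the type of test.
DEPTH_TESTS = {
    "EQUAL": operator.eq,       # write only if object Z matches the Z-buffer value
    "LESS_EQUAL": operator.le,  # write only if object Z is the same as or less than it
}


def depth_test_and_write(zbuffer: List[List[float]], colorbuffer: List[List[Optional[str]]],
                         x: int, y: int, z: float, color: str, mode: str) -> bool:
    """Apply the configured depth test to one pixel; return True if it was written."""
    passes = DEPTH_TESTS[mode](z, zbuffer[y][x])
    if passes:
        zbuffer[y][x] = z
        colorbuffer[y][x] = color
    return passes


zbuffer = [[0.5]]
colorbuffer = [[None]]
print(depth_test_and_write(zbuffer, colorbuffer, 0, 0, 0.4, "red", "LESS_EQUAL"))  # True
print(depth_test_and_write(zbuffer, colorbuffer, 0, 0, 0.3, "blue", "EQUAL"))      # False
```

The same drawing code produces different results purely by changing `mode`, which mirrors the point that the test is part of the GPU configuration rather than the object being drawn.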
  • FIG. 4 is a simplified diagram of a rendering architecture implementing a graphics pipeline 400 that is configured for multi-GPU processing, such that multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure.
  • the graphics pipeline 400 is illustrative of the general process for rendering images using 3D (three dimensional) polygon rendering processes.
  • the graphics pipeline 400 for a rendered image outputs corresponding color information for each of the pixels in a display, wherein the color information may represent texture and shading (e.g., color, shadowing, etc.).
  • Graphics pipeline 400 may be implementable within the client device 110, game server 160, game title processing engine 111, and/or GPU resources 365 of FIGS. 1 and 3.
  • various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as implementing multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, or implementing multi-GPU rendering of geometry within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs, etc.
  • the graphics pipeline receives input geometries 405.
  • the geometry processing stage 410 receives the input geometries 405.
  • the input geometries 405 may include vertices within a 3D gaming world, and information corresponding to each of the vertices.
  • a given object within the gaming world can be represented using polygons (e.g., triangles) defined by vertices, wherein the surface of a corresponding polygon is then processed through the graphics pipeline 400 to achieve a final effect (e.g., color, texture, etc.).
  • Vertex attributes may include normal (e.g., which direction is perpendicular to the geometry at that location), color (e.g., RGB - red, green, and blue triple, etc.), and texture coordinate/mapping information.
  • the geometry processing stage 410 is responsible for (and capable of) both vertex processing (e.g. via a vertex shader) and primitive processing.
  • the geometry processing stage 410 may output sets of vertices that define primitives and deliver them to the next stage of the graphics pipeline 400, as well as positions (to be precise, homogeneous coordinates) and various other parameters for those vertices.
  • the positions are placed in the position cache 450 for access by later shader stages.
  • the other parameters are placed in the parameter cache 460, again for access by later shader stages.
  • Various operations may be performed by the geometry processing stage 410, such as performing lighting and shadowing calculations for the primitives and/or polygons.
  • the geometry stage can perform backface culling, and/or clipping (e.g. testing against the view frustum), thereby reducing the load on downstream stages (e.g., rasterization stage 420, etc.).
  • the geometry stage may generate primitives (e.g. with functionality equivalent to a traditional geometry shader).
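Backface culling in the geometry stage can be sketched as a winding-order test on each triangle's projected vertices. This is a minimal illustration, assuming 2D screen-space positions after projection, a y-up axis, and counter-clockwise front faces by default (graphics APIs typically make this convention configurable); `signed_area2` and `cull_backfaces` are hypothetical names.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Triangle = Tuple[Vec2, Vec2, Vec2]


def signed_area2(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area of triangle abc; positive when a->b->c is
    counter-clockwise in a y-up coordinate system."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def cull_backfaces(triangles: List[Triangle], front_ccw: bool = True) -> List[Triangle]:
    """Keep only front-facing triangles; degenerate (zero-area) ones are dropped too."""
    kept = []
    for tri in triangles:
        area = signed_area2(*tri)
        front = area > 0 if front_ccw else area < 0
        if front:
            kept.append(tri)
    return kept


front = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))  # counter-clockwise
back = ((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))   # clockwise
print(len(cull_backfaces([front, back])))  # 1
```

Dropping the back-facing triangle here means the rasterization stage never sees it, which is the load reduction on downstream stages that the geometry stage is meant to provide.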
  • the primitives output by the geometry processing stage 410 are fed into the rasterization stage 420 that converts the primitives into a raster image composed of pixels.
  • the rasterization stage 420 is configured to project objects in the scene to a two-dimensional (2D) image plane defined by the viewing location in the 3D gaming world (e.g., camera location, user eye location, etc.).
  • the rasterization stage 420 looks at each primitive and determines which pixels are affected by the corresponding primitive.
  • the rasterizer 420 partitions the primitives into pixel sized fragments, wherein each fragment corresponds to a pixel in the display. It is important to note that one or more fragments may contribute to the color of a corresponding pixel when displaying an image.
  • additional operations may also be performed by the rasterization stage 420 such as clipping (identify and disregard fragments that are outside the viewing frustum) and culling (disregard fragments that are occluded by closer objects) to the viewing location.
  • the geometry processing stage 410 and/or rasterization stage 420 may be configured to identify and disregard primitives that are outside the viewing frustum as defined by the viewing location in the gaming world.
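How the rasterization stage can determine which pixels a primitive affects is sketched below with edge functions, one common technique (not necessarily the one used in the disclosure). The sketch assumes a counter-clockwise triangle in screen-space pixel coordinates and samples at pixel centers; it clips only to the screen rectangle and omits the tie-breaking fill rules real rasterizers apply to pixels exactly on a shared edge. The names `edge` and `rasterize` are hypothetical.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]


def edge(a: Vec2, b: Vec2, p: Vec2) -> float:
    """Edge function: positive when p lies to the left of the directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(tri: Tuple[Vec2, Vec2, Vec2], width: int, height: int) -> List[Tuple[int, int]]:
    """Return (x, y) of pixels whose centers lie inside a counter-clockwise triangle."""
    a, b, c = tri
    # Bounding box clamped to the screen: pixels outside it cannot be covered,
    # and clamping discards fragments that fall off-screen.
    xmin = max(0, math.floor(min(a[0], b[0], c[0])))
    xmax = min(width - 1, math.ceil(max(a[0], b[0], c[0])))
    ymin = max(0, math.floor(min(a[1], b[1], c[1])))
    ymax = min(height - 1, math.ceil(max(a[1], b[1], c[1])))
    fragments = []
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            if edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0:
                fragments.append((x, y))
    return fragments


tri = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
print(len(rasterize(tri, 8, 8)))  # 10
```

Each returned coordinate corresponds to one pixel-sized fragment that would then be handed to the pixel processing stage.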
  • the pixel processing stage 430 uses the parameters created by the geometry processing stage, as well as other data, to generate values such as the resulting color of the pixel.
  • the pixel processing stage 430 at its core performs shading operations on the fragments to determine how the color and brightness of a primitive varies with available lighting.
  • pixel processing stage 430 may determine depth, color, normal and texture coordinates (e.g., texture details) for each fragment, and may further determine appropriate levels of light, darkness, and color for the fragments.
  • pixel processing stage 430 calculates the traits of each fragment, including color and other attributes (e.g., z-depth for distance from the viewing location, and alpha values for transparency).
  • the pixel processing stage 430 applies lighting effects to the fragments based on the available lighting affecting the corresponding fragments. Further, the pixel processing stage 430 may apply shadowing effects for each fragment.
  • the output of the pixel processing stage 430 includes processed fragments (e.g., texture and shading information) and is delivered to the output merger stage 440 in the next stage of the graphics pipeline 400.
  • the output merger stage 440 generates a final color for the pixel, using the output of the pixel processing stage 430, as well as other data, such as a value already in memory.
  • the output merger stage 440 may perform optional blending of values between fragments and/or pixels determined from the pixel processing stage 430, and values already written to an MRT for that pixel.
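One common form of the optional blending performed by the output merger is "source-over" alpha blending, which weights the incoming fragment color by its alpha and the value already in the render target by one minus that alpha. The sketch assumes that blend mode and non-premultiplied colors with channels in [0, 1]; the function name `blend_over` is hypothetical, and other blend modes are equally possible.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def blend_over(src_rgb: RGB, src_alpha: float, dst_rgb: RGB) -> RGB:
    """Source-over blend of a fragment color onto the color already stored for the pixel."""
    r, g, b = (s * src_alpha + d * (1.0 - src_alpha) for s, d in zip(src_rgb, dst_rgb))
    return (r, g, b)


# A half-transparent red fragment over a blue pixel already in the render target.
print(blend_over((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0)))  # (0.5, 0.0, 0.5)
```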
  • Color values for each pixel in the display may be stored in a frame buffer (not shown). These values are scanned to the corresponding pixels when displaying a corresponding image of the scene.
  • the display reads color values from the frame buffer for each pixel, row-by-row, from left-to-right or right-to-left, top-to-bottom or bottom-to-top, or any other pattern, and illuminates pixels using those pixel values when displaying the image.
  • Embodiments of the present disclosure use multiple GPUs in collaboration to generate and/or render a single image frame.
  • the difficulty in using multiple GPUs is in distributing an equal amount of work to each GPU.
  • Embodiments of the present disclosure are capable of providing an equal amount of work to each GPU (i.e. an approximately equal distribution of work), supporting an increase in pixel count (i.e. resolution) and complexity, and/or an increase in geometric complexity, and/or an increase in the amount of processing per vertex and/or primitive, by analyzing the spatial distribution of the geometry to be rendered and dynamically (i.e. from frame to frame) adjusting GPU responsibility for screen regions to optimize for both geometry work and pixel work.
  • dynamic distribution of GPU responsibility is performed by screen regions, as further described below in relation to FIGS.
  • FIGS. 5A-5B show renderings, purely for purposes of illustration, of screens that are subdivided into regions, wherein each region is assigned to a GPU in a fixed fashion. That is to say, the assignment of regions to GPUs does not change from image frame to image frame.
  • the screen is subdivided into four quadrants, each of which is assigned to a different GPU.
  • the screen is subdivided into a larger number of interleaved regions, each of which is assigned to a GPU.
  • FIGS. 5A-5B show more efficient rendering, according to embodiments of the invention.
  • FIG. 5A is a diagram of a screen 510A that is subdivided into quadrants (e.g. four regions) when performing multi-GPU rendering.
  • screen 510A is subdivided into four quadrants (e.g. A, B, C, and D).
  • Each quadrant is assigned to one of the four GPUs [GPU-A, GPU-B, GPU-C, and GPU-D], in a one-to-one relationship. That is, GPU responsibility is distributed by fixed region assignment, wherein each GPU has a fixed assignment to one or more screen regions.
  • GPU-A is assigned to quadrant A
  • GPU-B is assigned to quadrant B
  • GPU-C is assigned to quadrant C
  • GPU-D is assigned to quadrant D.
  • the geometry can be culled.
  • CPU 163 can check a bounding box against each quadrant’s frustum, and request each GPU to render only the objects that overlap its corresponding frustum.
  • each GPU is responsible for rendering only a portion of the geometry.
  • screen 510A shows pieces of geometry, wherein each piece is a corresponding object; in particular, screen 510A shows objects 511 - 517 (e.g. pieces of geometry). It is understood that pieces of geometry may correspond to whole objects or portions of objects (e.g., primitives, etc.).
  • GPU-A will render no objects, as no objects overlap Quadrant A.
  • GPU-B will render objects 515 and 516 (as a portion of object 515 is present in Quadrant B, the CPU’s culling test will correctly conclude that GPU-B must render it).
  • GPU-C will render objects 511 and 512.
  • GPU-D will render objects 512, 513, 514, 515 and 517.
  • the amount of work that each GPU must perform may be very different, as a disproportionate amount of geometry may be in one quadrant in some situations.
  • quadrant A does not have any pieces of geometry
  • quadrant D has five pieces of geometry, or at least portions of at least five pieces of geometry.
  • GPU-A assigned to quadrant A would be idle, while GPU-D assigned to quadrant D would be disproportionately busy when rendering objects in the corresponding image.
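The fixed quadrant assignment and the CPU's culling test can be sketched in screen space. This is a simplified illustration: it tests 2D screen-space bounding rectangles against each quadrant rather than 3D bounding boxes against each quadrant's frustum, and the quadrant layout and object rectangles are hypothetical values chosen only to reproduce an imbalance like the one described, not the coordinates of FIG. 5A.

```python
from typing import Dict, List, NamedTuple


class Rect(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Rect, b: Rect) -> bool:
    """Rectangles overlap when their extents overlap on both axes (touching edges do not count)."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def quadrants(width: float, height: float) -> Dict[str, Rect]:
    """Hypothetical fixed layout: A top-left, B top-right, C bottom-left, D bottom-right."""
    w, h = width / 2, height / 2
    return {
        "GPU-A": Rect(0, 0, w, h),
        "GPU-B": Rect(w, 0, width, h),
        "GPU-C": Rect(0, h, w, height),
        "GPU-D": Rect(w, h, width, height),
    }


def assign_objects(bounds: Dict[str, Rect], regions: Dict[str, Rect]) -> Dict[str, List[str]]:
    """For each GPU, list the objects whose bounding rectangle overlaps its fixed region."""
    return {gpu: [name for name, box in bounds.items() if overlaps(box, region)]
            for gpu, region in regions.items()}


# Hypothetical screen-space bounds on a 100 x 100 screen (y grows downward).
objects = {
    "obj1": Rect(5, 60, 20, 75),   # inside quadrant C
    "obj2": Rect(40, 70, 60, 80),  # straddles C and D
    "obj3": Rect(70, 60, 90, 90),  # inside quadrant D
    "obj4": Rect(60, 40, 80, 70),  # straddles B and D
}
work = assign_objects(objects, quadrants(100, 100))
print({gpu: len(names) for gpu, names in work.items()})
# {'GPU-A': 0, 'GPU-B': 1, 'GPU-C': 2, 'GPU-D': 3}
```

As in the quadrant example above, a straddling object is sent to every GPU whose quadrant it touches, and the fixed layout leaves one GPU idle while another carries most of the geometry.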
  • FIG. 5B illustrates another technique when subdividing a screen into regions, such that screen 510B is subdivided into a plurality of interleaved regions when performing multi-GPU rendering.
  • screen 510B is subdivided into a plurality of regions when performing multi-GPU rendering of a single image or each of one or more images in a sequence of images.
  • the screen 510B may be subdivided into regions corresponding to the GPUs.
  • screen 510B is subdivided into a larger number of regions (e.g. greater than the four quadrants), while using the same number of GPUs for rendering (e.g. four).
  • the objects (511-517) shown in screen 510A are also shown in screen 510B in the same corresponding locations.
  • Each of the GPUs is responsible for rendering geometry overlapping a corresponding region. That is, each GPU is assigned to a corresponding set of regions.
  • GPU-A is responsible for each of the regions labeled A in a corresponding set
  • GPU-B is responsible for each of regions labeled B in a corresponding set
  • GPU-C is responsible for each of regions labeled C in a corresponding set
  • GPU-D is responsible for each of regions labeled D in a corresponding set.
  • the regions are interleaved in a particular pattern. Because of the interleaving (and higher number) of regions, the amount of work that each GPU must perform may be much more balanced.
  • the pattern of interleaving of screen 510B includes alternating rows, with rows of regions A-B-A-B and so on, and rows of regions C-D-C-D and so on.
  • Other patterns of interleaving the regions are supported in embodiments of the present disclosure.
  • patterns may include repeated sequences of regions, evenly distributed regions, uneven distribution of regions, repeated rows of sequences of regions, random sequences of regions, random rows of sequences of regions, etc.
  • with fixed interleaved regions, however, each GPU must still process most or all of the geometry. For example, it may be difficult to check object bounding boxes against all of the regions that a GPU is responsible for. Also, even if bounding boxes can be checked in a timely manner, due to small region size, the result will be that each GPU likely has to process most of the geometry, because every object in an image is likely to overlap at least one region of each of the GPUs (e.g. a GPU processes an entire object even though only a portion of the object overlaps at least one region in a set of regions assigned to that GPU).
  • as a result, choosing the number of regions is important.
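The interleaved pattern of screen 510B, and why small regions can force every GPU to process an object, can be sketched as follows. This assumes square tiles of a hypothetical size, the A-B / C-D row pattern described above, and an axis-aligned screen-space bounding box per object; `interleaved_owner` and `gpus_touching` are illustrative names.

```python
import math
from typing import Set, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), half-open on the right and bottom


def interleaved_owner(col: int, row: int) -> str:
    """Owner of tile (col, row): even rows alternate A-B-A-B, odd rows alternate C-D-C-D."""
    if row % 2 == 0:
        return "GPU-A" if col % 2 == 0 else "GPU-B"
    return "GPU-C" if col % 2 == 0 else "GPU-D"


def gpus_touching(box: Box, tile_size: float) -> Set[str]:
    """GPUs owning at least one tile that the bounding box overlaps."""
    x0, y0, x1, y1 = box
    cols = range(math.floor(x0 / tile_size), math.ceil(x1 / tile_size))
    rows = range(math.floor(y0 / tile_size), math.ceil(y1 / tile_size))
    return {interleaved_owner(c, r) for r in rows for c in cols}


box = (30.0, 30.0, 70.0, 60.0)  # hypothetical object bounds in pixels
print(sorted(gpus_touching(box, 100)))  # ['GPU-A']
print(sorted(gpus_touching(box, 16)))   # ['GPU-A', 'GPU-B', 'GPU-C', 'GPU-D']
```

With coarse tiles the object falls in one GPU's region; with 16-pixel tiles its bounding box covers tiles of all four GPUs, so each GPU would process the whole object. Small regions improve pixel balance but multiply geometry work, which is the trade-off behind choosing the number of regions.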
  • Geometry analysis may be performed while rendering or prior to rendering, and the resulting information can then be used to dynamically assign screen regions to GPUs for further rendering of a corresponding image frame, as will be further described below. That is, screen regions are not fixed to corresponding GPUs, but may be dynamically assigned to GPUs for rendering a corresponding image frame.
  • FIGS. 6A-6B show the advantage of splitting an object within an image frame into smaller portions for purposes of performing geometry analysis in order to dynamically assign screen regions to GPUs for geometry rendering of whole objects and/or portions of objects of the image frame in various embodiments of the present disclosure.
  • multi-GPU rendering of objects is performed for a single image frame by performing geometry analysis on objects in the screen.
  • Information is generated for “pieces of geometry,” wherein the pieces of geometry can be an entire object or portions of objects.
  • a piece of geometry can be an object 610, or portions of object 610.
  • GPUs are assigned to pieces of geometry
  • the GPUs in collaboration determine information that provides relationships between each of the pieces of geometry and each of the screen regions. Analysis is performed on the information to dynamically assign screen regions to the GPUs for subsequent rendering of a corresponding image frame.
  • the other GPUs can skip that object entirely when rendering the image frame, which results in efficient processing of geometry, in accordance with one embodiment of the present disclosure.
  • splitting an object into smaller portions can allow for still higher efficiencies when performing geometry analysis and/or rendering of the geometry in a corresponding image frame.
  • FIG. 6A illustrates geometry analysis of whole objects (i.e. the amount of geometry used by or generated by a corresponding draw call) to determine relationships of objects to screen regions when multiple GPUs collaborate to render a corresponding image frame, in accordance with one embodiment of the present disclosure.
  • object 610 may be determined to overlap region 620A and object 610 may also be determined to overlap region 620B. That is, portion 610A of object 610 overlaps region 620A, and portion 610B of object 610 overlaps region 620B.
  • GPU-A is assigned responsibility for rendering objects in screen region 620A.
  • GPU-B is assigned responsibility for rendering objects in screen region 620B. Because objects are rendered wholly, GPU-A is tasked to fully render object 610, i.e. process all primitives within the object, including primitives across both regions 620A and 620B. In this particular example, GPU-B is also tasked to render object 610 in whole. That is, there may be duplication of work by GPU-A and GPU-B when performing rendering of the geometry of the object in the corresponding image frame. Also, the geometry analysis itself may be difficult to balance if there is a small number of objects (i.e. draw calls) to distribute between the GPUs.
  • FIG. 6B illustrates geometry analysis of portions of an object to determine relationships of portions of objects to screen regions when multiple GPUs collaborate to render a corresponding image frame, in accordance with one embodiment of the present disclosure.
  • the geometry used by or generated by a draw call is subdivided to create these portions of objects.
  • object 610 may be split into pieces, such that the geometry used by or generated by a draw call is subdivided into smaller pieces of geometry.
  • information is generated for those smaller pieces of geometry during geometry analysis to determine relationships (e.g. overlap) between the smaller pieces of geometry and each of the screen regions.
  • Geometry analysis is performed using the information to dynamically assign rendering responsibilities by screen regions between the GPUs for rendering the smaller pieces of geometry of a corresponding image frame.
  • Each GPU only renders the smaller pieces of geometry that overlap screen regions for which it is responsible, when performing rendering for a corresponding image frame.
  • each GPU is assigned to a set of screen regions for rendering pieces of geometry of a corresponding image frame. That is, there is a unique assignment of GPU responsibilities for each image frame. In that manner, there is higher efficiency when rendering a corresponding image frame because there may be less duplication of effort between GPUs when performing geometry analysis and/or rendering of the geometry of the object in the corresponding image frame.
  • the draw calls in the command buffer remain the same, while during rendering the GPU splits the geometry into pieces.
  • the pieces of geometry may be roughly the size for which the position cache and/or parameter cache are allocated.
  • Each GPU either renders or skips these pieces, such that a GPU only renders pieces that overlap screen regions to which it is assigned.
  • object 610 is split into portions, such that the pieces of geometry used for region testing correspond to these smaller portions of object 610.
  • object 610 is split into pieces of geometry “a”, “b”, “c”, “d”, “e”, and “f”.
  • GPU-A may be dynamically assigned to screen region 620A in order to render pieces of geometry “a”, “b”, “c”, “d”, and “e” when rendering a corresponding image frame. That is, GPU-A can skip rendering piece of geometry “f”.
  • GPU-B may be assigned to screen region 620B in order to render pieces of geometry “d”, “e”, and “f” when rendering the corresponding image frame.
  • GPU-B can skip rendering pieces of geometry “a”, “b”, and “c”. As shown, there is less duplication of effort between GPU-A and GPU-B: instead of each GPU rendering object 610 wholly, only pieces of geometry “d” and “e” are rendered by both GPU-A and GPU-B.
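The overlap test that decides which pieces a GPU renders or skips can be sketched in Python. This is a minimal illustration, not the disclosed implementation: it assumes each piece of geometry is summarized by a screen-space bounding rectangle (a conservative approximation), and the coordinates used for regions 620A/620B and pieces “a” through “f” are hypothetical values chosen to reproduce the FIG. 6B outcome. The names `Rect` and `pieces_to_render` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Screen-space rectangle, half-open: [x0, x1) x [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def overlaps(self, other: "Rect") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def pieces_to_render(pieces: dict[str, Rect], gpu_regions: list[Rect]) -> list[str]:
    """Names of pieces whose bounds overlap at least one region assigned to the GPU.

    Every other piece can be skipped by that GPU.
    """
    return [name for name, bounds in pieces.items()
            if any(bounds.overlaps(region) for region in gpu_regions)]


# Hypothetical layout: region 620A is the left half of the screen, 620B the right half.
REGION_620A = Rect(0, 0, 100, 100)
REGION_620B = Rect(100, 0, 200, 100)

# Hypothetical bounds for pieces "a".."f" of object 610; "d" and "e" straddle x = 100.
PIECES = {
    "a": Rect(10, 10, 40, 40),
    "b": Rect(40, 10, 70, 40),
    "c": Rect(10, 40, 70, 70),
    "d": Rect(70, 10, 130, 40),
    "e": Rect(70, 40, 130, 70),
    "f": Rect(130, 10, 180, 70),
}
# The same object tested as a single piece (one draw call).
WHOLE_OBJECT = {"610": Rect(10, 10, 180, 70)}

if __name__ == "__main__":
    for label, geometry in (("whole object", WHOLE_OBJECT), ("pieces", PIECES)):
        on_a = pieces_to_render(geometry, [REGION_620A])
        on_b = pieces_to_render(geometry, [REGION_620B])
        print(f"{label}: GPU-A {on_a}, GPU-B {on_b}, both {sorted(set(on_a) & set(on_b))}")
```

Run as a script, the whole-object case sends object 610 to both GPUs, while the piece-based case duplicates only “d” and “e”.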
  • flow diagram 700 of FIG. 7 illustrates a method for graphics processing when implementing multi-GPU rendering of geometry for an image frame generated by an application by performing geometry analysis while rendering, in accordance with one embodiment of the present disclosure.
  • a number of GPUs collaborate to generate an image frame.
  • Responsibility for certain phases of rendering is divided between a plurality of the GPUs based on screen region for each image frame.
  • While rendering geometry, GPUs generate information regarding the geometry and its relation to screen regions. This information is used to assign GPUs to screen regions, allowing for more efficient rendering. In that manner, multiple GPU resources are used to efficiently perform rendering of objects of an image frame when executing an application.
  • various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs, etc.
  • the method includes rendering graphics using a plurality of GPUs, wherein in certain phases responsibility for rendering is dynamically divided between the plurality of the GPUs based on screen regions.
  • multi-GPU processing is performed when rendering a single image frame, and/or each of one or more image frames of a sequence of image frames for a real-time application, where each image frame includes a plurality of pieces of geometry.
  • GPU rendering responsibility is dynamically assigned between a plurality of screen regions for each image frame, such that each GPU renders pieces of geometry in its assigned screen regions. That is, each GPU has a corresponding division of the responsibility (e.g. corresponding screen regions).
  • the method includes using the plurality of GPUs in collaboration to render an image frame including a corresponding plurality of pieces of geometry.
  • a pre-pass phase of rendering is performed when rendering.
  • this pre-pass phase of rendering is a Z pre-pass, wherein the plurality of pieces of geometry are rendered.
  • the method includes dividing responsibility for processing the plurality of pieces of geometry of the image frame during the Z pre-pass phase of rendering between the plurality of GPUs. That is, each of the plurality of pieces of geometry is assigned to a corresponding GPU for performing the Z pre-pass, and/or each of the GPUs is assigned a set of screen regions for which it is responsible. As such, the plurality of pieces of geometry are rendered in the Z pre-pass phase at the plurality of GPUs to generate the one or more Z-buffers. In particular, each GPU renders corresponding pieces of geometry in the Z pre-pass phase to generate a corresponding Z-buffer.
  • the Z-buffer may include a corresponding z-value (e.g. depth value) measuring the distance from a pixel on a plane of projection to the piece of geometry.
  • Hidden geometry or objects may be removed from the Z-buffer, as is well known in the art.
  • Each GPU may have a dedicated Z-buffer, in one embodiment.
  • a first GPU renders a first piece of geometry in the Z pre-pass phase to generate a first Z-buffer.
  • Other GPUs render corresponding pieces of geometry in the Z pre-pass phase to generate corresponding Z-buffers.
  • each GPU sends the data in its corresponding Z-buffer to each of the plurality of GPUs, so that the corresponding Z-buffers are updated and are approximately similar for use when rendering geometry of the image frame. That is, each GPU is configured to merge the data received from all the Z-buffers, such that each corresponding Z-buffer for the GPUs is similarly updated.
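A minimal sketch of the merge step, assuming each Z-buffer is a flattened list of depths of identical size, that a "less" depth test is in use (smaller values are closer), and that buffers are cleared to 1.0. Under those assumptions merging reduces to a per-pixel minimum; the name `merge_z_buffers` is hypothetical.

```python
FAR_DEPTH = 1.0  # assumed clear value for the far plane


def merge_z_buffers(buffers: list[list[float]]) -> list[float]:
    """Merge per-GPU Z-buffers by keeping the nearest (smallest) depth per pixel.

    Assumes a 'less' depth test and flattened buffers of identical size.
    """
    if not buffers:
        raise ValueError("need at least one Z-buffer")
    size = len(buffers[0])
    if any(len(buffer) != size for buffer in buffers):
        raise ValueError("Z-buffers must have identical dimensions")
    return [min(depths) for depths in zip(*buffers)]


# Four-pixel example: each GPU has written only its own object's depths.
z_gpu_a = [0.3, FAR_DEPTH, FAR_DEPTH, FAR_DEPTH]
z_gpu_b = [FAR_DEPTH, 0.5, 0.2, FAR_DEPTH]
z_gpu_c = [0.6, FAR_DEPTH, 0.1, FAR_DEPTH]
merged = merge_z_buffers([z_gpu_a, z_gpu_b, z_gpu_c])
print(merged)  # [0.3, 0.5, 0.1, 1.0]
```

Because every GPU applies the same merge to the same inputs, each ends up with an identical copy, which is what allows the later geometry pass to use any GPU's Z-buffer.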
  • the method includes generating information regarding the plurality of pieces of geometry of the image frame and their relations to a plurality of screen regions.
  • the information is generated during the pre-pass phase of rendering. For example, information is generated at a first GPU while rendering a piece of geometry, wherein the information may indicate which screen regions that piece of geometry overlaps.
  • the piece of geometry may be a whole object (i.e. the geometry used by or generated by an individual draw call) or portions of an object (e.g. individual primitives, groups of primitives, etc.).
  • the information may include presence of a piece of geometry in corresponding screen regions. The information may include a conservative approximation as to the presence of the piece of geometry in corresponding screen regions.
  • the information may include the pixel area or approximate pixel area (e.g. coverage) that the piece of geometry covers in a screen region.
  • the information may include the number of pixels written to a screen region.
  • the information may include the number of pixels written to the Z buffer per piece of geometry per screen region during the Z pre-pass phase of rendering.
  • the method includes using this information in subsequent assignment of screen regions to the plurality of GPUs.
  • each GPU is assigned to corresponding screen regions based on the information for purposes of rendering the image frame during a subsequent phase of rendering, which may be a geometry pass.
  • assignment of screen regions to GPUs may vary from image frame to image frame, which is to say that it may be dynamic.
  • FIG. 8 is a diagram of a screen 800 illustrating the dynamic assignment of screen regions to GPUs for geometry rendering (i.e. the rendering of pieces of geometry to multiple render targets, or MRTs) based on an analysis of geometry of a current image frame performed while rendering the current image frame, in accordance with one embodiment of the present disclosure.
  • screen 800 may be subdivided into regions, each approximately equal in size for purposes of illustration. In other embodiments, each of the regions may vary in size and shape.
  • region 810 is representative of an equal subdivision of screen 800.
  • FIG. 5A shows the partitioning of screen 510A into quadrants which are fixedly assigned to GPUs for geometry rendering.
  • FIG. 5B shows the partitioning of screen 510B into regions which are assigned in a fixed fashion to GPUs for geometry rendering.
  • FIG. 8 shows the dynamic assignment of screen regions to GPUs for a current image frame including objects 511-517. The assignment is performed per image frame.
  • objects 511-517 may be in different positions, and as such, the assignment of screen regions for the next image frame may be different than the assignment for the current image frame.
  • GPU-A is assigned to the set of screen regions 832, and renders objects 511 and 512.
  • GPU-B is assigned to the set of screen regions 834, and renders objects 513, 515, and 517.
  • GPU-C is assigned to the set of screen regions 836, and renders objects 512, 513, 514, and 517.
  • GPU-D is assigned to the set of screen regions 838, and renders objects 515 and 516.
  • GPU-D is responsible for rendering four regions, whereas GPU-A is responsible for rendering six regions, though their corresponding pixel and/or rendering work may be approximately equal.
  • objects may have different rendering costs, such that the cost per pixel or primitive or vertex may be higher or lower for different objects.
  • This cost per pixel or primitive or vertex, etc. may be made available to each GPU and used for the generation of the information, or may be included as information. Alternatively, the cost may be used when assigning screen regions.
  • the cross-hatched region 830 contains no geometry and might be assigned to any one of the GPUs. In another embodiment, the cross-hatched region 830 is not assigned to any of the GPUs. In either case, no geometry rendering is performed for region 830. In another embodiment, all regions associated with an object are assigned to a single GPU. In that manner, all the other GPUs can then skip the object entirely when performing geometry rendering.
  • FIGS. 9A-9C are diagrams providing a more detailed description for the rendering of an image frame showing four objects, wherein the rendering of the image frame includes a Z pre-pass phase and a geometry phase of rendering. As previously described, the Z pre-pass phase is performed for generating information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with embodiments of the present disclosure.
  • FIGS. 9A-9C illustrate the use of multiple GPUs to render each of a sequence of image frames. The selection of four GPUs for the example shown in FIGS. 9A-9C is made purely for illustrating multi-GPU rendering, and it is understood that any number of GPUs may be used for multi-GPU rendering, in various embodiments.
  • FIG. 9A illustrates a screen 900A showing four objects included within an image frame.
  • the image frame includes object 0, object 1, object 2, and object 3.
  • screen 900A is split into regions.
  • screen 900A may be split into more than four regions, each of which is assigned to a corresponding GPU for rendering a current image frame.
  • a single command buffer is used by the multiple GPUs to render the corresponding image frame.
  • the common rendering command buffer may include draw calls and state settings for each object to perform the Z pre-pass phase of rendering.
  • the command buffer may include draw calls and state settings for each object to perform the geometry pass phase of rendering.
  • the common rendering command buffer supports the ability for a command to be executed by one GPU but not by another. That is, the format of the common rendering command buffer allows a command to be executed by one or a subset of the plurality of GPUs. For instance, flags on a draw command or predication in the rendering command buffer allow a single GPU to execute one or more commands in the corresponding command buffer without interference from other GPUs, as previously described.
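One way to picture a shared command buffer with per-GPU execution is a bit mask on each command. The sketch below is hypothetical: the `Command` type and its `gpu_mask` field stand in for whatever flag or predication mechanism the hardware actually provides, and the buffer mirrors the FIG. 9B arrangement in which each GPU draws one object in the Z pre-pass.

```python
from dataclasses import dataclass

ALL_GPUS = 0b1111  # four GPUs: bit 0 = GPU-A ... bit 3 = GPU-D


@dataclass(frozen=True)
class Command:
    name: str
    gpu_mask: int = ALL_GPUS  # hypothetical: bit i set means GPU i executes the command


def commands_for_gpu(buffer: list[Command], gpu_index: int) -> list[str]:
    """Names of the commands that one GPU executes from the shared command buffer."""
    bit = 1 << gpu_index
    return [command.name for command in buffer if command.gpu_mask & bit]


GPU_A, GPU_B, GPU_C, GPU_D = range(4)

shared_buffer = [
    Command("set_z_prepass_state"),  # executed by every GPU
    Command("draw_object_0_z_prepass", gpu_mask=1 << GPU_A),
    Command("draw_object_1_z_prepass", gpu_mask=1 << GPU_B),
    Command("draw_object_2_z_prepass", gpu_mask=1 << GPU_C),
    Command("draw_object_3_z_prepass", gpu_mask=1 << GPU_D),
]

print(commands_for_gpu(shared_buffer, GPU_B))
# ['set_z_prepass_state', 'draw_object_1_z_prepass']
```

Every GPU reads the same buffer; a command with all mask bits set (the state setting here) runs everywhere, while each draw runs on exactly one GPU.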
  • FIG. 9B illustrates a Z pre-pass phase of rendering performed to generate one or more Z-buffers and information relating pieces of geometry of a particular image frame and each of the screen regions and/or sub regions of a drawn screen, in accordance with one embodiment of the present disclosure.
  • one strategy is shown by which multiple GPUs can collaborate to generate one or more Z-buffers for a frame of rendering.
  • Other strategies may be implemented to generate the one or more Z-buffers.
  • each GPU in the multi-GPU architecture is allocated a portion of the geometry.
  • GPU-A is assigned to object 0.
  • GPU-B is assigned to object 1.
  • GPU-C is assigned to object 2.
  • GPU-D is assigned to object 3.
  • Each GPU renders corresponding objects in the Z pre-pass phase, and renders the corresponding objects to its own copy of the Z buffer.
  • GPU-A renders object 0 to its Z-buffer.
  • Screen 921 shows pixel coverage of object 0 as determined by GPU-A and stored in its corresponding Z-buffer.
  • GPU-B renders object 1 to its Z-buffer, such that screen 922 shows pixel coverage of object 1 as determined by GPU-B and stored in its corresponding Z-buffer.
  • GPU-C renders object 2 to its Z-buffer, such that screen 923 shows pixel coverage of object 2 as determined by GPU-C and stored in its corresponding Z-buffer.
  • GPU-D renders object 3 to its Z-buffer, such that screen 924 shows pixel coverage of object 3 as determined by GPU-D and stored in its corresponding Z-buffer.
  • each GPU has a corresponding copy of the Z-buffer in its own RAM (random access memory).
  • the strategy of building one or more Z-buffers includes having each GPU send its completed Z-buffer to the other GPUs. In that manner, each of the Z-buffers should be similar in size and format.
  • data in each of the Z-buffers are sent to all of the GPUs for purposes of merging and updating each of the Z-buffers, which is shown by screen 925 showing pixel coverage of each of the four objects 0-3 stored in each of the updated Z-buffers of the GPUs. The objects are blank in FIG.
  • the merge time is reduced. Instead of waiting for each Z-buffer to be fully completed by a corresponding GPU before the data is sent to other GPUs, as each GPU writes corresponding pieces of geometry to its Z-buffer, the corresponding GPU sends the Z-buffer data for updated screen regions to the other GPUs. That is, as a first GPU renders geometry to a corresponding Z-buffer or other render targets, the first GPU sends data from the Z-buffer or other render targets, including updated screen regions, to the other GPUs. Because a Z-buffer is not held back until it has been completely written, part of the time required to merge the Z-buffers is removed, thereby reducing the merge time.
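The incremental merge amounts to tracking which screen regions have been written since the last transfer. The following sketch rests on stated assumptions: square regions of `region_size` pixels in a row-major grid, writes recorded at pixel granularity, and only the region IDs to be sent are modeled (the transfer itself is not shown). The class `IncrementalZSender` and the function `region_of` are hypothetical.

```python
def region_of(x: int, y: int, region_size: int, regions_per_row: int) -> int:
    """Row-major index of the square screen region containing pixel (x, y)."""
    return (y // region_size) * regions_per_row + x // region_size


class IncrementalZSender:
    """Hypothetical helper that tracks regions written since the last send."""

    def __init__(self, region_size: int, regions_per_row: int) -> None:
        self.region_size = region_size
        self.regions_per_row = regions_per_row
        self.dirty: set[int] = set()
        self.sent_batches: list[list[int]] = []

    def write_pixel(self, x: int, y: int) -> None:
        """Record a Z-buffer write; marks the containing region as updated."""
        self.dirty.add(region_of(x, y, self.region_size, self.regions_per_row))

    def flush(self) -> list[int]:
        """Call after each piece of geometry: returns the regions to send to peers."""
        batch = sorted(self.dirty)
        if batch:
            self.sent_batches.append(batch)
        self.dirty.clear()
        return batch


sender = IncrementalZSender(region_size=64, regions_per_row=4)
sender.write_pixel(10, 10)   # region 0
sender.write_pixel(70, 10)   # region 1
print(sender.flush())        # [0, 1]  (sent while the GPU keeps rendering)
sender.write_pixel(10, 70)   # region 4
print(sender.flush())        # [4]
```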
  • another strategy for building a Z-buffer includes sharing a common Z-buffer or common render targets between the multiple GPUs.
  • hardware used for performing the Z-buffering may be configured so that there is a common Z-buffer or common render target that is shared, and updated by each of the GPUs. That is, each of the GPUs updates the common Z-buffer while rendering one or more corresponding pieces of geometry in the Z pre-pass phase of rendering.
  • a first GPU renders geometry to a corresponding Z-buffer or other render targets by updating the common Z-buffer or common render targets, each being shared by the plurality of GPUs.
  • the use of a common Z-buffer or common render targets removes the need for a merge step.
  • screen regions are allocated to the GPUs, reducing the need for arbitration when accessing the common Z-buffer.
  • the scan converter, operating as part of the rasterization stage 420 of FIG. 4, generates the information.
  • the scan converter may calculate the area of overlap for a piece of geometry and each of the screen regions.
  • the overlap may be measured in pixels, such as between each primitive in a piece of geometry and each of the screen regions.
  • the scan converter may sum the areas of overlap to create the total area of overlap (e.g., by pixel) per piece of geometry, as measured for each region.
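Summing per-region overlap as described above can be sketched as follows, assuming the scan converter's output for each primitive is available as a list of covered pixel coordinates and that regions are square tiles in a row-major grid. The function name `overlap_per_region` is hypothetical. A pixel covered by two primitives is counted twice, which matches summing per-primitive areas rather than counting unique pixels.

```python
from collections import Counter


def overlap_per_region(primitive_pixels: list[list[tuple[int, int]]],
                       region_size: int, regions_per_row: int) -> dict[int, int]:
    """Total pixels of overlap between one piece of geometry and each screen region.

    primitive_pixels holds, for each primitive in the piece, the pixels that
    scan conversion reported as covered.
    """
    totals = Counter()
    for pixels in primitive_pixels:          # one list per primitive
        for x, y in pixels:
            region = (y // region_size) * regions_per_row + x // region_size
            totals[region] += 1              # per-primitive areas are summed
    return dict(totals)


# Hypothetical 8x8 screen split into four 4x4 regions (two per row).
primitive_1 = [(x, 0) for x in range(2, 6)]  # spans regions 0 and 1
primitive_2 = [(6, 5), (7, 5)]               # inside region 3
print(overlap_per_region([primitive_1, primitive_2], region_size=4, regions_per_row=2))
# {0: 2, 1: 2, 3: 2}
```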
  • Prior to commencement of the geometry pass, the information may be used to assign screen regions to GPUs. That is, one or more of the plurality of GPUs may be assigned to screen regions. In one embodiment, the assignments are made such that rendering responsibilities (e.g. rendering geometry) for each GPU are approximately equal. In that manner, information generated in one phase of rendering (the Z pre-pass phase) is used in another phase of rendering, such as to assign screen regions to GPUs for the geometry pass phase of rendering.
  • objects may have rendering costs that differ from other objects. That is, the cost per pixel, or primitive, or vertex for one object may be higher or lower than other objects.
  • the cost per pixel/primitive/vertex is made available to the GPU and used in the generation of the information, and/or included within the information.
  • the cost per pixel/primitive/vertex is used when assigning screen regions to GPUs, such that the information generated takes into account the approximate rendering cost for a corresponding piece of geometry per pixel, or primitive, or vertex. That is, a plurality of costs is determined for rendering a plurality of pieces of geometry of an image frame during the geometry phase of rendering. The costs are considered when assigning the screen regions to the GPUs.
  • the subsequent assignment of screen regions to the plurality of GPUs takes into account the approximate rendering cost for the piece of geometry per pixel, primitive or vertex, such that the GPUs may be assigned to screen regions in a manner that the cost of rendering is divided as desired (equally or non-equally) between the GPUs.
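The disclosure does not prescribe an assignment algorithm. As one plausible choice, the sketch below swaps in the standard greedy longest-processing-time heuristic: it weights each piece's per-region pixel count (from the Z pre-pass information) by an assumed cost per pixel, then gives the most expensive remaining region to the least-loaded GPU. All names and the input layout are hypothetical.

```python
import heapq


def region_costs(info: dict[tuple[str, int], int],
                 cost_per_pixel: dict[str, float]) -> dict[int, float]:
    """Estimated geometry-pass cost of each region.

    info maps (piece, region) -> pixels written in the Z pre-pass.
    """
    costs: dict[int, float] = {}
    for (piece, region), pixels in info.items():
        costs[region] = costs.get(region, 0.0) + pixels * cost_per_pixel[piece]
    return costs


def assign_regions(costs: dict[int, float], num_gpus: int) -> dict[int, list[int]]:
    """Greedy LPT: hand the costliest remaining region to the least-loaded GPU."""
    loads = [(0.0, gpu) for gpu in range(num_gpus)]  # (current load, GPU index)
    heapq.heapify(loads)
    assignment: dict[int, list[int]] = {gpu: [] for gpu in range(num_gpus)}
    for region, cost in sorted(costs.items(), key=lambda item: (-item[1], item[0])):
        load, gpu = heapq.heappop(loads)
        assignment[gpu].append(region)
        heapq.heappush(loads, (load + cost, gpu))
    return assignment


info = {("obj0", 0): 100, ("obj0", 1): 20, ("obj1", 1): 50,
        ("obj1", 2): 80, ("obj2", 3): 30}
cost_per_pixel = {"obj0": 1.0, "obj1": 2.0, "obj2": 1.0}  # obj1 is twice as costly
costs = region_costs(info, cost_per_pixel)
print(costs)                     # {0: 100.0, 1: 120.0, 2: 160.0, 3: 30.0}
print(assign_regions(costs, 2))  # {0: [2, 3], 1: [1, 0]}
```

With two GPUs, regions 2 and 3 (total cost 190) go to one GPU and regions 1 and 0 (total cost 220) to the other. A deliberately non-equal split could be obtained by scaling each GPU's load before comparison.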
  • FIG. 9C illustrates a geometry pass phase of rendering performed to render pieces of geometry of a particular image frame, in accordance with one embodiment of the present disclosure.
  • each GPU renders the objects for a particular image frame to the screen regions for which it is responsible (e.g., based on the previous assignments of GPUs to screen regions).
  • each GPU will render all objects, except those for which it is known (based on the information) that there is no overlap between those objects and the screen regions assigned to the GPU for geometry rendering. As such, if there is no overlap for a piece of geometry to screen regions assigned to a particular GPU, that GPU can skip the render for that piece of geometry.
  • each GPU in the multi-GPU architecture is allocated or assigned to a portion of the screen.
  • GPU-A is assigned to the one region labeled 931A and renders object 0 (as introduced in FIG. 9A), now darkened to represent other values (e.g. color data) being written.
  • Screen 931 shows render target data (e.g. pixels) of object 0 after geometry rendering.
  • GPU-B is assigned to the two regions labeled 932A, and renders object 1, and a portion of object 2 (respective portions of those objects darkened).
  • Screen 932 shows render target data (e.g. pixels) of respective portions of objects 1 and 2 after geometry rendering.
  • GPU-C is assigned to the two regions labeled 933A, and renders portions of object 2 (respective portions darkened).
  • Screen 933 shows render target data (e.g. pixels) of respective portions of object 2 after geometry rendering.
  • GPU-D is assigned to the three regions labeled 934A and renders object 3 (now darkened to represent other values, e.g. color data, being written).
  • Screen 934 shows render target data (e.g. pixels) of object 3 after geometry rendering.
  • the render target data generated by each of the GPUs may need to be merged.
  • merging of the geometry data generated during the geometry pass phase of rendering for each GPU is performed, which is shown by screen 935 including render target data (e.g. pixels) of all four objects 0-3.
  • assignment of screen regions to GPUs changes from frame-to-frame. That is, each GPU may be responsible for different screen regions when comparing assignments for two successive image frames. In another embodiment, assignment of screen regions to GPUs may also vary throughout the various phases used in rendering a single frame.
  • assignments of screen regions may dynamically change during a rendering phase, such as geometry analysis phase (e.g., Z pre-pass) or geometry pass phase.
  • the assignment may therefore differ from an existing assignment. That is, GPU-A may now be responsible for a screen region that formerly GPU-B was responsible for. This may necessitate a transfer of Z-buffer or other render target data from the memory of GPU-B to the memory of GPU-A.
  • the information may include the first object in the command buffer that will write to a screen region. That information can be used to schedule a DMA (direct memory access) transfer, such as to transfer Z-buffer data or other render target data for a screen region from one GPU to another GPU.
  • data from memory of GPU-B may be transferred to the memory of GPU-A.
  • the information may include the last object in the command buffer that will write to a screen region. That information may be used to schedule a DMA transfer from the rendering GPU (the GPU rendering that screen region during the Z pre-pass phase of rendering) to other GPUs. That is, the information is used to schedule transfer of the Z-buffer or other render target data for a screen region from one GPU (e.g. the rendering GPU) to another GPU.
  • the updated data may be broadcast to the GPUs. In that case, the updated data is available if any of the GPUs need that data. In another embodiment, the data is sent to a specific GPU, such as in anticipation of the receiving GPU being responsible for the screen region in a subsequent phase of rendering.
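Scheduling such transfers requires knowing, for each screen region, the first and last commands that write to it. A minimal sketch, assuming the command buffer is available as an ordered list of (object name, regions written) pairs; the function name and data layout are hypothetical. In this model a transfer into a region's new owner would need to complete before the first writer executes, and a transfer out of the rendering GPU could start once the last writer has finished.

```python
def first_and_last_writers(
        command_buffer: list[tuple[str, set[int]]]) -> dict[int, tuple[int, int]]:
    """Map each screen region to (first, last) index of the commands that write it."""
    writers: dict[int, tuple[int, int]] = {}
    for index, (_name, regions) in enumerate(command_buffer):
        for region in regions:
            first, _ = writers.get(region, (index, index))
            writers[region] = (first, index)
    return writers


# Hypothetical Z pre-pass information: the regions written by each draw call, in order.
command_buffer = [
    ("object 0", {0}),
    ("object 1", {1, 2}),
    ("object 2", {2, 3}),
    ("object 3", {3}),
]
writers = first_and_last_writers(command_buffer)
for region in sorted(writers):
    first, last = writers[region]
    print(f"region {region}: incoming DMA before command {first}, "
          f"outgoing DMA after command {last}")
```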
  • FIG. 10 illustrates the rendering of an image frame using the dynamic assignment of screen regions based on whole objects or portions of objects to GPUs for geometry rendering, wherein the assignment is based on an analysis of geometry of a current image frame performed during a Z pre-pass phase of rendering performed while rendering the image frame, in accordance with one embodiment of the present disclosure.
  • rendering timing diagram 1000A shows the rendering of the image frame based on whole objects (i.e. the geometry used or generated by an individual draw call).
  • rendering timing diagram 1000B shows the rendering of the image frame based on portions of the objects.
  • the advantages of rendering the image frame based on portions of objects include better balancing of rendering work between the GPUs, and therefore a shorter time for rendering the image frame.
  • rendering timing diagram 1000A illustrates the rendering of each of four objects 0-3 by the four GPUs (e.g. GPU-A, GPU-B, GPU-C, and GPU-D), wherein rendering responsibilities are distributed between the GPUs at an object granularity.
  • the objects 0-3 were previously introduced in FIGS. 9A-9C.
  • Various phases of rendering are shown in relation to a timeline 1090.
  • Vertical line 1001A indicates the start of rendering of the Z pre-pass.
  • Rendering timing diagram 1000A includes a Z pre-pass phase of rendering 1010A, and also illustrates phase 1020A showing the merging of Z-buffer data between the GPUs.
  • GPU idle time is shown using hashed out areas, wherein the merging phase 1020A may occur during this idle time.
  • a sync point 1030A is provided so that each of the GPUs begin respective geometry pass rendering phases at the same time.
  • rendering timing diagram 1000A includes a geometry pass phase 1040A of rendering for rendering geometry of the image frame, as previously described.
  • a sync point 1050A is provided so that each of the GPUs begin rendering the next image frame at the same time.
  • Sync point 1050A may also indicate the end of the rendering for the corresponding image frame.
  • the total time for rendering the image frame when rendering whole objects is shown by time period 1070. Processing the information to determine screen region responsibilities for each GPU is not shown in the diagram, but may be presumed to conclude prior to the commencement of the geometry pass phase 1040A (i.e. by sync point 1030A).
  • the hashed areas of rendering timing diagram 1000A during the geometry pass phase 1040A show GPU idle time.
  • GPU-A is idle for almost as long as it spends rendering.
  • GPU-B spends very little time being idle.
  • GPU-C spends no time being idle.
  • rendering timing diagram 1000B illustrates the rendering of each of four objects 0-3 by the four GPUs (e.g. GPU-A, GPU-B, GPU-C, and GPU-D), wherein rendering responsibilities are distributed between the GPUs at a granularity of portions of objects rather than whole objects, such as the pieces of geometry shown in FIG. 6B.
  • Rendering timing diagram 1000B includes a Z pre-pass phase of rendering 1010B, and also illustrates the hashed out time period 1020B during which merging of Z-buffer data between the GPUs is performed.
  • the GPU idle time 1020B in rendering timing diagram 1000B is less than the idle time 1020A in rendering timing diagram 1000A.
  • each of the GPUs spends approximately the same amount of time processing the Z pre-pass phase, with little or no idle time.
  • a sync point 1030B is provided so that each of the GPUs begin respective geometry pass rendering phases at the same time.
  • rendering timing diagram 1000B includes a geometry pass phase 1040B of rendering for rendering geometry of the image frame, as previously described.
  • a sync point 1050B is provided so that each of the GPUs begin rendering the next image frame at the same time.
  • Sync point 1050B may also indicate the end of the rendering for the corresponding image frame.
  • each of the GPUs spends approximately the same amount of time processing the geometry pass phase, with little or no idle time. That is, the Z pre-pass rendering and the geometry rendering are each roughly balanced between the GPUs.
  • the total time for rendering the image frame when rendering by portions of objects is shown by time period 1075. Processing the information to determine screen region responsibilities for each GPU is not shown in the diagram, but may be presumed to conclude prior to the commencement of the geometry pass phase 1040B (i.e. by sync point 1030B).
  • rendering timing diagram 1000B shows reduced rendering time when rendering responsibilities are distributed between the GPUs at a granularity of portions of objects rather than whole objects. For instance, a time savings 1077 is shown when rendering the image frame at a granularity of portions of objects.
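The effect shown by time periods 1070 and 1075 can be approximated with simple arithmetic: when every phase ends at a sync point, each phase lasts as long as its slowest GPU. The per-GPU times below are invented for illustration and are not taken from FIG. 10; both scenarios do a similar total amount of work, but the balanced one finishes sooner. The function name `frame_time` is hypothetical.

```python
def frame_time(phase_times: list[list[float]]) -> float:
    """Frame time when each phase ends at a sync point: the slowest GPU sets each phase."""
    return sum(max(per_gpu) for per_gpu in phase_times)


# Invented per-GPU times in ms for GPUs A-D: [Z pre-pass incl. merge, geometry pass].
by_object = [[1.0, 1.6, 2.0, 1.2], [3.0, 5.5, 6.0, 4.0]]
by_piece = [[1.5, 1.5, 1.6, 1.4], [4.5, 4.6, 4.6, 4.5]]

saving = frame_time(by_object) - frame_time(by_piece)
print(f"{frame_time(by_object):.1f} ms vs {frame_time(by_piece):.1f} ms, "
      f"saving {saving:.1f} ms")
```

With these numbers the object-granularity frame takes 8.0 ms and the piece-granularity frame about 6.2 ms, a saving of about 1.8 ms.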
  • the information allows relaxation of rendering phase requirements and/or dependencies, which results in a GPU proceeding to a subsequent phase of rendering while another GPU is still processing a current phase of rendering, in accordance with one embodiment of the present disclosure.
  • the requirement that the Z pre-pass phase 1010A or 1010B must complete for all GPUs before any GPU begins the geometry pass phase 1040A or 1040B may be relaxed.
  • rendering timing diagram 1000A includes a sync point 1030A of all GPUs prior to beginning the geometry phase 1040A.
  • the information may indicate (for example) that GPU-A can begin rendering its assigned region before the other GPUs have completed their corresponding Z pre-pass phase of rendering. This may lead to an overall reduction in rendering time for the image frame.
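A sketch of how the information could relax the phase dependency: a GPU may start its geometry pass once every Z pre-pass piece that writes to any of its assigned regions has finished and its depth data has arrived. The function name, the `writes` mapping, and this readiness rule are assumptions for illustration, not the disclosed mechanism.

```python
def can_start_geometry_pass(assigned_regions: set[int],
                            writes: dict[str, set[int]],
                            finished: set[str]) -> bool:
    """True if every Z pre-pass piece that writes one of this GPU's regions is done.

    writes maps piece -> regions it writes (from the Z pre-pass information);
    finished holds pieces whose depth data has already reached this GPU.
    """
    return all(piece in finished
               for piece, regions in writes.items()
               if regions & assigned_regions)


writes = {"object 0": {0}, "object 1": {1, 2}, "object 2": {2, 3}, "object 3": {3}}
# GPU-A owns region 0, which only object 0 writes, so it need not wait for the others.
print(can_start_geometry_pass({0}, writes, finished={"object 0"}))              # True
# GPU-D owns region 3, which object 2 also writes; object 2 is still pending.
print(can_start_geometry_pass({3}, writes, finished={"object 0", "object 3"}))  # False
```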
  • FIG. 11 is a diagram illustrating the interleaving of GPU assignments to pieces of geometry of an image frame for purposes of performing a Z pre-pass phase of rendering to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • FIG. 11 shows the distribution of rendering responsibilities between multiple GPUs for the Z pre-pass.
  • each GPU is assigned to a corresponding portion of the geometry of an image frame, wherein that portion may be further partitioned into objects, portions of objects, geometry, pieces of geometry, etc.
  • objects 0, 1 and 2 each represent the geometry used by or generated by an individual draw call.
  • the GPU divides each object into smaller pieces of geometry, such as pieces roughly the size at which the position and/or parameter caches are allocated, as previously described.
  • object 0 is split into pieces “a”, “b”, “c”, “d”, “e” and “f”, such as object 610 in FIG. 6B.
  • object 1 is split into pieces “g”, “h”, and “i”.
  • object 2 is split into pieces “j”, “k”, “l”, “m”, “n”, and “o”.
  • the pieces may be ordered (e.g., a-o) for purposes of distributing responsibility for performing the Z pre-pass phase of rendering.
  • Distribution 1110 (e.g. the ABCDABCDABCD... row) shows an even distribution of the responsibility for performing geometry testing between a plurality of GPUs.
  • in another embodiment, one GPU may take the first quarter of the geometry in a block (e.g., GPU A takes the first four of the approximately sixteen total pieces, including “a”, “b”, “c” and “d”).
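The two distribution schemes for the Z pre-pass can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: pieces are assumed to be labeled by letter in command-buffer order, GPUs by the letters A-D, and the function names are hypothetical.

```python
from typing import Dict, List


def interleaved_assignment(pieces: List[str], gpus: List[str]) -> Dict[str, str]:
    """Round-robin: successive pieces go to successive GPUs (ABCDABCD...)."""
    return {piece: gpus[i % len(gpus)] for i, piece in enumerate(pieces)}


def block_assignment(pieces: List[str], gpus: List[str]) -> Dict[str, str]:
    """Contiguous blocks: each GPU takes about 1/len(gpus) of the pieces, in order."""
    block = -(-len(pieces) // len(gpus))  # ceiling division
    return {piece: gpus[i // block] for i, piece in enumerate(pieces)}


# Object 0 -> pieces a-f, object 1 -> g-i, object 2 -> j-o (15 pieces in total).
PIECES = list("abcdefghijklmno")
GPUS = ["A", "B", "C", "D"]

print("".join(interleaved_assignment(PIECES, GPUS)[p] for p in PIECES))  # ABCDABCDABCDABC
print("".join(block_assignment(PIECES, GPUS)[p] for p in PIECES))        # AAAABBBBCCCCDDD
```

Interleaving sends successive pieces to different GPUs, which is why the Z pre-pass time tends to be roughly balanced; the block scheme instead keeps each GPU's pieces contiguous.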
  • information generated while rendering one frame can be used to assign GPUs to screen regions in a subsequent frame (e.g. current image frame).
  • hardware could be configured to generate information during the geometry pass phase of rendering of the previous image frame, such as GPU usage during the geometry pass phase of rendering for the previous image frame.
  • the information may include the actual number of pixels that are shaded per piece of geometry per screen region. This information may be used in the subsequent frame (e.g. rendering the current image frame) when allocating GPUs to screen regions for the geometry pass of rendering.
  • the assignment of screen regions to GPUs for performing the geometry pass phase of rendering for the current image frame considers both the information generated from the previous image frame, and the information generated during the Z pre-pass phase for the current image frame (if any), as previously described.
  • the screen regions are assigned to the GPUs based on the information from the previous image frame (e.g., GPU usage) and the information generated during the Z pre-pass phase of rendering of the current image frame (if any).
  • This information from the prior frame may provide more accuracy than using only the area of overlap previously discussed (e.g. when generating information for the current image frame), or only the number of pixels written to the Z buffer per piece of geometry per screen region during the Z pre-pass.
  • the number of pixels written to the Z buffer for an object may not correspond to the number of pixels that need to be shaded in the geometry pass due to occlusion of the object by other objects.
  • the use of both the information from the previous image frame (e.g., GPU usage) and the information generated during the Z pre-pass phase of rendering of the current image frame may result in more efficient rendering during the geometry pass phase of rendering for the current image frame.
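The source does not say how previous-frame GPU usage and current Z pre-pass information are combined, nor which algorithm then assigns regions. A minimal sketch, assuming a hypothetical linear blend weight `weight_prev` and a greedy longest-processing-time heuristic (a standard load-balancing technique substituted here for illustration), might look like this:

```python
import heapq
from typing import Dict, List, Optional


def estimate_region_costs(prev_shaded: Dict[int, int],
                          zprepass_written: Optional[Dict[int, int]] = None,
                          weight_prev: float = 0.5) -> Dict[int, float]:
    """Estimate per-region cost for the current frame's geometry pass.

    prev_shaded: pixels actually shaded per region in the previous frame.
    zprepass_written: pixels written to the Z buffer per region during this
    frame's Z pre-pass, or None if no such information was generated.
    weight_prev: hypothetical blend weight (not specified by the source).
    """
    if zprepass_written is None:
        return {r: float(n) for r, n in prev_shaded.items()}
    regions = set(prev_shaded) | set(zprepass_written)
    return {r: weight_prev * prev_shaded.get(r, 0)
            + (1.0 - weight_prev) * zprepass_written.get(r, 0)
            for r in regions}


def assign_regions(costs: Dict[int, float], gpus: List[str]) -> Dict[int, str]:
    """Greedy: give the most expensive remaining region to the least-loaded GPU."""
    heap = [(0.0, i) for i in range(len(gpus))]  # (accumulated cost, GPU index)
    heapq.heapify(heap)
    assignment: Dict[int, str] = {}
    for region in sorted(costs, key=lambda r: (-costs[r], r)):
        load, i = heapq.heappop(heap)
        assignment[region] = gpus[i]
        heapq.heappush(heap, (load + costs[region], i))
    return assignment
```

Blending the two sources can partially compensate for Z pre-pass pixel counts that overstate shading work because of occlusion; the weight is an assumption and would presumably be tuned per application.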
  • the information may also include a vertex count for each screen region, which gives the number of vertices used by a corresponding portion of geometry (e.g. piece of geometry) that overlaps a corresponding screen region.
  • the rendering GPU may use the vertex count to allocate space in the position cache and parameter cache. For example, vertices that are not needed do not have any allocated space, which may increase the efficiency of rendering, in one embodiment.
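As a sketch of how per-region vertex counts could size cache allocations, the following assumes a piece of geometry described by an index buffer and, for each screen region, the list of primitive indices overlapping it. The per-vertex byte sizes and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def vertex_counts_per_region(indices: List[Tuple[int, int, int]],
                             prims_in_region: Dict[str, List[int]]) -> Dict[str, int]:
    """Count distinct vertices used by the primitives that overlap each region."""
    return {region: len({v for p in prims for v in indices[p]})
            for region, prims in prims_in_region.items()}


def cache_bytes(vertex_count: int, position_bytes: int = 16,
                parameter_bytes: int = 32) -> Tuple[int, int]:
    """Bytes to reserve in the (position, parameter) caches; sizes are hypothetical."""
    return vertex_count * position_bytes, vertex_count * parameter_bytes


# A quad made of two triangles sharing vertices 1 and 2.
quad = [(0, 1, 2), (2, 1, 3)]
counts = vertex_counts_per_region(quad, {"R0": [0], "R1": [0, 1], "R2": []})
print(counts)                     # {'R0': 3, 'R1': 4, 'R2': 0}
print(cache_bytes(counts["R1"]))  # (64, 128)
```

A region that no primitive of the piece overlaps gets a count of zero, so no cache space would be reserved for it.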
  • there may be processing overhead (either software or hardware) associated with generating the information during the Z pre-pass phase of rendering. In that case, it may be beneficial to skip generating information for certain pieces of geometry.
  • information may be generated for certain objects but not for others.
  • information may not be generated for a piece of geometry (e.g., an object or portions of the object) that has large primitives and will probably overlap a great number of screen regions.
  • An object having large primitives may be, for example, a skybox or a large piece of terrain that includes large triangles.
  • because such geometry will probably overlap a great number of screen regions, each GPU used for multi-GPU rendering of an image frame will need to render those pieces of geometry anyway, and any information indicating such is unnecessary.
  • the information may be generated or not generated depending on the properties of the corresponding piece of geometry.
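One possible way to decide, from a property of the geometry, whether to generate information is to count how many screen regions its screen-space bounding box touches. The grid layout, the `max_fraction` threshold, and the function names below are assumptions for illustration only.

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; x1 and y1 exclusive


def regions_touched(bbox: BBox, region_w: int, region_h: int, cols: int, rows: int) -> int:
    """Count grid screen regions overlapped by a screen-space bounding box."""
    x0, y0, x1, y1 = bbox
    # -(-a // b) is ceil(a / b) for positive b.
    c0, c1 = max(0, x0 // region_w), min(cols, -(-x1 // region_w))
    r0, r1 = max(0, y0 // region_h), min(rows, -(-y1 // region_h))
    return max(0, c1 - c0) * max(0, r1 - r0)


def should_generate_info(bbox: BBox, region_w: int, region_h: int, cols: int, rows: int,
                         max_fraction: float = 0.5) -> bool:
    """Skip information generation for geometry that covers most of the screen.

    max_fraction is a hypothetical threshold: geometry such as a skybox that
    touches more than this fraction of regions is rendered by every GPU anyway.
    """
    touched = regions_touched(bbox, region_w, region_h, cols, rows)
    return touched <= max_fraction * cols * rows
```

On a 1920x1080 screen split into a 4x4 grid of 480x270 regions, a full-screen skybox touches all 16 regions and is skipped, while a small object touching a single region still gets information generated.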
  • flow diagram 1200A of FIG. 12A illustrates a method for graphics processing including multi-GPU rendering of geometry for an application by performing geometry analysis prior to rendering, in accordance with one embodiment of the present disclosure. That is, instead of generating information while rendering as described in relation to FIGS. 7, 9 and 10, the information is generated prior to rendering, such as during a pre-pass (i.e. a pass that does not write to a Z-buffer or MRTs). It is understood that one or more of the various features and advantages of various embodiments described in relation to generating information during rendering (e.g. a Z pre-pass phase of rendering) are equally applicable to generating information before rendering (e.g., a pre-pass performing geometry analysis), and may not be repeated here in an effort to minimize duplication in the description.
  • various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs, etc.
  • GPU rendering responsibility is dynamically assigned between a plurality of screen regions for each image frame, such that each GPU renders objects in its assigned screen regions.
  • Analysis is performed before geometry rendering (e.g. in a primitive shader or compute shader) to determine the spatial distribution of geometry in an image frame, and then to dynamically adjust GPU responsibility for screen regions to render objects in that image frame.
  • the method includes rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • a number of GPUs collaborate to generate an image frame.
  • multi-GPU processing is performed when rendering a single image frame and/or each of one or more image frames of a sequence of image frames for a real-time application.
  • Responsibility for rendering is divided between a plurality of the GPUs based on screen region for each image frame, as will be further described below.
  • the method includes dividing responsibility for processing a plurality of pieces of geometry of an image frame during an analysis pre-pass between the plurality of GPUs, wherein each of the plurality of pieces of geometry is assigned to a corresponding GPU.
  • the analysis pre-pass is performed before a phase of rendering for the image frame.
  • objects are distributed between the multiple GPUs. For example, in a multi-GPU architecture having four GPUs, each GPU processes approximately a quarter of the objects during the analysis pre-pass. As previously described, there may be benefit to subdividing objects into smaller pieces of geometry, in one embodiment. In addition, in other embodiments the objects are dynamically assigned to GPUs per image frame.
  • Processing efficiency may be realized when dynamically assigning pieces of geometry to the GPUs for the analysis pre-pass.
  • because the analysis pre-pass is performed before a rendering phase, the processing is typically not performed in hardware. That is, the analysis pre-pass may be performed in software, such as by using a shader in various embodiments. For example, a primitive shader may be used during the analysis pre-pass, such that there is no corresponding pixel shader. In addition, a Z-buffer and/or other render targets are not written to during the analysis pre-pass. In other embodiments, a compute shader is used.
  • the method includes determining, in the analysis pre-pass phase, the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions.
  • the piece of geometry may be an object, or portions of an object (e.g., individual primitives, groups of primitives, etc.).
  • the information generated includes an accurate representation of the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions.
  • the information includes an approximation of the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions.
  • the method includes generating information regarding the plurality of pieces of geometry and their relations to a plurality of screen regions based on the overlap of each of the plurality of pieces of geometry with each of the plurality of screen regions.
  • the information may simply be that there is an overlap.
  • the information may include the pixel area or approximate pixel area that the piece of geometry overlaps or covers in a screen region.
  • the information may include the number of pixels written to a screen region.
  • the information may include the number of vertices or primitives overlapping the screen region, or an approximation thereof.
  • the method includes dynamically assigning the plurality of screen regions to the plurality of GPUs based on the information for purposes of rendering the plurality of pieces of geometry during a geometry pass phase of rendering. That is, the information may be used in the subsequent assignment of screen regions to the plurality of GPUs. For example, each GPU is assigned to corresponding screen regions based on the information. In that manner, each GPU has a corresponding division of the responsibility (e.g., corresponding screen regions) for rendering the image frame. As such, assignment of screen regions to GPUs may vary from image frame to image frame.
  • the method includes rendering, during the geometry pass phase, the plurality of pieces of geometry at each of the plurality of GPUs based on GPU to screen region assignments determined from the assigning of the plurality of screen regions to the plurality of GPUs.
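The method steps above can be simulated on the CPU to make the data flow concrete. This sketch assumes a uniform grid of square screen regions, uses the primitive count per region (one of the information types listed above) computed from conservative bounding boxes, and deals pieces to GPUs in interleaved order; all names are hypothetical, and the per-GPU work runs sequentially rather than in parallel.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]
Region = Tuple[int, int]  # (column, row) in a grid of square screen regions


def overlapped_regions(tri: Triangle, region_size: float, cols: int, rows: int) -> List[Region]:
    """Regions touched by the triangle's screen-space bounding box (conservative test)."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    c0 = max(0, int(min(xs) // region_size))
    c1 = min(cols - 1, int(max(xs) // region_size))
    r0 = max(0, int(min(ys) // region_size))
    r1 = min(rows - 1, int(max(ys) // region_size))
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def analysis_prepass(pieces: Dict[str, List[Triangle]], num_gpus: int,
                     region_size: float, cols: int, rows: int):
    """Simulate the analysis pre-pass sequentially.

    Pieces are dealt to GPUs in interleaved order; the GPU that owns a piece
    counts, per screen region, how many of the piece's primitives overlap it.
    Nothing is written to a Z-buffer or other render target.
    """
    info: Dict[str, Counter] = {}
    analyzed_by: Dict[str, int] = {}
    for i, (name, triangles) in enumerate(pieces.items()):
        analyzed_by[name] = i % num_gpus
        counts: Counter = Counter()
        for tri in triangles:
            for region in overlapped_regions(tri, region_size, cols, rows):
                counts[region] += 1
        info[name] = counts
    return info, analyzed_by


def region_totals(info: Dict[str, Counter]) -> Counter:
    """Merge per-piece information into per-region totals used for assignment."""
    total: Counter = Counter()
    for counts in info.values():
        total.update(counts)
    return total
```

The per-region totals could then feed whatever policy assigns screen regions to GPUs for the geometry pass; that policy is not shown here.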
  • FIG. 12B is a rendering timing diagram 1200B illustrating an analysis pre-pass performed before rendering an image frame (e.g. during geometry pass phase of rendering), in accordance with one embodiment of the present disclosure.
  • the analysis pre-pass is dedicated to the analysis of the relationship between pieces of geometry and screen regions.
  • the analysis pre-pass generates information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame.
  • rendering timing diagram 1200B illustrates the use of multiple GPUs to collaboratively render an image frame.
  • Responsibility for rendering is divided between a plurality of the GPUs based on screen region.
  • prior to rendering geometry of the image frame, the GPUs generate information regarding the geometry and its relation to screen regions.
  • This information is used to assign GPUs to screen regions, allowing for more efficient rendering. For example, before rendering, a first GPU generates information about a piece of geometry and its relationship to screen regions, wherein this information is used in assigning screen regions to one or more “rendering GPUs” that render that piece of geometry.
  • rendering timing diagram 1200B illustrates the rendering of one or more objects by four GPUs (e.g. GPU-A, GPU-B, GPU-C, and GPU-D) with reference to timeline 1290.
  • the use of four GPUs is merely for purposes of illustration, such that a multi-GPU architecture may include one or more GPUs.
  • Vertical line 1201 indicates the start of a set of rendering phases for the image frame.
  • Vertical line 1201 also indicates the start of the analysis pre-pass 1210. In the analysis pre-pass, objects are distributed between the multiple GPUs; with four GPUs, each GPU processes approximately a quarter of the objects.
  • a sync point 1230a is provided so that each of the GPUs begins its respective geometry pass rendering phase 1220 at the same time.
  • sync operation 1230a ensures simultaneous start of the geometry pass by all GPUs.
  • the sync operation 1230a is not used, as previously described, such that the geometry pass phase of rendering may begin for any GPU that finishes the analysis pre-pass, and without waiting for all the other GPUs to finish their corresponding analysis pre-passes.
  • Sync point 1230b indicates the end of the geometry pass phase of rendering for the current image frame, and is also provided so that each of the GPUs can continue with subsequent phases of rendering for the current frame at the same time, or begin rendering the next image frame at the same time.
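The role of sync points 1230a and 1230b can be modeled with ordinary threads. In this hypothetical sketch, each Python thread stands in for one GPU, `threading.Barrier` plays the part of sync point 1230a, and joining the threads plays the part of sync point 1230b; setting `use_sync=False` models the relaxed variant in which a GPU starts its geometry pass as soon as its own pre-pass finishes.

```python
import threading
from typing import List, Tuple


def run_frame(num_gpus: int, use_sync: bool = True) -> List[Tuple[str, int]]:
    """Simulate GPUs running an analysis pre-pass followed by a geometry pass.

    Returns the global order of (event, gpu) records.
    """
    log: List[Tuple[str, int]] = []
    lock = threading.Lock()
    barrier = threading.Barrier(num_gpus)

    def record(event: str, gpu: int) -> None:
        with lock:
            log.append((event, gpu))

    def gpu_worker(gpu: int) -> None:
        record("prepass_done", gpu)   # stand-in for the analysis pre-pass
        if use_sync:
            barrier.wait()            # sync point 1230a: wait for every GPU
        record("geometry_start", gpu)

    threads = [threading.Thread(target=gpu_worker, args=(g,)) for g in range(num_gpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # sync point 1230b: end of the geometry pass
    return log
```

With the barrier, every `prepass_done` record precedes every `geometry_start` record; without it, the interleaving depends on thread scheduling.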
  • a single command buffer is used by the multiple GPUs to render the corresponding image frame.
  • the rendering command buffer may include commands to set state and commands to execute primitive shaders or compute shaders, in order to perform an analysis pre-pass.
  • a sync operation may be included within the command buffer to synchronize the start of various operations by the GPUs. For example, a sync operation may be used to synchronize the start of the geometry pass phase of rendering by the GPUs.
  • the command buffer may include draw calls and state settings for each object to perform the geometry pass phase of rendering.
  • the generation of the information is accelerated through use of a dedicated instruction or instructions. That is, the shaders that generate the information use one or more dedicated instructions to accelerate the generation of the information regarding the piece of geometry and its relation to screen regions.
  • the instruction may calculate accurate overlap between a primitive of a piece of geometry and each of the screen regions.
  • FIG. 13A is a diagram 1310 illustrating the calculation of an accurate overlapping between a primitive 1350 and one or more screen regions when performing an analysis pre-pass to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • the primitive 1350 is shown overlapping three different regions, wherein overlap of respective portions of primitive 1350 is accurately determined for each of the regions.
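The source does not describe how the dedicated instruction computes accurate overlap. For illustration, a common software approach is to clip the triangle against each screen-region rectangle (Sutherland-Hodgman clipping) and take the area of the clipped polygon with the shoelace formula; the sketch below assumes screen-space vertex positions and axis-aligned rectangular regions.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def clip_to_rect(poly: List[Point], rect: Rect) -> List[Point]:
    """Sutherland-Hodgman clipping of a convex polygon against an axis-aligned rectangle."""
    x0, y0, x1, y1 = rect

    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps to the last vertex when i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def at_x(x):  # intersection of segment p-q with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y):  # intersection of segment p-q with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    edges = [
        (lambda p: p[0] >= x0, at_x(x0)),
        (lambda p: p[0] <= x1, at_x(x1)),
        (lambda p: p[1] >= y0, at_y(y0)),
        (lambda p: p[1] <= y1, at_y(y1)),
    ]
    for inside, intersect in edges:
        if not poly:
            break
        poly = clip(poly, inside, intersect)
    return poly


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula."""
    s = 0.0
    for i in range(len(poly)):
        (xa, ya), (xb, yb) = poly[i - 1], poly[i]
        s += xa * yb - xb * ya
    return abs(s) / 2.0


def overlap_area(tri: List[Point], region: Rect) -> float:
    """Exact area of the part of the triangle that lies inside the screen region."""
    return polygon_area(clip_to_rect(list(tri), region))
```

Summing `overlap_area` over a partition of the screen recovers the full triangle area, which is one way to sanity-check the per-region information.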
  • this instruction might perform an approximation of the area of overlap, wherein the information includes an approximate area that a primitive overlaps a screen region or regions.
  • the instruction may calculate an approximate overlap between a primitive of a piece of geometry and one or more of the screen regions.
  • FIG. 13B is a pair of diagrams illustrating the calculation of approximate overlapping between a piece of geometry and a number of screen regions when performing an analysis pre-pass to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in accordance with one embodiment of the present disclosure.
  • the instruction may use a bounding box of a primitive. As such, the overlap of the bounding box of the primitive 1350 and one or more screen regions is determined.
  • Boundary 1320A indicates the approximate overlap of the primitive 1350 as determined through analysis of bounding boxes.
  • the instruction checks screen regions against the primitive, such that screen regions that the primitive does not overlap are excluded, and a bounding box is generated for the portion of the primitive that overlaps each remaining screen region.
  • Boundary 1320B indicates the approximate overlap of the primitive 1350 as determined through analysis of bounding boxes and overlap filtering. Note that boundary box 1320B of the right hand diagram of FIG. 13B is smaller than boundary box 1320A of the left hand diagram of FIG. 13B.
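A sketch of the bounding-box approximation with overlap filtering follows. It assumes axis-aligned rectangular regions and approximates each region's coverage as the area of the primitive's bounding box clipped to that region; regions the triangle does not actually touch are removed with a separating-axis test. It implements the filtering step only and does not build the tighter per-region box of boundary 1320B, so its per-region areas are never smaller than those tighter boxes. Names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def tri_overlaps_rect(tri: List[Point], rect: Rect) -> bool:
    """Separating-axis test between a triangle and an axis-aligned rectangle.

    Shapes that only touch along an edge or at a corner count as not overlapping.
    """
    x0, y0, x1, y1 = rect
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    if max(xs) <= x0 or min(xs) >= x1 or max(ys) <= y0 or min(ys) >= y1:
        return False  # separated along the x or y axis
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    for i in range(3):
        (ax, ay), (bx, by) = tri[i], tri[(i + 1) % 3]
        nx, ny = by - ay, ax - bx  # normal of triangle edge i
        t = [nx * px + ny * py for px, py in tri]
        r = [nx * px + ny * py for px, py in corners]
        if max(t) <= min(r) or max(r) <= min(t):
            return False  # separated along this edge normal
    return True


def approx_overlap(tri: List[Point], regions: Dict[str, Rect],
                   filter_regions: bool = True) -> Dict[str, float]:
    """Approximate coverage as the area of the triangle's bounding box intersected with each region.

    With filter_regions=True, regions the triangle does not actually overlap
    are dropped even if its bounding box reaches them.
    """
    bx0, by0 = min(p[0] for p in tri), min(p[1] for p in tri)
    bx1, by1 = max(p[0] for p in tri), max(p[1] for p in tri)
    result: Dict[str, float] = {}
    for name, (x0, y0, x1, y1) in regions.items():
        w = min(bx1, x1) - max(bx0, x0)
        h = min(by1, y1) - max(by0, y0)
        if w <= 0 or h <= 0:
            continue
        if filter_regions and not tri_overlaps_rect(tri, (x0, y0, x1, y1)):
            continue
        result[name] = w * h
    return result
```

For a right triangle with 200-pixel legs on a 2x2 grid of 100-pixel regions, the unfiltered version reports all four regions, while the filtered version drops the region that the hypotenuse only touches at a corner.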
  • the instruction might generate presence information, such as whether a piece of geometry is present in screen regions.
  • the presence information may indicate whether a primitive of a piece of geometry overlaps the screen region.
  • the information may include an approximate presence of the piece of geometry in corresponding screen regions.
  • the shader does not allocate space in the position or parameter caches. That is, the shader does not perform allocations of the position or parameter caches, thereby allowing for a higher degree of parallelism when performing the analysis pre-pass. This also leads to a corresponding reduction in the time required for the analysis pre-pass.
  • a single shader is used to perform either the analysis performed in the analysis pre-pass, or the rendering in the geometry pass.
  • the shader that generates information may be configurable to output information regarding the piece of geometry and its relation to screen regions, or to output vertex position and parameter information for use by later rendering stages. This may be accomplished in a variety of ways, such as via external hardware state that the shader could check (e.g.
  • the information is used to assign regions to GPUs.
  • information generated during the rendering of a previous frame (e.g. the actual pixel count shaded while rendering pieces of geometry) may also be used to assign screen regions to GPUs.
  • the information from the prior frame may include actual number of pixels that are shaded per piece of geometry per screen region, for example. That is, the screen regions are assigned to GPUs based on the information generated from a previous image frame (e.g. GPU usage) and the information generated during the analysis pre-pass.
  • line 1110 of FIG. 14B illustrates a method for graphics processing including multi-GPU rendering of an application by subdividing geometry.
  • Objects 0, 1, and 2 each represent the geometry used by or generated by an individual draw call.
  • the GPUs divide each object into smaller pieces of geometry, such as pieces roughly the size at which the position and/or parameter caches are allocated.
  • object 0 is split into pieces “a”, “b”, “c”, “d”, “e” and “f”, such as object 610 in FIG. 6B.
  • object 1 is split into pieces “g”, “h”, and “i”.
  • object 2 is split into pieces “j”, “k”, “l”, “m”, “n”, and “o”.
  • FIG. 14A and line 1410 of FIG. 14B illustrate a method for graphics processing including multi-GPU rendering of geometry for an application by performing a timing analysis during a rendering phase for purposes of redistributing the assignment of GPU responsibilities during the rendering phase. It is understood that one or more of the various features and advantages of various embodiments described in relation to generating information before and during rendering and the geometry pass phases of rendering of FIGS. 7-13 are equally applicable for use when subdividing geometry and/or performing a timing analysis, and may not be repeated here in an effort to minimize duplication in the description.
  • various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs, etc.
  • GPU rendering responsibility is fixedly or dynamically assigned between a plurality of screen regions for each image frame, such that each GPU renders objects in its assigned screen regions, as previously described in relation to FIGS. 7-13.
  • each GPU renders to its own Z-buffers or other render targets. Timing analysis is performed during one or more of the phases of rendering (e.g., geometry pre-pass analysis, Z pre-pass, or geometry rendering) for purposes of redistributing the assignment of GPU responsibilities during those phases.
  • a timing analysis is performed during a rendering phase for purposes of redistributing the assignment of GPU responsibilities during the rendering phase, such as when performing a Z pre-pass phase for pieces of geometry to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame, in one implementation.
  • a screen region initially assigned to one GPU may be reassigned to another GPU during a phase of rendering (e.g., one GPU may be lagging behind other GPUs during that phase).
  • the method includes rendering graphics for an application using a plurality of graphics processing units (GPUs).
  • multi-GPU processing is performed when rendering a single image frame and/or each of one or more image frames of a sequence of image frames for a real-time application. That is, the plurality of GPUs act in collaboration to render a corresponding image frame including a plurality of pieces of geometry.
  • the method includes dividing responsibility for the rendering of geometry of the graphics between the plurality of GPUs based on a plurality of screen regions. That is, each GPU has a corresponding division of the responsibility (e.g., corresponding set of screen regions).
  • the method includes, during a phase of rendering or analysis for an image frame, determining that a first GPU is behind at least one other GPU, such as a second GPU.
  • the method includes dynamically assigning geometry in such a way that the first GPU is assigned less geometry than the second GPU.
  • the dynamic assignment of geometry may be performed during the generation of a Z-buffer, for purposes of illustration.
  • Dynamic assignment of geometry may be performed during analysis pre-pass and/or geometry pass phase of rendering.
  • one or more Z-buffers are generated by multiple GPUs and/or merged in collaboration for an image frame during a Z pre-pass phase of rendering.
  • pieces of geometry are divided between the GPUs for processing the Z pre-pass phase of rendering, wherein each of the plurality of pieces of geometry is assigned to a corresponding GPU.
  • the hardware may be configured to perform an analysis pre-pass to generate information that is used to optimize the rendering speed of subsequent geometry pass, for example.
  • objects may be subdivided into smaller pieces, as previously described in FIG. 6B.
  • responsibility for rendering of pieces of geometry in the Z pre-pass phase of rendering is distributed between GPUs in an interleaved fashion as previously described in relation to distribution 1110 of FIG. 14B which shows various distributions of GPU assignments for performing a Z pre-pass phase of rendering to generate information used for the dynamic assignment of screen regions to GPUs for geometry rendering of the image frame.
  • Distribution 1110 shows the distribution of rendering responsibilities between multiple GPUs for the Z pre-pass.
  • each GPU is assigned to a corresponding portion of the geometry of an image frame, wherein that portion may be further partitioned into pieces of geometry. Because successive pieces of geometry are assigned to different GPUs, as shown in distribution 1110, the result is that the rendering time during the Z pre-pass is roughly balanced.
  • distribution 1410 [ABCDABCDBCDBBCD row] shows an asymmetric distribution of the responsibility for performing the Z pre-pass phase between a plurality of GPUs.
  • the asymmetric distribution may be advantageous when certain GPUs have been assigned pieces of geometry that are larger than those assigned to other GPUs, and therefore are behind in the Z pre-pass relative to the other GPUs.
  • GPU A is taking more time to render the pieces of geometry during the Z pre-pass phase, so it is skipped when assigning pieces of geometry to GPUs.
  • GPU-B is assigned to render the piece of geometry during the Z pre-pass phase.
  • GPU-B is assigned more pieces of geometry than GPU-A during the Z pre-pass phase of rendering.
  • the piece of geometry is unassigned from the first GPU and then assigned to the second GPU during the Z pre-pass phase of rendering.
  • GPU B is ahead of the other GPUs, so it is able to process more geometry during the Z pre-pass phase. That is, distribution 1410 shows the repeated assignment of GPU-B to successive pieces of geometry for Z pre-pass rendering.
  • GPU-B is assigned to process pieces of geometry “l” and “m” for object 2 during the Z pre-pass phase.
  • GPU A is taking more time to render the pieces of geometry during the Z pre-pass phase, so it is re-assigned.
  • GPU-B is assigned to render the piece of geometry during the Z pre-pass phase, wherein GPU-A may have originally been assigned for rendering that piece of geometry.
  • GPU B is ahead of the other GPUs, so it is able to process more geometry during the Z pre-pass phase.
  • distribution 1410 shows the repeated assignment or re-assignment of GPU-B to successive pieces of geometry for Z pre-pass rendering.
  • GPU-B is assigned to process pieces of geometry “l” and “m” for object 2 during the Z pre-pass phase. That is, GPU-B is assigned to render piece of geometry “l” for object 2, even though that piece of geometry may have been initially assigned to GPU-A.
  • the piece of geometry originally assigned to a first GPU is re-assigned to a second GPU (which may be ahead in rendering) during the Z pre-pass phase of rendering.
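The asymmetric distribution can be approximated with simple list scheduling: each next piece goes to whichever GPU becomes idle first, so a GPU busy with an expensive piece is skipped. This sketch assumes per-piece costs are known (in practice they would come from the timing analysis), uses a hypothetical function name, and does not reproduce distribution 1410 exactly.

```python
import heapq
from typing import Dict, List


def dynamic_assignment(pieces: List[str], cost: Dict[str, float], gpus: List[str]) -> str:
    """Hand each next piece to the GPU that becomes idle first.

    A GPU that is busy with an expensive piece is naturally skipped, so GPUs
    that are ahead pick up extra pieces. Ties go to the earlier GPU in `gpus`.
    Returns the GPU letter chosen for each piece, in piece order.
    """
    ready = [(0.0, i) for i in range(len(gpus))]  # (time GPU becomes idle, GPU index)
    heapq.heapify(ready)
    order = []
    for piece in pieces:
        t, i = heapq.heappop(ready)
        order.append(gpus[i])
        heapq.heappush(ready, (t + cost[piece], i))
    return "".join(order)
```

With all costs equal this reduces to the ABCD... interleaving; if piece “e” costs ten times as much as the others, the sketch yields ABCDABCDBCDBCDB, with GPU A skipped after “e”.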
  • the dynamic assignment of geometry may be performed during the geometry pass phase of rendering of an image frame.
  • screen regions are assigned to GPUs during the geometry pass phase of rendering based on information generated during a Z pre-pass or analysis pre-pass.
  • a screen region assigned to one GPU may be reassigned to another GPU during the rendering phase. This may increase efficiency, as GPUs that are ahead of others may be allocated additional screen regions, and those GPUs that are behind others may avoid being allocated additional screen regions.
  • a plurality of GPUs in collaboration generates a Z-buffer for an image frame during a Z pre-pass phase of rendering. Information is generated regarding pieces of geometry of the image frame and their relations to a plurality of screen regions during this Z pre-pass.
  • Screen regions are assigned to the GPUs based on the information for purposes of rendering the image frame during a geometry pass phase of rendering.
  • the GPUs render the pieces of geometry during the geometry pass phase of rendering based on GPU to screen region assignments.
  • a timing analysis is performed during the geometry pass phase of rendering, which may result in reassigning a first piece of geometry, initially assigned to a first GPU for rendering during the geometry pass phase, to a second GPU.
  • the first GPU may be behind in processing the geometry pass phase of rendering, in one embodiment.
  • the second GPU may be ahead in processing the geometry pass phase of rendering.
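Reassigning screen regions during the geometry pass behaves much like work stealing. The event-driven sketch below assumes each GPU starts with a queue of screen regions and known per-region costs; when a GPU runs out of work it takes the last not-yet-started region from the GPU with the most remaining queued work. The names and the stealing policy are assumptions, not the disclosed mechanism.

```python
import heapq
from collections import deque
from typing import Dict, List, Tuple


def render_with_reassignment(initial: Dict[str, List[str]],
                             cost: Dict[str, int]) -> Tuple[Dict[str, List[str]], Dict[str, int]]:
    """Simulate the geometry pass with dynamic reassignment of screen regions.

    Returns the regions each GPU actually rendered and the time each GPU finished.
    """
    pending = {g: deque(regions) for g, regions in initial.items()}
    rendered: Dict[str, List[str]] = {g: [] for g in initial}
    finish: Dict[str, int] = {}
    idle = [(0, g) for g in initial]  # (time the GPU becomes idle, GPU name)
    heapq.heapify(idle)
    while idle:
        t, g = heapq.heappop(idle)
        if pending[g]:
            region = pending[g].popleft()  # next region from this GPU's own queue
        else:
            donors = [d for d in sorted(pending) if pending[d]]
            if not donors:
                finish[g] = t  # nothing left anywhere: this GPU is done
                continue
            # Donor is the GPU with the most queued (not yet started) work;
            # ties go to the alphabetically first GPU.
            donor = max(donors, key=lambda d: sum(cost[r] for r in pending[d]))
            region = pending[donor].pop()  # reassign the donor's last queued region
        rendered[g].append(region)
        heapq.heappush(idle, (t + cost[region], g))
    return rendered, finish
```

For example, if GPU A starts with four regions costing 3 units each and GPU B with two regions costing 1 unit each, B ends up taking two of A's regions and the pass finishes at time 8 instead of 12.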
  • FIGS. 15A-15B show various screen region allocation strategies, which may be applied to rendering of image frames described previously in relation to FIGS. 7-14.
  • FIG. 15A is a diagram illustrating the use of multiple GPUs to render pieces of geometry (e.g., geometry related to objects 0-3) in a particular screen region, in accordance with one embodiment of the present disclosure. That is, screen region 1510 may be assigned to multiple GPUs for rendering. For example, this may increase efficiency, such as when there is very dense geometry late within the rendering phase. Assigning the screen region 1510 to multiple GPUs typically requires subdivision of the screen regions, so that each GPU may be responsible for a portion or portions of the screen region.
  • FIG. 15B is a diagram illustrating the rendering of pieces of geometry out of order of their corresponding draw calls, in accordance with one embodiment of the present disclosure.
  • the rendering order of the pieces of geometry may not match the order of their corresponding draw calls in a corresponding command buffer.
  • object 0 precedes object 1 in the rendering command buffer.
  • object 0 and 1 intersect, such as within screen region C. In that case, strict ordering of rendering may need to be observed for region C. That is, object 0 must be rendered before object 1 in region C.
  • objects in region A and region B may be rendered in any order because there is no intersection. That is, object 1 may precede object 0, or vice versa, when rendering region A and/or region B.
  • when the rendering command buffer can be traversed multiple times, it is possible to render certain screen regions (e.g. high cost regions) on a first traversal and the remaining regions (e.g. low cost regions) on second or subsequent traversals.
  • in that case, the resulting rendering order of pieces of geometry may not match the order of their corresponding draw calls, such as when the first object is rendered on the second traversal.
  • This strategy increases efficiency when rendering a corresponding image frame, as load balancing between GPUs is easier for low cost regions than it is for high cost regions.
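The two-traversal strategy can be expressed as a schedule over (object, region) pairs. In this sketch (the names, region costs, and cost threshold are hypothetical), each traversal walks the draw calls in command-buffer order but renders only the regions selected for that traversal, so object order is preserved within every region while the global order can differ from draw-call order.

```python
from typing import Dict, List, Tuple


def two_traversal_schedule(draws: List[str],
                           coverage: Dict[str, List[str]],
                           region_cost: Dict[str, float],
                           threshold: float) -> List[Tuple[str, str]]:
    """Return (object, region) pairs in the order they would be rendered.

    Traversal 1 renders only regions whose cost is >= threshold; traversal 2
    renders the rest. Each traversal walks the draw calls in command-buffer
    order, so within any single region objects keep their draw-call order.
    """
    high = {r for r, c in region_cost.items() if c >= threshold}
    schedule: List[Tuple[str, str]] = []
    for in_this_pass in (lambda r: r in high, lambda r: r not in high):
        for obj in draws:
            for region in coverage[obj]:
                if in_this_pass(region):
                    schedule.append((obj, region))
    return schedule
```

With objects 0 and 1 both covering region C, object 0 also covering low-cost region A, and object 1 also covering high-cost region B, object 1 is rendered in region B before object 0 is rendered in region A, while region C still sees object 0 first.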
  • FIG. 16 illustrates components of an example device 1600 that can be used to perform aspects of the various embodiments of the present disclosure.
  • FIG. 16 illustrates an exemplary hardware system suitable for multi-GPU rendering of geometry for an application by performing geometry analysis while rendering to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or by performing geometry analysis prior to rendering to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or by subdividing pieces of geometry and assigning the resulting smaller portions of geometry to multiple GPUs, in accordance with embodiments of the present disclosure.
  • This block diagram illustrates a device 1600 that can incorporate or can be a personal computer, a server computer, gaming console, mobile device, or other digital device, each of which is suitable for practicing an embodiment of the invention.
  • Device 1600 includes a central processing unit (CPU) 1602 for running software applications and optionally an operating system.
  • CPU 1602 may be comprised of one or more homogeneous or heterogeneous processing cores.
  • CPU 1602 is one or more general-purpose microprocessors having one or more processing cores. Further embodiments can be implemented using one or more CPUs with microprocessor architectures specifically adapted for highly parallel and computationally intensive applications, such as media and interactive entertainment applications, or applications configured for graphics processing during execution of a game.
  • Memory 1604 stores applications and data for use by the CPU 1602 and GPU 1616.
  • Storage 1606 provides non-volatile storage and other computer readable media for applications and data and may include fixed disk drives, removable disk drives, flash memory devices, and CD-ROM, DVD-ROM, Blu-ray, HD-DVD, or other optical storage devices, as well as signal transmission and storage media.
  • User input devices 1608 communicate user inputs from one or more users to device 1600, examples of which may include keyboards, mice, joysticks, touch pads, touch screens, still or video recorders/cameras, and/or microphones.
  • Network interface 1609 allows device 1600 to communicate with other computer systems via an electronic communications network, and may include wired or wireless communication over local area networks and wide area networks such as the internet.
  • An audio processor 1612 is adapted to generate analog or digital audio output from instructions and/or data provided by the CPU 1602, memory 1604, and/or storage 1606.
  • The components of device 1600, including CPU 1602, the graphics subsystem including GPU 1616, memory 1604, data storage 1606, user input devices 1608, network interface 1609, and audio processor 1612, are connected via one or more data buses 1622.
  • A graphics subsystem 1614 is further connected with data bus 1622 and the components of the device 1600.
  • The graphics subsystem 1614 includes at least one graphics processing unit (GPU) 1616 and graphics memory 1618.
  • Graphics memory 1618 includes a display memory (e.g. a frame buffer) used for storing pixel data for each pixel of an output image.
  • Graphics memory 1618 can be integrated in the same device as GPU 1616, connected as a separate device with GPU 1616, and/or implemented within memory 1604. Pixel data can be provided to graphics memory 1618 directly from the CPU 1602.
  • CPU 1602 provides the GPU 1616 with data and/or instructions defining the desired output images, from which the GPU 1616 generates the pixel data of one or more output images.
  • The data and/or instructions defining the desired output images can be stored in memory 1604 and/or graphics memory 1618.
  • The GPU 1616 includes 3D rendering capabilities for generating pixel data for output images from instructions and data defining the geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene.
  • The GPU 1616 can further include one or more programmable execution units capable of executing shader programs.
  • The graphics subsystem 1614 periodically outputs pixel data for an image from graphics memory 1618 to be displayed on display device 1610, or to be projected by a projection system (not shown).
  • Display device 1610 can be any device capable of displaying visual information in response to a signal from the device 1600, including CRT, LCD, plasma, and OLED displays.
  • Device 1600 can provide the display device 1610 with an analog or digital signal, for example.
  • Other embodiments for optimizing the graphics subsystem 1614 could include multi-GPU rendering of geometry for an application by pretesting the geometry against interleaved screen regions before rendering objects for an image frame.
  • The graphics subsystem 1614 could be configured as one or more processing devices.
  • In one embodiment, the graphics subsystem 1614 may be configured to perform multi-GPU rendering of geometry for an application by region testing while rendering, wherein multiple graphics subsystems could be implementing graphics and/or rendering pipelines for a single application. That is, the graphics subsystem 1614 includes multiple GPUs used for rendering an image, or each of one or more images of a sequence of images, when executing an application.
  • The graphics subsystem 1614 includes multiple GPU devices, which are combined to perform graphics processing for a single application that is executing on a corresponding CPU.
  • The multiple GPUs can perform multi-GPU rendering of geometry for an application by region testing while rendering objects for an image.
  • The multiple GPUs can perform alternate frame rendering, wherein GPU 1 renders a first frame and GPU 2 renders a second frame in sequential frame periods, and so on until the last GPU is reached, whereupon the initial GPU renders the next video frame (e.g., if there are only two GPUs, then GPU 1 renders the third frame). That is, the GPUs rotate when rendering frames.
  • The rendering operations can overlap, wherein GPU 2 may begin rendering the second frame before GPU 1 finishes rendering the first frame.
  • The multiple GPU devices can be assigned different shader operations in the rendering and/or graphics pipeline.
  • A master GPU may perform the main rendering and compositing.
  • For example, master GPU 1 could perform the main rendering (e.g., a first shader operation) and composite the outputs from slave GPU 2 and slave GPU 3, wherein slave GPU 2 could perform a second shader operation (e.g., fluid effects, such as a river) and slave GPU 3 could perform a third shader operation (e.g., particle smoke), and master GPU 1 composites the results from each of GPU 1, GPU 2, and GPU 3.
  • Each of the three GPUs could be assigned to different objects and/or parts of a scene corresponding to a video frame. In the above embodiments and implementations, these operations could be performed in the same frame period (simultaneously in parallel), or in different frame periods (sequentially in parallel).
  • The present disclosure describes methods and systems configured for multi-GPU rendering of geometry for an application by performing geometry analysis while rendering to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or by performing geometry analysis prior to rendering to dynamically assign screen regions to GPUs for geometry rendering of the image frame, and/or by subdividing pieces of geometry and assigning the resulting smaller portions of geometry to multiple GPUs.
  • Embodiments of the present disclosure may be practiced with various computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers and the like. Embodiments of the present disclosure can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wire-based or wireless network.
  • Embodiments of the present disclosure can employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Any of the operations described herein that form part of embodiments of the present disclosure are useful machine operations. Embodiments of the disclosure also relate to a device or an apparatus for performing these operations.
  • The apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated or configured by a computer program stored in the computer.
  • Various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
  • The disclosure can also be embodied as computer readable code on a computer readable medium.
  • The computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes and other optical and non-optical data storage devices.
  • The computer readable medium can include computer readable tangible medium distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
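The load-balancing remark above (low-cost regions are easier to balance across GPUs than high-cost ones) can be illustrated with a standard longest-processing-time greedy heuristic. This is a minimal sketch, not the assignment method of the disclosure: it assumes a per-region cost estimate is already available (for example, from geometry analysis), and the function name `assign_regions` and its inputs are hypothetical. Expensive regions are placed first, and the cheap regions placed last act as small fill-in units that even out the per-GPU totals.

```python
import heapq
from typing import Dict, List


def assign_regions(region_costs: Dict[str, float], num_gpus: int) -> List[List[str]]:
    """Hypothetical greedy (longest-processing-time) region-to-GPU assignment.

    Each region goes to the GPU with the smallest accumulated cost so far.
    Regions are visited from most to least expensive, so the low-cost
    regions arrive last and fill the remaining gaps between GPUs.
    """
    # Min-heap of (accumulated cost, gpu index); ties resolve to the lower index.
    heap = [(0.0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(num_gpus)]
    # Highest cost first; equal costs are ordered by name for determinism.
    for region, cost in sorted(region_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(region)
        heapq.heappush(heap, (load + cost, gpu))
    return assignment
```

With six regions costed 8, 7, 3, 2, 2 and 1 on two GPUs, this sketch assigns A, D and E to GPU 0 (total 12) and B, C and F to GPU 1 (total 11). The last, cheapest regions are what bring the two totals within one unit of each other.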
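Subdividing pieces of geometry and assigning the resulting smaller portions to multiple GPUs, one of the techniques named in the description, can be sketched as follows. This is an illustrative assumption only. A piece of geometry is modeled as a flat sequence of primitives (e.g., triangle indices) and split into portions of at most `max_per_portion` primitives, and the portions are then dealt round-robin across GPUs. The names `subdivide` and `distribute`, the fixed portion size, and the round-robin policy are all hypothetical. A real system would more likely route each portion according to the screen regions it covers.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subdivide(primitives: Sequence[T], max_per_portion: int) -> List[List[T]]:
    """Split one piece of geometry into consecutive portions of bounded size."""
    if max_per_portion <= 0:
        raise ValueError("max_per_portion must be positive")
    return [list(primitives[i:i + max_per_portion])
            for i in range(0, len(primitives), max_per_portion)]


def distribute(portions: List[List[T]], num_gpus: int) -> List[List[List[T]]]:
    """Deal portions round-robin so that each GPU receives part of the original piece."""
    per_gpu: List[List[List[T]]] = [[] for _ in range(num_gpus)]
    for index, portion in enumerate(portions):
        per_gpu[index % num_gpus].append(portion)
    return per_gpu
```

For example, ten triangles split into portions of at most four give three portions. On two GPUs, GPU 0 receives the first and third portions and GPU 1 receives the second.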
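Testing geometry against interleaved screen regions, as in the pretesting and region-testing embodiments of the graphics subsystem 1614 described above, can be sketched by mapping a piece of geometry's screen-space bounding box onto a tile grid. This is a minimal sketch under stated assumptions. The screen is divided into fixed-size tiles, and `region_owner` uses a hypothetical diagonal interleave in which tile (tx, ty) belongs to GPU (tx + ty) mod N. The disclosure's actual region layout is not given in this section. A GPU that owns none of the overlapped tiles could skip that piece of geometry.

```python
from typing import Set, Tuple

# (x_min, y_min, x_max, y_max) in pixels, inclusive; an assumed bounding-box format.
BBox = Tuple[int, int, int, int]


def region_owner(tx: int, ty: int, num_gpus: int) -> int:
    """Hypothetical interleave: neighbouring tiles belong to different GPUs."""
    return (tx + ty) % num_gpus


def gpus_overlapped(bbox: BBox, tile_w: int, tile_h: int, num_gpus: int) -> Set[int]:
    """Return the GPUs whose screen regions the bounding box touches."""
    x0, y0, x1, y1 = bbox
    owners: Set[int] = set()
    for ty in range(y0 // tile_h, y1 // tile_h + 1):
        for tx in range(x0 // tile_w, x1 // tile_w + 1):
            owners.add(region_owner(tx, ty, num_gpus))
            if len(owners) == num_gpus:
                return owners  # every GPU is already involved; stop early
    return owners
```

With 64x64-pixel tiles and four GPUs, a small box inside the first tile involves only GPU 0. A box spanning two horizontally adjacent tiles involves GPUs 0 and 1. The early exit reflects the idea that once every GPU is involved, further testing yields nothing new.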
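The alternate frame rendering described above, in which the GPUs rotate and GPU 2 may start the second frame before GPU 1 finishes the first, can be modeled with a small scheduling simulation. This is a sketch with assumed parameters, not timing data from the disclosure. `afr_gpu` and `afr_schedule` are hypothetical names, one frame is assumed to be issued every `frame_period` time units, and every frame is assumed to take a fixed `render_time`.

```python
from typing import List, Tuple


def afr_gpu(frame: int, num_gpus: int) -> int:
    """Return the 1-based GPU that renders the 1-based `frame` (GPUs rotate)."""
    return (frame - 1) % num_gpus + 1


def afr_schedule(num_frames: int, num_gpus: int, frame_period: float,
                 render_time: float) -> List[Tuple[int, int, float, float]]:
    """Simulate (frame, gpu, start, end) for alternate frame rendering.

    Each GPU renders its own frames one at a time, but different GPUs work
    concurrently, so a frame can start before the previous frame finishes.
    """
    free_at = [0.0] * (num_gpus + 1)  # index 0 unused; GPUs are numbered from 1
    schedule = []
    for frame in range(1, num_frames + 1):
        gpu = afr_gpu(frame, num_gpus)
        issued = (frame - 1) * frame_period
        start = max(issued, free_at[gpu])  # wait only if this GPU is still busy
        end = start + render_time
        free_at[gpu] = end
        schedule.append((frame, gpu, start, end))
    return schedule
```

With two GPUs, a frame period of 16 and a render time of 30, frame 2 starts on GPU 2 at time 16 while GPU 1 is busy with frame 1 until time 30. Frame 3 then returns to GPU 1, matching the rotation in the text.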
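In the master/slave example above, master GPU 1 composites its main render with the outputs of slave GPU 2 (river) and slave GPU 3 (smoke). The description does not say how that compositing is done. The sketch below assumes premultiplied-alpha pixels and the standard Porter-Duff "over" operator, applied layer by layer on top of the main render. Plain Python lists stand in for render targets, and `over` and `composite` are hypothetical helper names.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # premultiplied (r, g, b, a), assumed format


def over(src: Pixel, dst: Pixel) -> Pixel:
    """Porter-Duff 'over' for premultiplied alpha: draw src on top of dst."""
    k = 1.0 - src[3]
    r, g, b, a = (s + d * k for s, d in zip(src, dst))
    return (r, g, b, a)


def composite(main: List[Pixel], layers: List[List[Pixel]]) -> List[Pixel]:
    """Master-side compositing: apply each slave layer, in order, over the main render."""
    out = list(main)
    for layer in layers:
        out = [over(s, d) for s, d in zip(layer, out)]
    return out
```

The layer order is itself an assumption (river first, smoke on top). A fully transparent layer pixel leaves the main render unchanged, and a half-opaque pixel blends evenly with what lies beneath it.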

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP21708462.3A EP4100923A1 (en) 2020-02-03 2021-02-01 System and method for efficient multi-gpu rendering of geometry by geometry analysis while rendering
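CN202180020414.9A CN115335866A (zh) 2020-02-03 2021-02-01 System and method for efficient multi-GPU rendering of geometry by geometry analysis while rendering
JP2022546704A JP7254252B2 (ja) 2020-02-03 2021-02-01 System and method for efficient multi-GPU rendering of geometry by geometry analysis during rendering
JP2023052155A JP7355960B2 (ja) 2020-02-03 2023-03-28 System and method for efficient multi-GPU rendering of geometry by geometry analysis during rendering
JP2023155680A JP7481560B2 (ja) 2020-02-03 2023-09-21 Method, computer system, and computer-readable medium for graphics processing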

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US16/780,776 US11170461B2 (en) 2020-02-03 2020-02-03 System and method for efficient multi-GPU rendering of geometry by performing geometry analysis while rendering
US16/780,776 2020-02-03
US16/780,798 US11508110B2 (en) 2020-02-03 2020-02-03 System and method for efficient multi-GPU rendering of geometry by performing geometry analysis before rendering
US16/780,864 2020-02-03
US16/780,798 2020-02-03
US16/780,864 US11120522B2 (en) 2020-02-03 2020-02-03 System and method for efficient multi-GPU rendering of geometry by subdividing geometry

Publications (1)

Publication Number Publication Date
WO2021158468A1 true WO2021158468A1 (en) 2021-08-12

Family

ID=74759499

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/016022 WO2021158468A1 (en) 2020-02-03 2021-02-01 System and method for efficient multi-gpu rendering of geometry by geometry analysis while rendering

Country Status (4)

Country Link
EP (1) EP4100923A1 (ja)
JP (3) JP7254252B2 (ja)
CN (1) CN115335866A (ja)
WO (1) WO2021158468A1 (ja)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050041031A1 (en) * 2003-08-18 2005-02-24 Nvidia Corporation Adaptive load balancing in a multi-processor graphics processing system
US20100007662A1 (en) * 2008-06-05 2010-01-14 Arm Limited Graphics processing systems
US20110157193A1 (en) * 2009-12-29 2011-06-30 Nvidia Corporation Load balancing in a system with multi-graphics processors and multi-display systems
US20120001925A1 (en) * 2010-06-30 2012-01-05 Ati Technologies, Ulc Dynamic Feedback Load Balancing
US20130076761A1 (en) * 2011-09-22 2013-03-28 Arm Limited Graphics processing systems
US20140347357A1 (en) * 2013-05-24 2014-11-27 Hong-Yun Kim Graphic processing unit and tile-based rendering method
EP3185217A2 (en) * 2015-12-21 2017-06-28 Imagination Technologies Limited Allocation of tiles to processing engines in a graphics processing system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008071261A (ja) * 2006-09-15 2008-03-27 Toshiba Corp Image processing system and image processing method
GB0723537D0 (en) * 2007-11-30 2008-01-09 Multi-core rasterisation in a tile based rendering system
EP2596491B1 (en) * 2010-07-19 2017-08-30 ATI Technologies ULC Displaying compressed supertile images
US10147222B2 (en) * 2015-11-25 2018-12-04 Nvidia Corporation Multi-pass rendering in a screen space pipeline
CN107958437A (zh) * 2017-11-24 2018-04-24 中国航空工业集团公司西安航空计算技术研究所 Multi-GPU tile-based parallel rendering method for high-resolution multi-screen graphics


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
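CN116185640A (zh) * 2023-04-20 2023-05-30 上海励驰半导体有限公司 Multi-GPU-based image command processing method and apparatus, storage medium, and chip
CN116185640B (zh) * 2023-04-20 2023-08-08 上海励驰半导体有限公司 Multi-GPU-based image command processing method and apparatus, storage medium, and chip
CN117557444A (zh) * 2023-11-10 2024-02-13 摩尔线程智能科技(上海)有限责任公司 Geometry processing apparatus, graphics processor, and electronic device
CN117557444B (zh) * 2023-11-10 2024-05-14 摩尔线程智能科技(上海)有限责任公司 Geometry processing apparatus, graphics processor, and electronic device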

Also Published As

Publication number Publication date
JP2023503190A (ja) 2023-01-26
EP4100923A1 (en) 2022-12-14
JP2023080128A (ja) 2023-06-08
JP7481560B2 (ja) 2024-05-10
JP7355960B2 (ja) 2023-10-03
JP7254252B2 (ja) 2023-04-07
JP2023171822A (ja) 2023-12-05
CN115335866A (zh) 2022-11-11

Similar Documents

Publication Publication Date Title
US11900500B2 (en) System and method for efficient multi-GPU rendering of geometry by subdividing geometry
JP7481560B2 (ja) Method, computer system, and computer-readable medium for graphics processing
JP7481557B2 (ja) System and method for efficient multi-GPU rendering of geometry by region testing during rendering
US20220005148A1 (en) System and method for efficient multi-gpu rendering of geometry by performing geometry analysis while rendering
WO2021158483A1 (en) System and method for efficient multi-gpu rendering of geometry by pretesting against interleaved screen regions before rendering
US11847720B2 (en) System and method for performing a Z pre-pass phase on geometry at a GPU for use by the GPU when rendering the geometry
US20210241414A1 (en) System and method for efficient multi-gpu rendering of geometry by pretesting against screen regions using configurable shaders
US11508110B2 (en) System and method for efficient multi-GPU rendering of geometry by performing geometry analysis before rendering
US11869114B2 (en) Efficient multi-GPU rendering by testing geometry against screen regions before rendering using a pretest GPU
US11954760B2 (en) Assigning geometry for pretesting against screen regions for an image frame using prior frame information
US11961159B2 (en) Region testing of geometry while rendering for efficient multi-GPU rendering

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21708462

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022546704

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021708462

Country of ref document: EP

Effective date: 20220905