JP7334358B2 - System and method for efficient multi-GPU rendering of geometry by pre-testing on interleaved screen regions before rendering

Info

Publication number: JP7334358B2
Application number: JP2022546703A
Authority: JP
Inventors: Mark E. Cerny; Florian Strauss; Tobias Berghoff
Original assignee: Sony Interactive Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2020-02-03
Filing date: 2021-02-01
Publication date: 2023-08-28
Anticipated expiration: 2041-02-01
Also published as: CN115298686A; CN115298686B; JP7564399B2; JP2024091921A; WO2021158483A1; JP7481556B2; JP2023144060A; WO2021158483A8; JP2023505607A; EP4100922A1

Description

The present disclosure relates to graphics processing and, more particularly, to multi-GPU cooperation when rendering images for an application.

In recent years, there has been a continuing effort toward online services that allow online or cloud gaming in a streaming format between a cloud gaming server and clients connected through a network. The streaming format is increasingly popular because of the on-demand availability of game titles, the ability to run more complex games, the ability to network between players in multiplayer gaming, the ability to share assets between players, the ability to share instant experiences between players and/or spectators, the ability of friends to watch a friend play a video game, the ability of a friend to join a friend's ongoing gameplay, and so on.

A cloud gaming server may be configured to provide resources to one or more clients and/or applications. That is, the cloud gaming server may be configured with resources capable of high throughput. For example, there are limits to the performance that an individual graphics processing unit (GPU) can attain. To render more complex scenes, or to use more complex algorithms (e.g., materials, lighting, etc.) when generating a scene, it may be desirable to use multiple GPUs to render a single image. However, using those GPUs evenly is difficult to achieve. Furthermore, even when multiple GPUs are available to process an image for an application using conventional techniques, they cannot support a corresponding increase in both screen pixel count and geometry density (for example, four GPUs cannot write four times as many pixels and/or process four times as many vertices or primitives for an image).

It is against this background that embodiments of the present disclosure arise.

Embodiments of the present disclosure relate to using multiple GPUs in collaboration to render a single image, for example, multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions (which may be interleaved) before rendering.

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The screen regions are interleaved. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes assigning to a GPU a piece of geometry of an image frame generated by the application for geometry testing. The method includes performing the geometry testing at the GPU to generate information regarding the piece of geometry and its relation to each of the plurality of screen regions. The method includes rendering the piece of geometry at each of the plurality of GPUs using the information, wherein using the information may include, for example, skipping rendering of the piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.
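The pre-test summarized above can be illustrated with a minimal sketch. It is not the implementation of the disclosure: the tile size, the `(col + row) % num_gpus` interleaving pattern, the use of axis-aligned screen-space bounding boxes as "pieces of geometry", and all names (`Rect`, `build_regions`, `pretest`, `render_frame`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned screen rectangle; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    def overlaps(self, other: "Rect") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def build_regions(width, height, tile, num_gpus):
    """Split the screen into tiles and interleave ownership among GPUs.

    The (col + row) % num_gpus pattern is an assumed interleaving; the
    source only states that the screen regions are interleaved.
    """
    regions = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            owner = (tx // tile + ty // tile) % num_gpus
            rect = Rect(tx, ty, min(tx + tile, width), min(ty + tile, height))
            regions.append((rect, owner))
    return regions


def pretest(bounds, regions, num_gpus):
    """Geometry test: one flag per GPU, True if the piece touches its regions."""
    flags = [False] * num_gpus
    for rect, owner in regions:
        if bounds.overlaps(rect):
            flags[owner] = True
    return flags


def render_frame(pieces, regions, num_gpus):
    """Pre-test every piece, then let each GPU skip pieces it does not overlap."""
    info = {name: pretest(b, regions, num_gpus) for name, b in pieces.items()}
    rendered = {gpu: [name for name in pieces if info[name][gpu]]
                for gpu in range(num_gpus)}
    return info, rendered
```

With a 256x256 screen, 64-pixel tiles, and four GPUs, a piece confined to the top-left tile would be rendered only by GPU 0 in this sketch, while the other three GPUs skip it entirely.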

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs, wherein the screen regions in the plurality of screen regions are interleaved. The computer-readable medium includes program instructions for assigning to a GPU a piece of geometry of an image frame generated by the application for geometry pre-testing. The computer-readable medium includes program instructions for performing the geometry pre-testing at the GPU to generate information regarding the piece of geometry and its relation to each of the plurality of screen regions. The computer-readable medium includes program instructions for using the information at each of the plurality of GPUs when rendering the image frame.

In still another embodiment, a computer system is disclosed. The computer system includes a processor and memory coupled to the processor, the memory having stored therein instructions that, when executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs, wherein the screen regions in the plurality of screen regions are interleaved. The method includes assigning to a GPU a piece of geometry of an image frame generated by the application for geometry pre-testing. The method includes performing the geometry pre-testing at the GPU to generate information regarding the piece of geometry and its relation to each of the plurality of screen regions. The method includes using the information at each of the plurality of GPUs when rendering the image frame.

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes performing geometry testing, in a GPU pre-test, on a plurality of pieces of geometry of an image frame generated by the application to generate information regarding each piece of geometry and its relation to each of the plurality of screen regions. The method includes rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of the plurality of pieces of geometry, wherein using the information includes, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The computer-readable medium includes program instructions for performing geometry testing, in a GPU pre-test, on a plurality of pieces of geometry of an image frame generated by the application to generate information regarding each piece of geometry and its relation to each of the plurality of screen regions. The computer-readable medium includes program instructions for rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of the plurality of pieces of geometry, wherein using the information includes, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.

In still another embodiment, a computer system is disclosed. The computer system includes a processor and memory coupled to the processor, the memory having stored therein instructions that, when executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes performing geometry testing, in a GPU pre-test, on a plurality of pieces of geometry of an image frame generated by the application to generate information regarding each piece of geometry and its relation to each of the plurality of screen regions. The method includes rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of the plurality of pieces of geometry, wherein using the information includes, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes rendering a first plurality of pieces of geometry at the plurality of GPUs during a rendering phase of a previous image frame generated by the application. The method includes generating statistics for the rendering of the previous image frame. The method includes assigning, based on the statistics, a second plurality of pieces of geometry of a current image frame generated by the application to the plurality of GPUs for geometry testing. The method includes performing the geometry testing on the second plurality of pieces of geometry in the current image frame to generate information regarding each piece of the second plurality of pieces of geometry and its relation to each of the plurality of screen regions, wherein the geometry testing is performed at each of the plurality of GPUs based on the assignment. The method includes rendering the second plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of the second plurality of pieces of geometry, wherein using the information includes, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.
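The statistics-driven assignment described above can be sketched as a simple load balancer. The passage does not specify which statistics are collected or how they drive the assignment, so this example assumes, purely for illustration, that the statistic is each GPU's render time in the previous frame and that each piece has an estimated test cost. The function name `assign_pretests` and the greedy least-loaded policy are hypothetical.

```python
import heapq


def assign_pretests(piece_costs, prev_render_time):
    """Assign geometry-test work to GPUs using previous-frame statistics.

    piece_costs: dict mapping piece name -> assumed cost of testing it.
    prev_render_time: list of per-GPU render times from the previous frame
        (the assumed statistic).
    Returns a dict mapping GPU index -> list of pieces to test on that GPU.
    """
    # Min-heap of (projected load, gpu); GPUs that were less busy in the
    # previous frame start with a lower load and receive work first.
    heap = [(t, gpu) for gpu, t in enumerate(prev_render_time)]
    heapq.heapify(heap)
    assignment = {gpu: [] for gpu in range(len(prev_render_time))}
    # Place the most expensive pieces first (greedy heuristic).
    for piece, cost in sorted(piece_costs.items(), key=lambda kv: -kv[1]):
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(piece)
        heapq.heappush(heap, (load + cost, gpu))
    return assignment
```

In this sketch a GPU that was heavily loaded in the previous frame may receive no pre-test work at all, which is one plausible way for statistics to even out GPU utilization.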

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The computer-readable medium includes program instructions for rendering a first plurality of pieces of geometry at the plurality of GPUs during a rendering phase of a previous image frame generated by the application. The computer-readable medium includes program instructions for generating statistics for the rendering of the previous image frame. The computer-readable medium includes program instructions for assigning, based on the statistics, a second plurality of pieces of geometry of a current image frame generated by the application to the plurality of GPUs for geometry testing. The computer-readable medium includes program instructions for performing the geometry testing on the second plurality of pieces of geometry in the current image frame to generate information regarding each piece of the second plurality of pieces of geometry and its relation to each of the plurality of screen regions, wherein the geometry testing is performed at each of the plurality of GPUs based on the assignment. The computer-readable medium includes program instructions for rendering the second plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of the second plurality of pieces of geometry, wherein using the information may include, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.

In still another embodiment, a computer system is disclosed. The computer system includes a processor and memory coupled to the processor, the memory having stored therein instructions that, when executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes rendering a first plurality of pieces of geometry at the plurality of GPUs during a rendering phase of a previous image frame generated by the application. The method includes generating statistics for the rendering of the previous image frame. The method includes assigning, based on the statistics, a second plurality of pieces of geometry of a current image frame generated by the application to the plurality of GPUs for geometry testing. The method includes performing the geometry testing on the second plurality of pieces of geometry in the current image frame to generate information regarding each piece of the second plurality of pieces of geometry and its relation to each of the plurality of screen regions, wherein the geometry testing is performed at each of the plurality of GPUs based on the assignment. The method includes rendering the second plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of the second plurality of pieces of geometry, wherein using the information may include, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes setting a first state that configures one or more shaders to perform the geometry testing. The method includes performing the geometry testing on the plurality of pieces of geometry at the plurality of GPUs to generate information regarding each piece of geometry and its relation to each of the plurality of screen regions. The method includes setting a second state that configures the one or more shaders to perform rendering. The method includes rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of the plurality of pieces of geometry, wherein using the information includes, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.
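The two-state sequence described above (a first state for geometry testing, then a second state for rendering) can be sketched from the point of view of a single GPU. This is an illustrative model only: `ShaderState`, `run_two_phase`, and the lookup table standing in for the geometry test are hypothetical, and the sketch omits how test results would be shared between GPUs.

```python
from enum import Enum


class ShaderState(Enum):
    GEOMETRY_TEST = 1  # first state: shaders perform the geometry test
    RENDER = 2         # second state: shaders perform rendering


def run_two_phase(pieces, overlap_table, gpu_id):
    """Return the ordered list of operations one GPU would carry out.

    overlap_table: dict mapping piece -> set of GPU ids whose screen
        regions the piece overlaps (stands in for a real geometry test).
    """
    log = []

    # Phase 1: set the first state, then test every piece.
    log.append(("set_state", ShaderState.GEOMETRY_TEST.name))
    info = {}
    for piece in pieces:
        info[piece] = overlap_table[piece]
        log.append(("test", piece))

    # Phase 2: set the second state, then render using the test results.
    log.append(("set_state", ShaderState.RENDER.name))
    for piece in pieces:
        if gpu_id in info[piece]:
            log.append(("render", piece))
        else:
            log.append(("skip", piece))  # no overlap with this GPU's regions
    return log
```

Only two state changes occur per frame in this arrangement, regardless of how many pieces of geometry are processed.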

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The computer-readable medium includes program instructions for assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The computer-readable medium includes program instructions for setting a first state that configures one or more shaders to perform the geometry testing. The computer-readable medium includes program instructions for performing the geometry testing on the plurality of pieces of geometry at the plurality of GPUs to generate information regarding each piece of geometry and its relation to each of the plurality of screen regions. The computer-readable medium includes program instructions for setting a second state that configures the one or more shaders to perform rendering. The computer-readable medium includes program instructions for rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of the plurality of pieces of geometry, wherein using the information includes, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.

In still another embodiment, a computer system is disclosed. The computer system includes a processor and memory coupled to the processor, the memory having stored therein instructions that, when executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes setting a first state that configures one or more shaders to perform the geometry testing. The method includes performing the geometry testing on the plurality of pieces of geometry at the plurality of GPUs to generate information regarding each piece of geometry and its relation to each of the plurality of screen regions. The method includes setting a second state that configures the one or more shaders to perform rendering. The method includes rendering the plurality of pieces of geometry at each of the plurality of GPUs using the information generated for each of the plurality of pieces of geometry, wherein using the information includes, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.

Embodiments of the present disclosure disclose a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes interleaving a first set of shaders, which performs geometry testing and rendering on a first set of pieces of geometry, with a second set of shaders, which performs geometry testing and rendering on a second set of pieces of geometry. The geometry testing generates corresponding information regarding each piece of geometry in the first set or the second set and its relation to each of the plurality of screen regions. The corresponding information is used by the plurality of GPUs to render each piece of geometry in the first set or the second set. Using the information includes, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.
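One way to picture the interleaving described above is as an ordering of operations in which testing and rendering alternate per set of pieces, rather than all testing preceding all rendering. The sketch below is an assumption about what such an ordering could look like; the passage states only that the two sets of shaders are interleaved. The name `interleaved_schedule` and the fixed `batch_size` are hypothetical.

```python
def interleaved_schedule(pieces, batch_size):
    """Build an operation order that interleaves test and render per batch.

    Each batch of pieces is tested and then rendered before the next
    batch is tested, so test results are consumed soon after they are
    produced.
    """
    schedule = []
    for start in range(0, len(pieces), batch_size):
        batch = pieces[start:start + batch_size]
        schedule.extend(("test", piece) for piece in batch)
        schedule.extend(("render", piece) for piece in batch)
    return schedule
```

Compared with a strict two-phase ordering, this kind of schedule trades additional switches between testing and rendering for a shorter lifetime of the per-piece test information.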

In another embodiment, a non-transitory computer-readable medium for performing a method is disclosed. The computer-readable medium includes program instructions for rendering graphics for an application using a plurality of graphics processing units (GPUs). The computer-readable medium includes program instructions for dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The computer-readable medium includes program instructions for assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The computer-readable medium includes program instructions for interleaving a first set of shaders, which performs geometry testing and rendering on a first set of pieces of geometry, with a second set of shaders, which performs geometry testing and rendering on a second set of pieces of geometry. The geometry testing generates corresponding information regarding each piece of geometry in the first set or the second set and its relation to each of the plurality of screen regions. The corresponding information is used by the plurality of GPUs to render each piece of geometry in the first set or the second set. Using the information includes, for example, skipping rendering of a piece of geometry entirely at a given GPU when it is determined that the piece does not overlap any screen region assigned to that GPU.

In still another embodiment, a computer system is disclosed. The computer system includes a processor and memory coupled to the processor, the memory having stored therein instructions that, when executed by the computer system, cause the computer system to execute a method for graphics processing. The method includes rendering graphics for an application using a plurality of graphics processing units (GPUs). The method includes dividing responsibility for rendering the geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs. The method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. The method includes interleaving a first set of shaders, which performs geometry testing and rendering on a first set of pieces of geometry, with a second set of shaders, which performs geometry testing and rendering on a second set of pieces of geometry. The geometry testing generates corresponding information about each piece of geometry in the first set or the second set and its relation to each of the plurality of screen regions. The plurality of GPUs uses the corresponding information when rendering each piece of geometry in the first set or the second set. In using the information, a GPU may, for example, skip rendering a piece of geometry entirely when the piece is determined not to overlap any screen region assigned to that GPU.
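The interleaving of geometry testing and rendering described in the embodiments above can be read as an ordering of commands: test the first set of pieces, render the first set, then test the second set, render the second set, and so on. The Python below is a sketch of that one reading only. The function name `interleaved_schedule` and the `("test", piece)` and `("render", piece)` tuples are hypothetical and do not represent the command buffer format of the disclosure.

```python
from typing import List, Tuple


def interleaved_schedule(piece_sets: List[List[str]]) -> List[Tuple[str, str]]:
    """Emit, for each set of pieces in order, all geometry tests for that set
    followed by the rendering of that set, so that the test and render phases
    of successive sets alternate."""
    commands: List[Tuple[str, str]] = []
    for pieces in piece_sets:
        commands.extend(("test", piece) for piece in pieces)
        commands.extend(("render", piece) for piece in pieces)
    return commands


# Hypothetical piece names: the first set holds two pieces, the second set one.
schedule = interleaved_schedule([["a", "b"], ["c"]])
```

In this sketch the tests for a set always precede the rendering of that same set, so the information a test produces is available before the corresponding rendering is issued.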

Other aspects of the disclosure will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate by way of example the principles of the disclosure.

The disclosure may best be understood by reference to the following description taken in conjunction with the accompanying drawings.

A diagram of a system for providing gaming over a network between one or more cloud gaming servers configured to run multiple GPUs (graphics processing units) in cooperation to render a single image, including multi-GPU rendering of geometry for an application by pretesting the geometry against screen regions (which may be interleaved), in accordance with embodiments of the present disclosure.

A diagram of a multi-GPU architecture in which multiple GPUs cooperate to render a single image, in accordance with one embodiment of the present disclosure.

A diagram of multiple graphics processing unit resources configured for multi-GPU rendering of geometry for an application by pretesting the geometry against screen regions (which may be interleaved), in accordance with one embodiment of the present disclosure.

A diagram of a rendering architecture implementing a graphics pipeline that is configured for multi-GPU processing, such that multiple GPUs cooperate to render a single image, in accordance with one embodiment of the present disclosure.

A flow diagram illustrating a method for graphics processing, including multi-GPU rendering of geometry for an application by pretesting against interleaved screen regions before rendering, in accordance with one embodiment of the present disclosure.

A diagram of a screen that is subdivided into quadrants when performing multi-GPU rendering, in accordance with one embodiment of the present disclosure.

A diagram of a screen that is subdivided into a plurality of interleaved regions when performing multi-GPU rendering, in accordance with one embodiment of the present disclosure.

A diagram of a rendering command buffer that is shared by multiple GPUs cooperating to render a single image, including a geometry pretest portion and a rendering portion, in accordance with one embodiment of the present disclosure.

A diagram illustrating an image that contains four objects rendered by multiple GPUs, and showing the screen region responsibility of each GPU when rendering the objects of the image, in accordance with one embodiment of the present disclosure.

A table illustrating the rendering performed by each GPU when rendering the four objects of FIG. 7B-1, in accordance with one embodiment of the present disclosure.

A diagram illustrating the geometry pretesting and the geometry rendering performed by one or more GPUs when rendering an image frame (e.g., the image of FIG. 7B-1) through the cooperation of multiple GPUs, in accordance with one embodiment of the present disclosure.

A diagram illustrating the testing of an object against screen regions when multiple GPUs cooperate to render a single image, in accordance with one embodiment of the present disclosure.

A diagram illustrating the testing of portions of an object against screen regions when multiple GPUs cooperate to render a single image, in accordance with one embodiment of the present disclosure.

Diagrams A through C illustrating various strategies for assigning screen regions to corresponding GPUs when multiple GPUs cooperate to render a single image, in accordance with one embodiment of the present disclosure.

A diagram illustrating various distributions of GPU assignments for performing geometry pretesting on a plurality of pieces of geometry, in accordance with embodiments of the present disclosure.

A diagram illustrating the pretesting and rendering of the geometry of a previous image frame by multiple GPUs, and the use of statistics collected during that rendering to influence the assignment of the pretesting of the geometry of a current image frame to the multiple GPUs, in accordance with one embodiment of the present disclosure.

A flow diagram illustrating a method for graphics processing, including pretesting and rendering the geometry of a previous image frame by multiple GPUs and using statistics collected during that rendering to influence the assignment of the pretesting of the geometry of a current image frame to the multiple GPUs, in accordance with one embodiment of the present disclosure.

A diagram illustrating the use of shaders configured to perform both the pretesting and the rendering of the geometry of an image frame in two passes through a portion of a command buffer, in accordance with one embodiment of the present disclosure.

A flow diagram illustrating a method for graphics processing, including performing both the pretesting and the rendering of the geometry of an image frame using the same set of shaders in two passes through a portion of a command buffer, in accordance with one embodiment of the present disclosure.

A diagram illustrating the use of shaders configured to perform both geometry testing and rendering, in which the geometry testing and rendering performed on different sets of pieces of geometry are interleaved using separate portions of a corresponding command buffer, in accordance with one embodiment of the present disclosure.

A flow diagram illustrating a method for graphics processing, including interleaving the pretesting and the rendering of the geometry of an image frame for different sets of pieces of geometry using separate portions of a corresponding command buffer, in accordance with one embodiment of the present disclosure.

A diagram illustrating components of an example device that can be used to perform aspects of the various embodiments of the present disclosure.

Although the following detailed description contains many specific details for purposes of illustration, those of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the present disclosure. Accordingly, the aspects of the present disclosure described below are set forth without any loss of generality to, and without imposing limitations upon, the claims that follow this description.

Generally speaking, there are limits to the performance that an individual GPU can achieve, deriving for example from limits on how large the GPU can be. In embodiments of the present disclosure, it is desirable to use multiple GPUs to render a single image in order to render more complex scenes or to use more complex algorithms (e.g., materials, lighting, etc.). In particular, the various embodiments of the present disclosure describe methods and systems configured to perform multi-GPU rendering of geometry for an application by pretesting the geometry against screen regions, which may be interleaved. The multiple GPUs cooperate to generate an image. Responsibility for rendering is divided among the multiple GPUs based on screen regions. Before rendering the geometry, the GPUs generate information about the geometry and its relation to the screen regions. This information allows each GPU to render the geometry more efficiently or to avoid rendering it altogether. As an advantage, this allows, for example, the multiple GPUs to render more complex scenes and/or images in the same amount of time.

With the above general understanding of the various embodiments, example details of the embodiments will now be described with reference to the various drawings.

Throughout the specification, a reference to an "application", "game", "video game", or "gaming application" is meant to represent any type of interactive application that is directed through the execution of input commands. For illustration purposes only, interactive applications include applications for gaming, word processing, video processing, video game processing, and the like. Further, the terms introduced above are interchangeable.

Throughout the specification, the various embodiments of the present disclosure are described in terms of multi-GPU processing or rendering of geometry for an application using an exemplary architecture having four GPUs. However, it is understood that any number of GPUs (e.g., two or more) may cooperate when rendering geometry for an application.

FIG. 1 is a diagram of a system for performing multi-GPU processing when rendering an image (e.g., an image frame) for an application, in accordance with one embodiment of the present disclosure. In accordance with embodiments of the present disclosure, the system is configured to provide gaming over a network between one or more cloud gaming servers, and more specifically is configured so that multiple GPUs cooperate to render a single image of an application. Cloud gaming includes executing a video game at a server to generate game-rendered video frames, which are then sent to a client for display. In particular, system 100 is configured for efficient multi-GPU rendering of geometry for an application by pretesting against screen regions, which may be interleaved, before rendering.

Although FIG. 1 illustrates an implementation of multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, other embodiments of the present disclosure provide efficient multi-GPU rendering of geometry for an application by performing region testing while rendering within a stand-alone system, such as a personal computer or gaming console that includes a high-end graphics card having multiple GPUs.

It is also understood that the multi-GPU rendering of geometry may be performed using physical GPUs, virtual GPUs, or a combination of both, in various embodiments (e.g., in a cloud gaming environment or within a stand-alone system). For example, virtual machines (e.g., instances) may be created using a hypervisor of host hardware (e.g., located at a data center) that utilizes one or more components of a hardware layer, such as multiple CPUs, memory modules, GPUs, network interfaces, communication components, and the like. These physical resources may be arranged in racks, such as racks of CPUs, racks of GPUs, racks of memory, and so on. The physical resources in the racks may be accessed using top-of-rack switches, which facilitate a fabric for assembling and accessing the components used for an instance (e.g., when building the virtualized components of the instance). Generally, a hypervisor can present multiple guest operating systems of multiple instances that are configured with virtual resources. That is, each of the operating systems may be configured with a corresponding set of virtualized resources supported by one or more hardware resources (e.g., located at a corresponding data center). For example, each operating system may be supported by a virtual CPU, multiple virtual GPUs, virtual memory, virtualized communication components, and the like. In addition, the configuration of an instance may be transferred from one data center to another data center to reduce latency. GPU utilization defined for a user or a game can be used when saving the user's gaming session. The GPU utilization can include any number of the configurations described herein for optimizing the fast rendering of video frames for a gaming session. In one embodiment, the GPU utilization defined for the game or the user can be transferred between data centers as a configurable setting. Because the GPU utilization setting can be transferred, game play can be migrated efficiently from data center to data center when the user connects to play games from different geographic locations.
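The source does not specify what a GPU utilization setting contains or how it is transferred. Purely as an illustration of a configurable setting that can move between data centers, the sketch below serializes a small record to JSON and restores it. Every field name (`game_id`, `num_gpus`, `interleaved_regions`) and both helper functions are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class GpuUtilizationSettings:
    # All fields are hypothetical; the source does not enumerate the settings.
    game_id: str
    num_gpus: int
    interleaved_regions: bool


def export_settings(settings: GpuUtilizationSettings) -> str:
    """Serialize the settings so that another data center could load them."""
    return json.dumps(asdict(settings), sort_keys=True)


def import_settings(payload: str) -> GpuUtilizationSettings:
    """Rebuild the settings from the serialized form."""
    return GpuUtilizationSettings(**json.loads(payload))


original = GpuUtilizationSettings(game_id="example-game", num_gpus=4, interleaved_regions=True)
restored = import_settings(export_settings(original))
```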

In accordance with one embodiment of the present disclosure, system 100 provides gaming via a cloud game network 190. The game is executed remotely from the client device 110 (e.g., a thin client) of the corresponding user who is playing the game. System 100 may provide gaming control to one or more users playing one or more games through the cloud game network 190 via network 150, in either single-player or multi-player modes. In some embodiments, the cloud game network 190 may include a plurality of virtual machines (VMs) running on a hypervisor of a host machine, with one or more virtual machines configured to execute a game processor module that utilizes the hardware resources available to the hypervisor of the host. Network 150 may include one or more communication technologies. In some embodiments, network 150 may include fifth-generation (5G) network technology having advanced wireless communication systems.

In some embodiments, communication may be facilitated using wireless technologies. Such technologies may include, for example, 5G wireless communication technologies. 5G is the fifth generation of cellular network technology. 5G networks are digital cellular networks in which the service area covered by providers is divided into small geographical areas called cells. Analog signals representing sounds and images are digitized in the telephone, converted by an analog-to-digital converter, and transmitted as a stream of bits. All the 5G wireless devices in a cell communicate by radio waves with a local antenna array and low-power automated transceiver (transmitter and receiver) in the cell, over frequency channels that the transceiver assigns from a pool of frequencies that are reused in other cells. The local antennas are connected with the telephone network and the Internet by a high-bandwidth optical fiber or wireless backhaul connection. As in other cell networks, a mobile device that crosses from one cell to another is automatically transferred to the new cell. It should be understood that 5G networks are just one example type of communication network, and embodiments of the disclosure may use earlier-generation wireless or wired communication, as well as later-generation wired or wireless technologies that come after 5G.

As shown, the cloud game network 190 includes a game server 160 that provides access to a plurality of video games. Game server 160 may be any type of server computing device available in the cloud, and may be configured as one or more virtual machines executing on one or more hosts. For example, game server 160 may manage a virtual machine that supports a game processor that instantiates an instance of a game for a user. As such, a plurality of game processors of game server 160 associated with a plurality of virtual machines is configured to execute multiple instances of one or more games associated with the gameplays of a plurality of users. In that manner, back-end server support provides streaming of media (e.g., video, audio, etc.) of the gameplays of a plurality of gaming applications to a plurality of corresponding users. That is, game server 160 is configured to stream data (e.g., rendered images and/or frames of a corresponding gameplay) back to a corresponding client device 110 through network 150. In that manner, a computationally complex gaming application may be executed at the back-end server in response to controller inputs that are received and forwarded by client device 110. Each server is able to render images and/or frames, which are then encoded (e.g., compressed) and streamed to the corresponding client device for display.

For example, a plurality of users may access the cloud game network 190 via communication network 150 using corresponding client devices 110 configured for receiving streaming media. In one embodiment, client device 110 may be configured as a thin client that interfaces with a back-end server (e.g., cloud game network 190) configured to provide computational functionality (e.g., including game title processing engine 111). In another embodiment, client device 110 may be configured with a game title processing engine and game logic for at least some local processing of a video game, and may further be used for receiving streaming content generated by the video game executing at a back-end server, or for other content provided by back-end server support. For local processing, the game title processing engine includes basic processor-based functions for executing a video game and the services associated with the video game. In that case, the game logic may be stored on the local client device 110 and used for executing the video game.

Each of the client devices 110 may be requesting access to a different game from the cloud game network. For example, cloud game network 190 may be executing one or more game logics that are built upon a game title processing engine 111 and are executed using the CPU resources 163 and GPU resources 365 of game server 160. For instance, game logic 115a, in cooperation with game title processing engine 111, may be executing on game server 160 for one client; game logic 115b, in cooperation with game title processing engine 111, may be executing on game server 160 for a second client; and so on, up to game logic 115n, which, in cooperation with game title processing engine 111, may be executing on game server 160 for an nth client.

In particular, client device 110 of a corresponding user (not shown) is configured for requesting access to games over a communication network 150, such as the Internet, and for rendering display images (e.g., image frames) generated by a video game executed by game server 160. Encoded images are delivered to client device 110 for display in association with the corresponding user. For example, the user may be interacting through client device 110 with an instance of a video game executing on a game processor of game server 160. More particularly, the instance of the video game is executed by game title processing engine 111. Corresponding game logic (e.g., executable code) 115 that implements the video game is stored and accessible through a data store (not shown), and is used to execute the video game. Game title processing engine 111 is able to support a plurality of video games using a plurality of game logics (e.g., gaming applications), each of which is selectable by the user.

For example, client device 110 is configured to interact with game title processing engine 111 in association with the gameplay of a corresponding user, such as through input commands that are used to drive the gameplay. In particular, client device 110 may receive input from various types of input devices, such as game controllers, tablet computers, keyboards, gestures captured by video cameras, mice, touch pads, and the like. Client device 110 can be any type of computing device having at least a memory and a processor module that is capable of connecting to game server 160 over network 150. The back-end game title processing engine 111 is configured for generating rendered images, which are delivered over network 150 for display at a corresponding display associated with client device 110. For example, through cloud-based services, the game-rendered images may be delivered by an instance of a corresponding game (e.g., game logic) executing on game execution engine 111 of game server 160. That is, client device 110 is configured for receiving encoded images (e.g., encoded from the game-rendered images generated through execution of the video game) and for displaying the rendered images on display 11. In one embodiment, display 11 includes an HMD (e.g., displaying VR content). In some embodiments, the rendered images may be streamed to a smartphone or tablet, wirelessly or by wire, directly from the cloud-based services or via client device 110 (e.g., PlayStation (registered trademark) Remote Play).

In one embodiment, game server 160 and/or game title processing engine 111 includes basic processor-based functions for executing the game and the services associated with the gaming application. For example, game server 160 includes central processing unit (CPU) resources 163 and graphics processing unit (GPU) resources 365 that are configured for performing processor-based functions, including 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, artificial intelligence, and the like. In addition, the CPU and GPU groups may implement services for the gaming application, including, in part, memory management, multi-thread management, quality of service (QoS), bandwidth testing, social networking, management of social friends, communication with the social networks of friends, communication channels, texting, instant messaging, chat support, and the like. In one embodiment, one or more applications share a particular GPU resource. In one embodiment, multiple GPU devices may be combined to perform graphics processing for a single application that is executing on a corresponding CPU.

In one embodiment, cloud gaming network 190 is a distributed game server system and/or architecture. In particular, a distributed game engine executing game logic is configured as a corresponding instance of a corresponding game. In general, the distributed game engine takes each of the functions of a game engine and distributes those functions for execution by a multitude of processing entities. Individual functions can be further distributed across one or more processing entities. The processing entities may be configured in different configurations (e.g., as physical hardware) and/or as virtual components or virtual machines and/or as virtual containers. A container differs from a virtual machine in that it virtualizes an instance of the gaming application running on a virtualized operating system. The processing entities may use and/or rely on the servers of cloud gaming network 190 and their underlying hardware, which resides on one or more servers (compute nodes). The servers may be located in one or more racks. A distributed synchronization layer coordinates, assigns, and manages the execution of these functions across the various processing entities. In this way, the distributed synchronization layer controls execution of the functions to enable generation of media (e.g., video frames, audio) for the gaming application in response to controller input by a player. The distributed synchronization layer can execute these functions efficiently across the distributed processing entities (e.g., through load balancing), so that critical game engine components and functions are distributed and reassembled for more efficient processing.

FIG. 2 is a diagram of an exemplary multi-GPU architecture 200 in which multiple GPUs cooperate to render a single image of a corresponding application, in accordance with one embodiment of the present disclosure. It is understood that many architectures in which multiple GPUs cooperate to render a single image are possible in various embodiments of the present disclosure, although not all are explicitly described or shown. For example, multi-GPU rendering of geometry for an application by performing region testing while rendering may be implemented between one or more cloud gaming servers of a cloud gaming system, or within a standalone system (e.g., a personal computer or gaming console that includes a high-end graphics card having multiple GPUs).

Multi-GPU architecture 200 includes CPU 163 and multiple GPUs configured for multi-GPU rendering of a single image for an application and/or of each image in a sequence of images for the application. In particular, CPU 163 and GPU resources 365 are configured to perform processor-based functions (e.g., as previously described, 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, and artificial intelligence).

For example, four GPUs are shown in GPU resources 365 of multi-GPU architecture 200, although any number of GPUs may be used when rendering images for an application. Each GPU is connected to a corresponding dedicated memory, such as random access memory (RAM), via a high-speed bus 220. Specifically, GPU-A is connected to memory 210A (e.g., RAM) via bus 220, GPU-B is connected to memory 210B (e.g., RAM) via bus 220, GPU-C is connected to memory 210C (e.g., RAM) via bus 220, and GPU-D is connected to memory 210D (e.g., RAM) via bus 220.

In addition, the GPUs are connected to one another via bus 240. Depending on the architecture, bus 240 may be approximately equal in speed to, or slower than, bus 220 used for communication between a GPU and its corresponding memory. For example, GPU-A is connected to each of GPU-B, GPU-C, and GPU-D via bus 240. Likewise, GPU-B is connected to each of GPU-A, GPU-C, and GPU-D via bus 240. GPU-C is connected to each of GPU-A, GPU-B, and GPU-D via bus 240. Finally, GPU-D is connected to each of GPU-A, GPU-B, and GPU-C via bus 240.

CPU 163 is connected to each of the GPUs via a lower-speed bus 230 (e.g., bus 230 is slower than bus 220 used for communication between a GPU and its corresponding memory). Specifically, CPU 163 is connected to each of GPU-A, GPU-B, GPU-C, and GPU-D.

FIG. 3 is a diagram of graphics processing unit resources 365 configured for multi-GPU rendering of geometry for an image frame generated by an application by pre-testing the geometry against screen regions (which may be interleaved) before rendering, in accordance with one embodiment of the present disclosure. For example, game server 160 may be configured to include GPU resources 365 in cloud gaming network 190 of FIG. 1. As shown, GPU resources 365 include multiple GPUs (e.g., GPU 365a, GPU 365b ... GPU 365n). As previously described, various architectures may include multiple GPUs cooperating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering. Examples include implementing multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, or within a standalone system (e.g., a personal computer or gaming console that includes a high-end graphics card having multiple GPUs).

In particular, in one embodiment, game server 160 is configured to perform multi-GPU processing when rendering a single image of an application, such that multiple GPUs cooperate to render the single image and/or to render each of one or more images of a sequence of images when executing the application. For example, in one embodiment, game server 160 may include a CPU and GPU group configured to perform multi-GPU rendering of each of one or more images in a sequence of images of the application, wherein one CPU and GPU group may be implementing a graphics and/or rendering pipeline for the application. The CPU and GPU group can be configured as one or more processing devices. As previously described, the CPU and GPU group may include CPU 163 and GPU resources 365, which are configured to perform processor-based functions (e.g., 2D or 3D rendering, physics simulation, scripting, audio, animation, graphics processing, lighting, shading, rasterization, ray tracing, shadowing, culling, transformation, and artificial intelligence).

GPU resources 365 are responsible for and/or configured to perform rendering of objects (e.g., writing color or normal vector values for the pixels of an object to multiple render targets (MRTs)) and execution of synchronous compute kernels (e.g., full-screen effects on the resulting MRTs). The synchronous compute to perform and the objects to render are specified by commands contained in rendering command buffers 325 that the GPUs execute. In particular, GPU resources 365 are configured to render objects and perform synchronous compute (e.g., during execution of synchronous compute kernels) when executing commands from rendering command buffers 325, wherein commands and/or operations may depend on other operations such that they are performed in sequence.

For example, GPU resources 365 are configured to perform synchronous compute and/or rendering of objects using one or more rendering command buffers 325 (e.g., rendering command buffer 325a, rendering command buffer 325b ... rendering command buffer 325n). In one embodiment, each GPU in GPU resources 365 may have its own command buffer. Alternatively, when substantially the same set of objects is being rendered by each GPU (e.g., because the regions are small), the GPUs in GPU resources 365 may use the same command buffer or the same set of command buffers. Further, each of the GPUs in GPU resources 365 may support the ability for a command to be executed by one GPU but not by another. For example, flags on a draw command or predication in the rendering command buffer allow a single GPU to execute one or more commands in the corresponding command buffer while the other GPUs ignore those commands. For example, rendering command buffer 325a may support flags 330a, rendering command buffer 325b may support flags 330b ... and rendering command buffer 325n may support flags 330n.
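The per-GPU flag mechanism can be pictured with a small sketch. It is only an illustration under stated assumptions: the description does not specify how flags 330 are encoded, so the bitmask field `gpu_mask`, the `Command` type, and the `execute` function are hypothetical names, and four GPUs are assumed.

```python
from dataclasses import dataclass

ALL_GPUS = 0b1111  # assumption: four GPUs, one bit each (bit 0 = GPU-A)


@dataclass
class Command:
    name: str
    gpu_mask: int = ALL_GPUS  # hypothetical flag: which GPUs execute this command


def execute(command_buffer, gpu_index):
    """Return the names of the commands that the given GPU actually runs."""
    executed = []
    for cmd in command_buffer:
        if cmd.gpu_mask & (1 << gpu_index):
            executed.append(cmd.name)
        # Otherwise this GPU ignores the command and moves on.
    return executed


# One command buffer shared by all GPUs, with some commands flagged per GPU.
shared_buffer = [
    Command("set_render_targets"),
    Command("draw_object_0", gpu_mask=0b0001),  # GPU-A only
    Command("draw_object_1", gpu_mask=0b0110),  # GPU-B and GPU-C
    Command("fullscreen_effect"),
]
```

With this layout every GPU walks the same buffer, and the mask decides which draw calls each one skips.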

Performing synchronous compute (e.g., execution of synchronous compute kernels) and rendering objects are part of the overall rendering. For example, if a video game runs at 60 Hz (i.e., 60 frames per second), all object rendering and synchronous compute kernel execution for an image frame must typically complete within approximately 16.67 ms (i.e., one frame at 60 Hz). As previously described, the operations performed when rendering objects and/or executing synchronous compute kernels are ordered, and operations may depend on other operations (e.g., a command in a rendering command buffer may need to complete execution before other commands in that rendering command buffer can execute).
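The frame-time figure follows directly from the refresh rate. The sketch below works through that arithmetic; the function names and the example durations are illustrative assumptions, and durations are simply summed because the commands are described as ordered.

```python
def frame_budget_ms(refresh_hz):
    """Time available per frame, in milliseconds."""
    return 1000.0 / refresh_hz


def fits_in_frame(command_durations_ms, refresh_hz):
    """True if sequentially executed commands finish within one frame."""
    return sum(command_durations_ms) <= frame_budget_ms(refresh_hz)


budget_60hz = frame_budget_ms(60)  # about 16.67 ms
```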

In particular, each of the rendering command buffers 325 contains commands of various types, including commands that affect the corresponding GPU configuration (e.g., commands that specify the location and format of a render target), as well as commands to render objects and/or execute synchronous compute kernels. For purposes of illustration, the synchronous compute performed when executing synchronous compute kernels may include performing full-screen effects once the objects have all been rendered to one or more corresponding multiple render targets (MRTs).

In addition, when GPU resources 365 render objects for an image frame and/or execute synchronous compute kernels when generating the image frame, GPU resources 365 are configured via the registers of each GPU 365a, 365b ... 365n. For example, GPU 365a is configured via its registers 340 (e.g., register 340a, register 340b ... register 340n) to perform its rendering or compute kernel execution in a particular way. That is, the values stored in registers 340 define the hardware context (e.g., GPU configuration or GPU state) for GPU 365a when it executes the commands in rendering command buffer 325 that are used to render objects and/or execute synchronous compute kernels for the image frame. Each of the GPUs in GPU resources 365 may be similarly configured, such that GPU 365b is configured via its registers 350 (e.g., register 350a, register 350b ... register 350n) to perform its rendering or compute kernel execution in a particular way, and so on, and GPU 365n is configured via its registers 370 (e.g., register 370a, register 370b ... register 370n) to perform its rendering or compute kernel execution in a particular way.

Some examples of GPU configuration include the location and format of render targets (e.g., MRTs). Other examples of GPU configuration include operating procedures. For instance, when rendering an object, the Z value of each pixel of the object can be compared with the Z buffer in various ways. For example, the object pixel is written only if the object Z value matches the value in the Z buffer. Alternatively, the object pixel may be written only if the object Z value is equal to or less than the value in the Z buffer. The type of test to perform is defined within the GPU configuration.
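The two depth-test variants mentioned above can be sketched as a configurable comparison. This is a simplified software model and not a description of any particular GPU: the mode names `"equal"` and `"less_equal"`, the list-of-lists buffers, and the function name are assumptions made for illustration.

```python
import operator

# Hypothetical configuration values selecting the comparison to apply.
DEPTH_TESTS = {
    "equal": operator.eq,       # write only if the Z values match
    "less_equal": operator.le,  # write only if the object Z is the same or smaller
}


def depth_test_write(z_buffer, color_buffer, x, y, z, color, mode):
    """Write the pixel if it passes the configured depth test; return the result."""
    passes = DEPTH_TESTS[mode](z, z_buffer[y][x])
    if passes:
        z_buffer[y][x] = z
        color_buffer[y][x] = color
    return passes
```

The same object pixel can therefore be kept or discarded depending solely on which test the GPU configuration selects.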

FIG. 4 is a simplified diagram of a rendering architecture implementing a graphics pipeline 400 that is configured for multi-GPU processing, such that multiple GPUs cooperate to render a single image, in accordance with one embodiment of the present disclosure. Graphics pipeline 400 illustrates the general process for rendering images using 3D (three-dimensional) polygon rendering. Graphics pipeline 400 for a rendered image outputs corresponding color information for each pixel in a display, where the color information may represent texture and shading (e.g., color, shadowing). Graphics pipeline 400 may be implementable within client device 110, game server 160, game title processing engine 111, and/or GPU resources 365 of FIGS. 1 and 3. That is, various architectures may include multiple GPUs cooperating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering. Examples include implementing multi-GPU rendering of geometry between one or more cloud gaming servers of a cloud gaming system, or within a standalone system (e.g., a personal computer or gaming console that includes a high-end graphics card having multiple GPUs).

As shown, the graphics pipeline receives input geometry 405. For example, geometry processing stage 410 receives input geometry 405. Input geometry 405 may include, for example, vertices within a 3D gaming world and information corresponding to each of the vertices. A given object within the gaming world can be represented using polygons (e.g., triangles) defined by vertices, and the surfaces of the corresponding polygons are then processed through graphics pipeline 400 to achieve a final effect (e.g., color, texture). Vertex attributes may include the normal (e.g., which direction is perpendicular to the geometry at that location), color (e.g., RGB: red, green, and blue), and texture coordinate/mapping information.

Geometry processing stage 410 is responsible for (and capable of) both vertex processing (e.g., via a vertex shader) and primitive processing. In particular, geometry processing stage 410 may output sets of vertices that define primitives and deliver them to the next stage of graphics pipeline 400, together with positions (more precisely, homogeneous coordinates) and various other parameters for those vertices. The positions are placed in position cache 450 for access by later shader stages. The other parameters are placed in parameter cache 460, again for access by later shader stages.

Various operations may be performed by geometry processing stage 410, such as lighting and shadowing calculations for the primitives and/or polygons. In one embodiment, because the geometry stage is capable of processing primitives, it can perform backface culling and/or clipping (e.g., testing against the view frustum), thereby reducing the load on downstream stages (e.g., rasterization stage 420). In another embodiment, the geometry stage may generate primitives (e.g., with functionality equivalent to a traditional geometry shader).

The primitives output by geometry processing stage 410 are fed into rasterization stage 420, which converts the primitives into a raster image composed of pixels. In particular, rasterization stage 420 is configured to project objects in the scene onto a two-dimensional (2D) image plane defined by the viewing location in the 3D gaming world (e.g., camera location, user eye location). At a simplistic level, rasterization stage 420 looks at each primitive and determines which pixels are affected by the corresponding primitive. In particular, rasterizer 420 partitions the primitives into pixel-sized fragments, where each fragment corresponds to a pixel in the display. It is important to note that one or more fragments may contribute to the color of a corresponding pixel when an image is displayed.

As previously described, additional operations may be performed by rasterization stage 420, such as clipping (identifying and disregarding fragments that lie outside the view frustum) and culling (disregarding fragments that are occluded by closer objects) with respect to the viewing location. With regard to clipping, geometry processing stage 410 and/or rasterization stage 420 may be configured to identify and disregard primitives that lie outside the view frustum defined by the viewing location in the gaming world.

Pixel processing stage 430 may use the parameters created by the geometry processing stage, as well as other data, to generate values such as the resulting color of a pixel. In particular, pixel processing stage 430 at its core performs shading operations on the fragments to determine how the color and brightness of a primitive vary with the available lighting. For example, pixel processing stage 430 may determine depth, color, normal, and texture coordinates (e.g., texture details) for each fragment, and may further determine appropriate levels of light, darkness, and color for the fragments. In particular, pixel processing stage 430 calculates the traits of each fragment, including color and other attributes (e.g., z-depth for distance from the viewing location, and alpha values for transparency). In addition, pixel processing stage 430 applies lighting effects to the fragments based on the available lighting affecting the corresponding fragments. Further, pixel processing stage 430 may apply shadowing effects to each fragment.

The output of pixel processing stage 430 includes the processed fragments (e.g., texture and shading information) and is delivered to the next stage of graphics pipeline 400, output merger stage 440. Output merger stage 440 uses the output of pixel processing stage 430, as well as other data (e.g., values already in memory), to generate the final color for a pixel. For example, output merger stage 440 may perform optional blending between the fragments and/or pixels determined by pixel processing stage 430 and the values already written to an MRT for that pixel.
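As an illustration of the optional blending step, the sketch below applies the conventional "source over" alpha blend. The description does not name a specific blend equation, so this particular formula and the function name `blend_over` are assumptions chosen because they are the most common case.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Blend an incoming fragment color over the value already in the render target.

    src_rgb:   color produced by pixel processing for this fragment
    src_alpha: fragment opacity in the range [0.0, 1.0]
    dst_rgb:   color already written to the render target for this pixel
    """
    return tuple(
        src_alpha * s + (1.0 - src_alpha) * d
        for s, d in zip(src_rgb, dst_rgb)
    )


# A half-transparent red fragment over an existing blue pixel.
final_color = blend_over((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0))
```

With an alpha of 1.0 the fragment simply replaces the stored value, which corresponds to rendering with blending disabled.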

Color values for each pixel in the display may be stored in a frame buffer (not shown). These values are scanned out to the corresponding pixels when the corresponding image of the scene is displayed. In particular, the display reads the color value for each pixel from the frame buffer row by row, from left to right or right to left, from top to bottom or bottom to top, or in any other pattern, and illuminates the pixels using those pixel values when displaying the image.

With the detailed description of cloud gaming network 190 (e.g., in game server 160) and GPU resources 365 of FIGS. 1-3, flow diagram 500 of FIG. 5 illustrates a method for graphics processing when implementing multi-GPU rendering of geometry for an image frame generated by an application by pre-testing the geometry against interleaved screen regions before rendering, in accordance with one embodiment of the present disclosure. In this way, multiple GPU resources are used to efficiently render objects when executing an application. As previously described, various architectures may include multiple GPUs cooperating to render a single image by performing multi-GPU rendering of geometry for an application through region testing while rendering. The rendering may be performed, for example, within one or more cloud gaming servers of a cloud gaming system, or within a standalone system (e.g., a personal computer or gaming console that includes a high-end graphics card having multiple GPUs).

At 510, the method includes rendering graphics for an application using multiple graphics processing units (GPUs) that cooperate to generate an image. In particular, multi-GPU processing is performed when rendering a single image frame and/or each of one or more image frames of a sequence of image frames for a real-time application.

At 520, the method includes dividing responsibility for rendering the geometry of the graphics among the multiple GPUs based on multiple screen regions. That is, each GPU has a corresponding division of the responsibility (e.g., a corresponding set of screen regions) that is known to all of the GPUs. More specifically, each GPU is responsible for rendering geometry within a corresponding set of screen regions among the multiple screen regions, where the corresponding set includes one or more screen regions. For example, a first GPU has a first division of responsibility for rendering objects within a first set of screen regions, and a second GPU has a second division of responsibility for rendering objects within a second set of screen regions. The same applies to each of the remaining GPUs.
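One way to picture a division of responsibility that is known to every GPU is a fixed, repeating assignment of screen regions to GPUs. The sketch below is only an assumed layout: the 64-pixel region size, the 2x2 repeating pattern for four GPUs, and all function names are hypothetical and are not taken from the description.

```python
REGION_SIZE = 64  # assumption: square regions, 64 pixels on a side


def owner_of_region(rx, ry, tile_w=2, tile_h=2):
    """GPU index responsible for region (rx, ry) under a repeating tile pattern.

    With the default 2x2 tile, regions are interleaved as
        0 1 0 1 ...
        2 3 2 3 ...
    so each of four GPUs owns regions spread across the whole screen.
    """
    return (rx % tile_w) + tile_w * (ry % tile_h)


def owner_of_pixel(x, y):
    """GPU index responsible for the region containing pixel (x, y)."""
    return owner_of_region(x // REGION_SIZE, y // REGION_SIZE)


def regions_for_gpu(gpu, regions_x, regions_y):
    """All regions of a regions_x by regions_y screen grid owned by one GPU."""
    return [
        (rx, ry)
        for ry in range(regions_y)
        for rx in range(regions_x)
        if owner_of_region(rx, ry) == gpu
    ]
```

Because the mapping is a pure function of the region coordinates, every GPU can evaluate it independently and arrive at the same answer.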

At 530, the method includes assigning to a first GPU, for geometry testing, a first piece of geometry of an image frame generated during execution of the application. For example, an image frame may include one or more objects, and each object may be defined by one or more pieces of geometry. That is, in one embodiment, geometry pre-testing and rendering are performed on a piece of geometry that is an entire object. In other embodiments, geometry pre-testing and rendering are performed on a piece of geometry that is a portion of an entire object.

For example, each of the multiple GPUs is assigned a corresponding portion of the geometry associated with the image frame. In particular, every portion of the geometry is assigned to a corresponding GPU for purposes of geometry pre-testing. In one embodiment, the geometry may be assigned evenly among the multiple GPUs. For example, if there are four GPUs, each GPU may process one quarter of the geometry in the image frame. In other embodiments, the geometry may be assigned unevenly among the multiple GPUs. For example, when four GPUs are used for multi-GPU rendering of an image frame, one GPU may process more of the image frame's geometry than another GPU.
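Both the even and the uneven assignment of geometry for pre-testing can be sketched as follows. The round-robin and weighted-slice policies, along with the function names, are assumptions for illustration; the description states only that the split may be even or uneven.

```python
def assign_pieces_evenly(piece_ids, num_gpus):
    """Round-robin assignment: each GPU pre-tests about 1/num_gpus of the pieces."""
    assignment = [[] for _ in range(num_gpus)]
    for i, piece in enumerate(piece_ids):
        assignment[i % num_gpus].append(piece)
    return assignment


def assign_pieces_weighted(piece_ids, weights):
    """Uneven assignment: contiguous slices sized in proportion to the weights."""
    total = sum(weights)
    assignment, start, accumulated = [], 0, 0
    for weight in weights:
        accumulated += weight
        end = round(len(piece_ids) * accumulated / total)
        assignment.append(piece_ids[start:end])
        start = end
    return assignment


pieces = list(range(8))
even_split = assign_pieces_evenly(pieces, 4)
uneven_split = assign_pieces_weighted(pieces, [3, 2, 2, 1])
```

In both cases every piece is assigned to exactly one GPU for testing, which is independent of which GPU or GPUs later render it.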

At 540, the method includes performing geometry pre-testing at the first GPU to generate information about how the piece of geometry relates to the multiple screen regions. In particular, the first GPU generates information about the piece of geometry and how it relates to each of the multiple screen regions. For example, geometry pre-testing by the first GPU may determine whether the piece of geometry overlaps a particular screen region that is assigned to a corresponding GPU for object rendering. The first piece of geometry may overlap screen regions for which other GPUs are responsible for object rendering, and/or screen regions for which the first GPU is responsible for object rendering. In one embodiment, the geometry test is performed by a shader in a corresponding command buffer executed by the first GPU, before any of the multiple GPUs renders the geometry. In other embodiments, the geometry test is performed by hardware, for example in rasterization stage 420 of graphics pipeline 400.
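A pre-test of this kind can be approximated with a conservative bounding-box check. The sketch below is one possible software model: the use of a screen-space bounding box, the 64-pixel regions, the 2x2 interleaved ownership pattern, the integer pixel coordinates, and the bitmask result are all assumptions rather than details taken from the description.

```python
REGION_SIZE = 64  # assumption: square regions, 64 pixels on a side


def owner_of_region(rx, ry):
    """Assumed 2x2 interleaved pattern mapping a region to one of four GPUs."""
    return (rx % 2) + 2 * (ry % 2)


def pretest_piece(bbox, screen_w, screen_h):
    """Return a bitmask of the GPUs whose regions the piece may overlap.

    bbox is (min_x, min_y, max_x, max_y) in integer pixels, inclusive.
    Bit i of the result is set if the box touches a region owned by GPU i.
    """
    min_x, min_y, max_x, max_y = bbox
    # Clamp to the screen; a piece entirely off-screen overlaps nothing.
    min_x, min_y = max(min_x, 0), max(min_y, 0)
    max_x, max_y = min(max_x, screen_w - 1), min(max_y, screen_h - 1)
    if min_x > max_x or min_y > max_y:
        return 0
    mask = 0
    for ry in range(min_y // REGION_SIZE, max_y // REGION_SIZE + 1):
        for rx in range(min_x // REGION_SIZE, max_x // REGION_SIZE + 1):
            mask |= 1 << owner_of_region(rx, ry)
    return mask
```

A bounding box can report an overlap that the exact geometry does not have, so a test like this may cause some unnecessary rendering work but never skips a GPU that is actually needed.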

In embodiments, geometry pre-testing is typically performed simultaneously by the multiple GPUs for all geometry of the corresponding image frame. That is, each GPU performs geometry pre-testing on its portion of the geometry of the image frame. In this way, geometry pre-testing by the GPUs allows each GPU to know which pieces of geometry to render and which pieces to skip. Specifically, when a given GPU performs geometry pre-testing, it tests its portion of the geometry against the screen regions of every GPU used to render the image frame. For example, if there are four GPUs and the geometry is assigned evenly to the GPUs for the purpose of geometry testing, each GPU may perform geometry testing on one quarter of the geometry of the image frame. Thus, even though each GPU pre-tests only its own portion of the geometry, because pre-testing is typically performed simultaneously across the multiple GPUs for all geometry of the image frame, the generated information indicates how all geometry (e.g., pieces of geometry) in the image frame relates to the screen regions of all GPUs. Each screen region is assigned to a corresponding GPU for object rendering, and rendering may be performed on pieces of geometry (e.g., an entire object or a portion of an object).
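
A minimal sketch of such a pre-test is given below, assuming that each piece of geometry is represented by a screen-space bounding rectangle and that the test is a simple rectangle overlap check. The names `overlaps`, `pretest`, `regions`, and the coordinates are hypothetical; the disclosure does not prescribe a particular test.

```python
def overlaps(a, b):
    """True if half-open rectangles (x0, y0, x1, y1) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def pretest(piece_boxes, regions_by_gpu):
    """For each piece, record which GPUs own at least one overlapping region."""
    return {
        pid: {gpu for gpu, rects in regions_by_gpu.items()
              if any(overlaps(box, r) for r in rects)}
        for pid, box in piece_boxes.items()
    }


# Hypothetical layout: a 100x100 screen split into four quadrants.
regions = {
    "A": [(0, 0, 50, 50)], "B": [(50, 0, 100, 50)],
    "C": [(0, 50, 50, 100)], "D": [(50, 50, 100, 100)],
}

# Each GPU pre-tests only its own share, but against every GPU's regions.
share_a = {0: (10, 60, 20, 70)}   # lies inside quadrant C
share_b = {1: (40, 60, 60, 70)}   # straddles quadrants C and D

info = {}
for share in (share_a, share_b):
    info.update(pretest(share, regions))   # merged results, known to all GPUs
```

After merging, `info` describes every tested piece relative to the regions of all GPUs, even though no single GPU tested all of the geometry.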

At 550, the method includes using the information at each of the multiple GPUs when rendering the pieces of geometry (e.g., including fully rendering a piece of geometry or skipping the rendering of that piece). That is, each of the multiple GPUs uses the information to render pieces of geometry. The geometry test results (e.g., the information) are sent to the other GPUs so that the information is known to each GPU. For example, in embodiments, the geometry (e.g., pieces of geometry) in the image frame is typically rendered simultaneously by the multiple GPUs. Specifically, when a piece of geometry overlaps any screen region assigned to a corresponding GPU for object rendering, that GPU renders the piece based on the information. On the other hand, when a piece of geometry does not overlap any screen region assigned to the corresponding GPU for object rendering, that GPU can skip rendering the piece based on the information. Thus, the information allows every GPU to render the geometry in the image frame more efficiently and/or to avoid rendering that geometry altogether. For example, rendering may be performed by shaders in a corresponding command buffer executed by the multiple GPUs. As described more fully below with respect to FIGS. 7A, 12A, and 13A, a shader may be configured to perform geometry testing, rendering, or both, based on the corresponding GPU configuration.

According to one embodiment of the present disclosure, in some architectures, if a rendering GPU receives the corresponding information in time to use it, the GPU uses that information when deciding which geometry to render in the corresponding image. That is, the information may be treated as a hint. Otherwise, the rendering GPU processes the piece of geometry as it ordinarily would. Taking the example in which the information indicates whether geometry overlaps any screen region assigned to the rendering GPU (e.g., a second GPU), if the information indicates that there is no overlap, the rendering GPU may skip rendering the geometry entirely. If only some pieces of the geometry do not overlap, the second GPU may skip rendering at least those pieces that do not overlap any screen region assigned to the second GPU for object rendering. Conversely, the information may indicate that the geometry overlaps, in which case the second (rendering) GPU renders the geometry. The information may also indicate that particular pieces of geometry overlap a screen region assigned to the second (rendering) GPU for object rendering, in which case that GPU renders only the overlapping pieces. In still other embodiments, if there is no information, or if the information is not generated or received in time, the second GPU performs rendering as usual (e.g., renders the geometry). Thus, the information provided as a hint may increase the overall efficiency of the graphics processing system when it is received in time. When it is not received in time, the graphics processing system still operates correctly without it.
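
The hint behavior described above might be sketched as follows. The function `should_render` and the convention that a missing hint is represented by `None` (or by an absent entry) are assumptions made for illustration only.

```python
def should_render(piece_id, gpu, hints):
    """Decide whether `gpu` renders `piece_id`.

    `hints` maps piece id -> set of GPUs whose regions the piece overlaps,
    or is None when no information arrived in time (hypothetical convention).
    """
    if hints is None or piece_id not in hints:
        return True                    # no hint: render as usual
    return gpu in hints[piece_id]      # hint available: render only on overlap


hints = {0: {"GPU-A"}, 1: {"GPU-A", "GPU-B"}}

skip = should_render(0, "GPU-B", hints)       # hint says no overlap: skip
draw = should_render(1, "GPU-B", hints)       # hint says overlap: render
late = should_render(0, "GPU-B", None)        # hint missing: render anyway
```

The fallback branch is what keeps the result correct when the information is late: a missing hint costs efficiency, never correctness.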

In one embodiment, one GPU (e.g., a pre-test GPU) is dedicated to performing geometry pre-testing and generating the information. That is, the dedicated GPU is not used to render objects (e.g., pieces of geometry) in the corresponding image frame. Specifically, as previously described, graphics for an application are rendered using multiple GPUs. Responsibility for rendering the geometry of the graphics is divided among the multiple GPUs based on multiple screen regions, which may be interleaved. Each GPU has a corresponding division of responsibility that is known to the multiple GPUs. Geometry testing is performed at the pre-test GPU on multiple pieces of geometry of an image frame generated by the application, to generate information about each piece of geometry and its relationship to each of the multiple screen regions. The pieces of geometry are rendered at each of the multiple GPUs using the information generated for each piece. That is, the information is used when each piece of geometry is rendered by a corresponding rendering GPU among the GPUs used to render the image frame.

FIGS. 6A-6B illustrate, purely for purposes of explanation, rendering to a screen that is subdivided into regions and sub-regions. It is understood that the number of regions and sub-regions can be selected for efficient multi-GPU processing of an image and/or of each of one or more images in a sequence of images. That is, the screen may be subdivided into two or more regions, and each region may be further divided into sub-regions. In one embodiment of the present disclosure, the screen is subdivided into four quadrants, as shown in FIG. 6A. In another embodiment of the present disclosure, the screen is subdivided into a larger number of interleaved regions, as shown in FIG. 6B. The discussion of FIGS. 6A-6B below is intended to illustrate the inefficiencies that arise when performing multi-GPU rendering to multiple screen regions assigned to multiple GPUs. FIGS. 7A-7C and FIGS. 8A-8B illustrate more efficient rendering, according to some embodiments of the invention.

Specifically, FIG. 6A is a diagram of a screen 610A that is subdivided into quadrants (e.g., four regions) when performing multi-GPU rendering. As shown, screen 610A is subdivided into four quadrants (e.g., A, B, C, and D). Each quadrant is assigned to one of four GPUs (GPU-A, GPU-B, GPU-C, and GPU-D) in a one-to-one relationship. For example, GPU-A is assigned to quadrant A, GPU-B to quadrant B, GPU-C to quadrant C, and GPU-D to quadrant D.

The geometry can be culled. For example, CPU 163 can check a bounding box against each quadrant's frustum and request each GPU to render only the objects that overlap its corresponding frustum. As a result, each GPU is responsible for rendering only a portion of the geometry. For purposes of illustration, screen 610 shows pieces of geometry, where each piece is a corresponding object; specifically, screen 610 shows objects 611-617 (e.g., pieces of geometry). Because no object overlaps quadrant A, GPU-A renders no objects. GPU-B renders objects 615 and 616 (because a portion of object 615 lies within quadrant B, the CPU's culling test correctly concludes that GPU-B must render it). GPU-C renders objects 611 and 612. GPU-D renders objects 612, 613, 614, 615, and 617.

In FIG. 6A, when screen 610A is divided into quadrants A-D, the amount of work that each GPU must perform can differ greatly, because in some cases a disproportionate amount of geometry falls within one quadrant. For example, quadrant A contains no pieces of geometry, whereas quadrant D contains five pieces of geometry, or at least portions of five pieces. Thus, GPU-A, which is assigned to quadrant A, sits idle, while GPU-D, which is assigned to quadrant D, is disproportionately busy when rendering objects in the corresponding image.
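
Tallying the per-GPU assignments stated for FIG. 6A makes the imbalance concrete. The object numbers below are those given in the description of FIG. 6A; the dictionary layout and variable names are assumptions made for this sketch.

```python
# Objects each GPU renders under the quadrant split of FIG. 6A.
renders = {
    "GPU-A": [],
    "GPU-B": [615, 616],
    "GPU-C": [611, 612],
    "GPU-D": [612, 613, 614, 615, 617],
}

load = {gpu: len(objs) for gpu, objs in renders.items()}
busiest = max(load, key=load.get)                    # GPU with the most objects
idle = [gpu for gpu, n in load.items() if n == 0]    # GPUs with nothing to do

# Objects that straddle quadrants are rendered by more than one GPU.
total_draws = sum(load.values())
unique_objects = len({o for objs in renders.values() for o in objs})
```

Here seven distinct objects lead to nine object renders, and more than half of those fall on a single GPU.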

FIG. 6B illustrates another technique for subdividing the screen into regions. Specifically, rather than being subdivided into quadrants, screen 610B is subdivided into multiple interleaved regions when performing multi-GPU rendering of a single image or of each of one or more images in a sequence of images. In this case, screen 610B is subdivided into a larger number of interleaved regions (e.g., more than four quadrants), while the same number of GPUs (e.g., four) is used for rendering. The objects (611-617) shown on screen 610A are also shown at the same corresponding locations on screen 610B.

Specifically, four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D) are used to render the image for the corresponding application. Each GPU is responsible for rendering the geometry that overlaps its corresponding regions. That is, each GPU is assigned a corresponding set of regions. For example, GPU-A is responsible for each of the regions labeled A in its set, GPU-B for each of the regions labeled B, GPU-C for each of the regions labeled C, and GPU-D for each of the regions labeled D.

Furthermore, the regions are interleaved in a particular pattern. Because the regions are interleaved (and more numerous), the amount of work that each GPU must perform can be much better balanced. For example, the interleaving pattern of screen 610B includes alternating rows (e.g., rows of regions A-B-A-B and so on, and rows of regions C-D-C-D and so on). Other patterns of interleaving regions are also supported in embodiments of the present disclosure. For example, a pattern may include regions in a repeating sequence, evenly distributed regions, unevenly distributed regions, repeatable rows of regions, regions in a random sequence, rows of regions in a random sequence, and so on.
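
The alternating-row pattern described for screen 610B can be expressed as a small ownership function. This is a sketch of that one pattern only; the function names and the grid size in the example are assumptions, and other interleaving patterns would need a different mapping.

```python
def owner(col, row):
    """GPU label owning the region at (col, row).

    Even rows alternate A-B-A-B, odd rows alternate C-D-C-D.
    """
    return "ABCD"[(row % 2) * 2 + (col % 2)]


def layout(cols, rows):
    """Return the ownership grid as one string per row of regions."""
    return ["".join(owner(c, r) for c in range(cols)) for r in range(rows)]


grid = layout(4, 4)   # hypothetical 4x4 grid of regions
```

Each GPU owns the same number of regions, spread across the whole screen rather than concentrated in one corner.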

Choosing the number of regions is important. For example, if the division into regions is too fine (e.g., the number of regions is too large to be optimal), each GPU must still process most or all of the geometry. For example, it may be difficult for a GPU to check an object's bounding box against every region for which it is responsible. Moreover, even if the bounding boxes can be checked in a timely manner, small region sizes may mean that each GPU still has to process most of the geometry, because every object in the image overlaps at least one region of each GPU (e.g., a GPU processes an entire object even if only a portion of the object overlaps at least one region in the set of regions assigned to that GPU).
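
The effect of region size can be illustrated with a small experiment, assuming square regions, integer pixel coordinates, and the alternating A-B / C-D row pattern. The function name, the bounding box, and the region sizes are hypothetical values chosen for illustration.

```python
def gpus_touched(box, region_size):
    """Set of GPU labels whose regions a bounding box overlaps.

    `box` is a half-open integer rectangle (x0, y0, x1, y1); regions are
    squares of side `region_size`, interleaved A-B-A-B / C-D-C-D by row.
    """
    x0, y0, x1, y1 = box
    owners = set()
    for row in range(y0 // region_size, (y1 - 1) // region_size + 1):
        for col in range(x0 // region_size, (x1 - 1) // region_size + 1):
            owners.add("ABCD"[(row % 2) * 2 + (col % 2)])
    return owners


box = (10, 10, 40, 40)            # one hypothetical object, 30x30 pixels
coarse = gpus_touched(box, 64)    # fits inside a single large region
fine = gpus_touched(box, 16)      # spans a 3x3 block of small regions
```

With coarse regions the object is handled by one GPU, while with fine regions the same object overlaps regions of all four GPUs, so every GPU must process it.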

Consequently, the choice of the number of regions, the interleaving pattern, and so on is important. Choosing too few or too many regions, or an inefficient interleaving pattern, can lead to inefficiency in GPU processing (e.g., each GPU processing most or all of the geometry). In such cases, even when multiple GPUs are available to render an image, GPU inefficiency prevents a corresponding increase in both screen pixel count and geometry density (i.e., four GPUs cannot write four times as many pixels and process four times as many vertices or primitives). The following embodiments are directed to, among other things, improvements in culling strategy (FIGS. 7A-7C) and culling granularity (FIGS. 8A-8B).

FIGS. 7A-7C illustrate the use of multiple GPUs to render a single image and/or each of one or more images in a sequence of images, in embodiments of the present disclosure. Four GPUs are chosen merely for ease of illustrating multi-GPU rendering of images while an application executes; it is understood that any number of GPUs may be used for multi-GPU rendering in various embodiments.

Specifically, FIG. 7A is a diagram of a rendering command buffer 700A that is shared by multiple GPUs that cooperate to render a single image frame, according to one embodiment of the present disclosure. That is, in this example, the multiple GPUs each use the same rendering command buffer (e.g., buffer 700A), and each GPU executes all of the commands in the rendering command buffer. A plurality of commands (a complete set) is loaded into rendering command buffer 700A and used to render the corresponding image frame. It is understood that one or more rendering command buffers may be used to generate the corresponding image frame. In one example, the CPU generates one or more draw calls for the image frame. A draw call includes commands that are placed in one or more rendering command buffers for execution by one or more GPUs of GPU resources 365 of FIG. 3 when performing multi-GPU rendering of the corresponding image. In some implementations, CPU 163 may request one or more GPUs to generate all or some of the draw calls used to render the corresponding image. Further, FIG. 7A may show the entire set of commands contained in rendering command buffer 700A, or only a portion of that set.

In embodiments, the GPUs typically render simultaneously when performing multi-GPU rendering of an image or of each of one or more images in a sequence of images. Rendering of an image can be broken into multiple phases. At each phase the GPUs need to be synchronized, so that a faster GPU must wait until a slower GPU completes. The commands shown in FIG. 7A for rendering command buffer 700A represent one phase. Although FIG. 7A shows commands for only one phase, rendering command buffer 700A may include commands for one or more phases when rendering an image. FIG. 7A shows only a portion of all the commands, and commands for other phases are not shown. In the piece of rendering command buffer 700A shown in FIG. 7A, which illustrates one phase, there are four objects to be rendered (e.g., object 0, object 1, object 2, and object 3), as shown in FIG. 7B-1.

As shown, the piece of rendering command buffer 700A in FIG. 7A includes commands for geometry testing, commands for rendering objects (e.g., pieces of geometry), and commands for configuring the state of the one or more rendering GPUs that execute commands from rendering command buffer 700A. Purely for purposes of illustration, the piece of rendering command buffer 700A shown in FIG. 7A includes commands (710-728) used for geometry pre-testing, rendering of objects, and/or executing synchronous compute kernels when rendering the corresponding image for the corresponding application. In some implementations, the geometry pre-testing, the rendering of objects for the image, and/or the execution of synchronous compute kernels must be performed within a frame period. Two processing sections are shown in rendering command buffer 700A. Specifically, section 1 includes pre-testing, or geometry testing, 701, and section 2 includes rendering 702.

Section 1 includes performing geometry testing 701 of objects in the image frame. Each object may be defined by one or more pieces of geometry. The pre-testing or geometry testing 701 may be performed by one or more shaders. For example, each GPU used in multi-GPU rendering of the corresponding image frame is assigned a portion of the geometry of the image frame on which to perform geometry testing. In one embodiment, every portion may be assigned for pre-testing. An assigned portion may include one or more pieces of geometry. Each piece may include an entire object or a portion of an object (e.g., vertices, primitives, etc.). Specifically, geometry testing is performed on a piece of geometry to generate information about how that piece relates to each of the multiple screen regions. For example, the geometry test may determine whether the piece of geometry overlaps a particular screen region assigned to a corresponding GPU for object rendering.

As shown in FIG. 7A, the geometry testing 701 of section 1 (e.g., geometry pre-testing) includes commands for configuring the state of the one or more GPUs that execute commands from rendering command buffer 700A, and commands for performing geometry tests. Specifically, the GPU state of each GPU is configured before the GPU performs the geometry test on the corresponding object. For example, commands 710, 713, and 715 are each used to configure the GPU state of one or more GPUs for the purpose of executing geometry test commands. As shown, command 710 configures the GPU state so that geometry test commands 711-712 can be properly performed. Command 711 performs the geometry test on object 0, and command 712 performs the geometry test on object 1. Similarly, command 713 configures the GPU state so that geometry test command 714 can perform the geometry test on object 2, and command 715 configures the GPU state so that geometry test command 716 can perform the geometry test on object 3. It is understood that a GPU state may be configured for one or more geometry test commands (e.g., test commands 711 and 712).

As previously described, the values stored in registers define the hardware context (e.g., GPU configuration) for the corresponding GPU when executing commands in rendering command buffer 700A that are used for geometry testing, rendering objects, and/or executing synchronous compute kernels for the corresponding image. As shown, the GPU state may be changed throughout the processing of commands in rendering command buffer 700A, and each subsequent section of commands may be used to configure the GPU state. As applied to FIG. 7A and throughout the specification, when reference is made to setting GPU state, the GPU state may be set in various ways. For example, the CPU or a GPU can set a value in random access memory (RAM), and the GPU checks the value in RAM. In another example, the state may be internal to the GPU, such as when a command buffer is called twice as a subroutine and the internal GPU state differs between the two subroutine calls.

Section 2 includes performing rendering 702 of the objects in the image frame (e.g., pieces of geometry are rendered). Rendering 702 may be performed by one or more shaders in command buffer 700A. As shown in FIG. 7A, the rendering 702 of section 2 includes commands for configuring the state of the one or more GPUs that execute commands from rendering command buffer 700A, and commands for performing rendering. Specifically, the GPU state of each GPU is configured before the GPU renders the corresponding object (e.g., piece of geometry). For example, commands 721, 723, 725, and 727 are each used to configure the GPU state of one or more GPUs for the purpose of executing rendering commands. As shown, command 721 configures the GPU state so that rendering command 722 can render object 0. Command 723 configures the GPU state so that rendering command 724 can render object 1. Command 725 configures the GPU state so that rendering command 726 can render object 2. Command 727 configures the GPU state so that rendering command 728 can render object 3. Although FIG. 7A shows the GPU state being configured for each rendering command (e.g., render object 0, and so on), it is understood that a GPU state may be configured for one or more rendering commands.
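
The two-section structure of rendering command buffer 700A can be modeled in a few lines. The command numbers follow FIG. 7A as described above; everything else (the `Cmd` tuple, the `execute` function, and the `overlap` table that stands in for the actual geometry test) is a hypothetical simplification. In particular, this sketch has each GPU test every object itself, rather than splitting the tests among GPUs and exchanging the results.

```python
from collections import namedtuple

Cmd = namedtuple("Cmd", ["num", "kind", "obj"])


def build_buffer():
    """Commands 710-716 (section 1) and 721-728 (section 2) of FIG. 7A."""
    section1 = [(710, "state", None), (711, "test", 0), (712, "test", 1),
                (713, "state", None), (714, "test", 2),
                (715, "state", None), (716, "test", 3)]
    cmds = [Cmd(*c) for c in section1]
    for obj in range(4):                       # one state + render pair each
        cmds.append(Cmd(721 + 2 * obj, "state", None))
        cmds.append(Cmd(722 + 2 * obj, "render", obj))
    return cmds


def execute(gpu, buffer, overlap):
    """Run the whole buffer on one GPU; return the objects it renders."""
    info, rendered = {}, []
    for cmd in buffer:
        if cmd.kind == "test":
            info[cmd.obj] = overlap[cmd.obj]   # stand-in for the geometry test
        elif cmd.kind == "render" and gpu in info.get(cmd.obj, {gpu}):
            rendered.append(cmd.obj)           # no info means render anyway
    return rendered


buffer = build_buffer()
# Hypothetical overlap data: object -> GPUs whose regions it touches.
overlap = {0: {"GPU-A"}, 1: {"GPU-A", "GPU-B"}, 2: {"GPU-C"}, 3: {"GPU-B", "GPU-D"}}
rendered_by_b = execute("GPU-B", buffer, overlap)
```

Every GPU walks the same buffer, but the information gathered in section 1 lets each one skip the rendering commands in section 2 that do not concern its regions.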

As previously described, each GPU used in multi-GPU rendering of the corresponding image frame renders corresponding pieces of geometry based on the information generated during geometry pre-testing. Specifically, the information, which is known to each GPU, provides the relationship between objects and screen regions. When rendering the corresponding pieces of geometry, a GPU may use the information, if it is received in a timely manner, to render those pieces efficiently. Specifically, when the information indicates that a piece of geometry overlaps any screen region or regions assigned to a corresponding GPU for object rendering, that GPU performs rendering for the piece. On the other hand, the information may indicate that a first GPU should skip rendering a piece of geometry entirely (e.g., the piece does not overlap any screen region for which the first GPU is assigned responsibility for object rendering). In this way, each GPU renders only the pieces of geometry that overlap the screen region or regions for which it is responsible for object rendering. Thus, the information is provided as a hint to each GPU, and each GPU that renders a piece of geometry takes the information into account if it is received before rendering begins. In one embodiment, if the information is not received in time, rendering proceeds normally. For example, the corresponding piece of geometry is fully rendered by the corresponding GPU, regardless of whether the piece overlaps any screen region assigned to that GPU for object rendering.

Purely for purposes of illustration, four GPUs divide the corresponding screen into regions among themselves. As previously described, each GPU is responsible for rendering objects in a corresponding set of regions, where the set includes one or more regions. In one embodiment, rendering command buffer 700A is shared by multiple GPUs that cooperate to render a single image. That is, the GPUs used for multi-GPU rendering of a single image, or of each of one or more images in a sequence of images, share a common command buffer. In another embodiment, each GPU may have its own command buffer.

Alternatively, in still other embodiments, each GPU may be rendering a somewhat different set of objects. This may be the case when it can be determined that a particular GPU does not need to render a particular object, for example because the object does not overlap the screen regions in that GPU's corresponding set. As described above, multiple GPUs can still use the same command buffer (e.g., share one command buffer) as long as the command buffer supports commands that are executed by one GPU but not by another. For example, execution of a command in the shared rendering command buffer 700A may be restricted to one of the rendering GPUs. This can be accomplished in various ways. In another example, a flag on the corresponding command may indicate which GPUs should execute it. Predication may also be implemented in the rendering command buffer using bits that indicate which GPU does what under which conditions. An example of predication is "if this is GPU-A, skip the next X commands."
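By way of illustration only, the predication example above ("if this is GPU-A, skip the next X commands") can be sketched as a tiny interpreter over a shared command list. The tuple encoding of commands, the `skip_if` opcode, and the function name `execute` are hypothetical and are not taken from the disclosure or from any real graphics API.

```python
def execute(commands, gpu_id):
    """Walk a shared command list and return the draws that gpu_id executes."""
    executed = []
    i = 0
    while i < len(commands):
        cmd = commands[i]
        if cmd[0] == "skip_if":
            # ("skip_if", gpu, count): if this is `gpu`, skip the next `count` commands.
            _, target_gpu, count = cmd
            if gpu_id == target_gpu:
                i += count
        elif cmd[0] == "draw":
            executed.append(cmd[1])
        i += 1
    return executed


# One buffer shared by every GPU; only GPU-A skips the draw of object 0.
shared_buffer = [
    ("skip_if", "A", 1),
    ("draw", "object 0"),
    ("draw", "object 2"),
]
```

Every GPU reads the same buffer, and the per-GPU behavior comes only from the predicate, so a single buffer can serve GPUs that render different object sets.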

In still other embodiments, multiple GPUs may still use the same command buffer because substantially the same set of objects is rendered by each GPU. For example, as described above, each GPU may render all of the objects when the regions are relatively small.

FIG. 7B-1 illustrates a screen 700B showing an image that includes four objects rendered by multiple GPUs using the rendering command buffer 700A of FIG. 7A, in accordance with one embodiment of the present disclosure. In accordance with one embodiment of the present disclosure, multi-GPU rendering of geometry is performed for an application by pre-testing the geometry against screen regions, which may be interleaved, before rendering the pieces of geometry corresponding to objects in the image frame.

In particular, responsibility for rendering geometry is divided by screen region among the multiple GPUs, and the screen regions are configured to reduce imbalance in rendering time among the GPUs. For example, screen 700B shows the screen region responsibilities of each GPU when rendering the objects of the image. Four GPUs (GPU-A, GPU-B, GPU-C, and GPU-D) are used to render the objects in the image shown on screen 700B. To balance pixel and vertex load among the GPUs, screen 700B is divided more finely than the quadrants shown in FIG. 6A. In addition, screen 700B is divided into regions that may be interleaved. For example, the interleaving includes multiple rows of regions. In each of rows 731 and 733, region A alternates with region B. In each of rows 732 and 734, region C alternates with region D. More particularly, rows containing regions A and B alternate with rows containing regions C and D in the pattern.
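By way of illustration only, the interleaved pattern described above (rows alternating A and B, interleaved with rows alternating C and D) can be sketched as a mapping from a region's row and column index to its owning GPU. The function names, the zero-based indexing, and the choice of which region starts each row are assumptions made for this sketch.

```python
def region_owner(row: int, col: int) -> str:
    """Return the GPU letter that owns the region at (row, col)."""
    # Even rows alternate A/B; odd rows alternate C/D (assumed starting order).
    pair = "AB" if row % 2 == 0 else "CD"
    return pair[col % 2]


def layout(rows: int, cols: int) -> list:
    """Return the ownership pattern as one string per row of regions."""
    return ["".join(region_owner(r, c) for c in range(cols)) for r in range(rows)]
```

With four rows standing in for rows 731 to 734, `layout(4, 4)` gives two A/B rows interleaved with two C/D rows, so each GPU owns regions spread across the whole screen rather than one contiguous quadrant.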

As described above, various techniques may be used when dividing the screen into regions in order to achieve GPU processing efficiency, such as increasing or decreasing the number of regions (e.g., to choose the right number of regions), interleaving regions, increasing or decreasing the number of regions used for interleaving, and selecting a particular pattern when interleaving regions and/or sub-regions. In one embodiment, each of the screen regions is uniformly sized. In another embodiment, the screen regions are not uniform in size. In still other embodiments, the number and sizing of the screen regions change dynamically.

Each GPU is responsible for rendering objects in a corresponding set of regions, and each set may include one or more regions. Thus, GPU-A is responsible for rendering objects in each of the A regions in its corresponding set, GPU-B is responsible for rendering objects in each of the B regions in its corresponding set, GPU-C is responsible for rendering objects in each of the C regions in its corresponding set, and GPU-D is responsible for rendering objects in each of the D regions in its corresponding set. There may also be GPUs that have other responsibilities and do not perform rendering (e.g., executing asynchronous compute kernels that run over multiple frame periods, performing culling for the rendering GPUs, etc.).

The amount of rendering to be performed differs for each GPU. FIG. 7B-2 illustrates a table showing the rendering performed by each GPU when rendering the four objects of FIG. 7B-1, in accordance with one embodiment of the present disclosure. As shown in the table, it may be determined after geometry pre-testing that object 0 is rendered by GPU-B; that object 1 is rendered by GPU-C and GPU-D; that object 2 is rendered by GPU-A, GPU-B, and GPU-D; and that object 3 is rendered by GPU-B, GPU-C, and GPU-D. Some imbalance in rendering may remain, since GPU-A needs to render only object 2 while GPU-D needs to render objects 1, 2, and 3. Overall, however, with interleaving of screen regions, the rendering of objects within an image is reasonably balanced among the GPUs used for multi-GPU rendering of an image, or of each of one or more images in a sequence of images.
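By way of illustration only, the table of FIG. 7B-2 as described above can be inverted from "object to rendering GPUs" into "GPU to objects rendered", which makes the remaining imbalance visible. The dictionary layout and the names `renderers_by_object` and `objects_per_gpu` are assumptions made for this sketch; the object-to-GPU assignments are the ones stated in the text.

```python
# Object -> GPUs that render it, as stated for the table of FIG. 7B-2.
renderers_by_object = {
    0: {"B"},
    1: {"C", "D"},
    2: {"A", "B", "D"},
    3: {"B", "C", "D"},
}


def objects_per_gpu(renderers):
    """Invert the table: GPU letter -> sorted list of objects it renders."""
    per_gpu = {gpu: [] for gpu in "ABCD"}
    for obj in sorted(renderers):
        for gpu in sorted(renderers[obj]):
            per_gpu[gpu].append(obj)
    return per_gpu


workload = objects_per_gpu(renderers_by_object)
```

In this inversion GPU-A ends up with a single object while GPU-B and GPU-D each have three, which is the residual imbalance the paragraph mentions.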

FIG. 7C is a diagram illustrating the rendering of each object performed by each GPU when multiple GPUs cooperate to render a single image frame (e.g., image frame 700B shown in FIG. 7B-1), in accordance with one embodiment of the present disclosure. In particular, FIG. 7C shows the process of rendering objects 0-3 as performed by each of four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D) using the shared rendering command buffer 700A of FIG. 7A.

In particular, two rendering timing diagrams are shown with respect to timeline 740. Rendering timing diagram 700C-1 shows multi-GPU rendering of objects 0-3 of a corresponding image in one phase of rendering, where each GPU performs rendering without any information about the overlap between objects 0-3 and the screen regions. Rendering timing diagram 700C-2 shows multi-GPU rendering of objects 0-3 of the corresponding image in the same phase of rendering, where information generated during geometry testing against screen regions (e.g., performed before rendering) is shared with each GPU and used to render objects 0-3 through the corresponding GPU pipeline. Each of rendering timing diagrams 700C-1 and 700C-2 shows the time taken by each GPU to process each piece of geometry (e.g., to perform geometry testing and rendering). In one embodiment, a piece of geometry is an entire object. In another embodiment, a piece of geometry may be a portion of an object. For purposes of illustration, the example of FIG. 7C shows the rendering of pieces of geometry where each piece corresponds to an object (e.g., in its entirety). In each of rendering timing diagrams 700C-1 and 700C-2, an object (e.g., a piece of geometry) that has no geometry (e.g., primitives of the object) overlapping at least one screen region of the corresponding GPU (e.g., in its corresponding set of regions) is represented by a box drawn with dashed lines. On the other hand, an object that has geometry overlapping at least one screen region of the corresponding GPU (e.g., in its corresponding set of regions) is represented by a box drawn with solid lines.

Rendering timing diagram 700C-1 shows the rendering of objects 0-3 using four GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D). In rendering timing diagram 700C-1, vertical line 755a indicates the start of the rendering phase for the objects, and vertical line 755b indicates the end of the rendering phase for the objects. The start and end points along timeline 740 for the illustrated rendering phase represent synchronization points, at which each of the four GPUs is synchronized when executing its corresponding GPU pipeline. For example, at vertical line 755b, which indicates the end of the rendering phase, all of the GPUs must wait for the slowest GPU (e.g., GPU-B) to finish rendering objects 0-3 through its corresponding graphics pipeline before moving to the next phase of rendering.

In rendering timing diagram 700C-1, geometry pre-testing is not performed. Each GPU must therefore process every object through its corresponding graphics pipeline. A GPU may not fully render an object through the graphics pipeline if there are no pixels to draw for the object in any region (e.g., in the corresponding set) assigned to that GPU for object rendering. For example, when an object does not overlap any such region, only the geometry processing stage of the graphics pipeline is executed. Even so, this still takes some time to process.

In particular, GPU-A does not fully render objects 0, 1, and 3, because they do not overlap any of the screen regions (e.g., in the corresponding set) assigned to GPU-A for object rendering. The rendering of these three objects is shown in boxes with dashed lines, indicating that at least the geometry processing stage is performed but the graphics pipeline is not run to completion. GPU-A fully renders object 2, because that object overlaps at least one screen region assigned to GPU-A for rendering. The rendering of object 2 is shown in a box with solid lines, indicating that all stages of the corresponding graphics pipeline are performed. Similarly, GPU-B does not fully render object 1 (shown in a box with dashed lines), performing at least the geometry processing stage, but fully renders objects 0, 2, and 3 (shown in boxes with solid lines), because those objects overlap at least one screen region (e.g., in the corresponding set) assigned to GPU-B for rendering. Likewise, GPU-C does not fully render objects 0 and 2 (shown in boxes with dashed lines), performing at least the geometry processing stage, but fully renders objects 1 and 3 (shown in boxes with solid lines), because those objects overlap at least one screen region (e.g., in the corresponding set) assigned to GPU-C for rendering. Finally, GPU-D does not fully render object 0 (shown in a box with dashed lines), performing at least the geometry processing stage, but fully renders objects 1, 2, and 3 (shown in boxes with solid lines), because those objects overlap at least one screen region (e.g., in the corresponding set) assigned to GPU-D for rendering.

Rendering timing diagram 700C-2 shows geometry pre-testing 701' and rendering 702' of objects 0-3 using multiple GPUs. In rendering timing diagram 700C-2, vertical line 750a indicates the start of the rendering phase for the objects (e.g., including geometry pre-testing and rendering), and vertical line 750b indicates the end of the rendering phase for the objects. The start and end points along timeline 740 for the rendering phase shown in timing diagram 700C-2 represent synchronization points. As described above, each of the four GPUs is synchronized when executing its corresponding GPU pipeline. For example, at vertical line 750b, which indicates the end of the rendering phase, all of the GPUs must wait for the slowest GPU (e.g., GPU-B) to finish rendering objects 0-3 through its corresponding graphics pipeline before moving to the next phase of rendering.

First, the GPUs perform geometry pre-testing 701'. Each GPU performs geometry pre-testing for a subset of the geometry of the image frame against all of the screen regions, where each screen region is assigned to a corresponding GPU for object rendering. As described above, each GPU is assigned a corresponding portion of the geometry associated with the image frame. Geometry pre-testing generates information about how a particular piece of geometry relates to each of the screen regions (e.g., whether the piece of geometry overlaps any screen region, such as one in a corresponding set, assigned to a corresponding GPU for object rendering). This information is shared with each of the GPUs used to render the image frame. For example, geometry pre-testing 701' shown in FIG. 7C includes GPU-A performing geometry pre-testing for object 0, GPU-B performing geometry pre-testing for object 1, GPU-C performing geometry pre-testing for object 2, and GPU-D performing geometry pre-testing for object 3. The time taken to perform geometry pre-testing may vary depending on the object being tested. For example, geometry pre-testing of object 0 takes less time than geometry pre-testing of object 1. This may be due to object size, the number of screen regions overlapped, and so on.

After geometry pre-testing, each GPU performs rendering for all objects or pieces of geometry that intersect its screen regions. In one embodiment, each GPU begins rendering its pieces of geometry as soon as its geometry testing is finished. That is, there is no synchronization point between geometry testing and rendering. This is possible because the geometry test information being generated is treated as a hint rather than as a hard dependency. For example, GPU-A begins rendering object 2 before GPU-B finishes geometry pre-testing of object 1, and therefore before GPU-B begins rendering objects 0, 2, and 3.

Vertical line 750a is aligned with vertical line 755a, such that rendering timing diagrams 700C-1 and 700C-2 each begin rendering objects 0-3 at the same time. However, the rendering of objects 0-3 shown in rendering timing diagram 700C-2 is performed in less time than the rendering shown in rendering timing diagram 700C-1. That is, vertical line 750b, which indicates the end of the rendering phase for the lower timing diagram 700C-2, occurs earlier than the end of the rendering phase for the upper timing diagram 700C-1, indicated by vertical line 755b. Specifically, a speed increase 745 in rendering objects 0-3 is realized when performing multi-GPU rendering of the geometry of an image for an application with pre-testing of the geometry against screen regions before rendering, and with the results of geometry pre-testing provided as information (e.g., hints). As shown, speed increase 745 is the time difference between vertical line 750b of timing diagram 700C-2 and vertical line 755b of timing diagram 700C-1.
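By way of illustration only, the relationship between the two timing diagrams can be sketched with a toy cost model in which a rendering phase ends when the slowest GPU finishes. The cost constants `FULL`, `GEOMETRY_ONLY`, and `PRETEST` are arbitrary hypothetical time units chosen for this sketch, not values from the disclosure; the per-GPU object sets are the ones stated for FIG. 7B-2.

```python
# Hypothetical per-object costs in arbitrary time units (assumptions).
FULL = 4           # full pass through the graphics pipeline
GEOMETRY_ONLY = 2  # geometry processing stage only (no overlap, no pre-test)
PRETEST = 1        # pre-testing one object against all screen regions

OBJECTS = range(4)
renders = {"A": {2}, "B": {0, 2, 3}, "C": {1, 3}, "D": {1, 2, 3}}


def busy_without_pretest(gpu):
    # Every object is at least geometry-processed by every GPU.
    return sum(FULL if obj in renders[gpu] else GEOMETRY_ONLY for obj in OBJECTS)


def busy_with_pretest(gpu):
    # Each GPU pre-tests one object, then renders only its own objects.
    return PRETEST + FULL * len(renders[gpu])


# The phase ends at the synchronization point, i.e. when the slowest GPU is done.
end_without = max(busy_without_pretest(gpu) for gpu in renders)
end_with = max(busy_with_pretest(gpu) for gpu in renders)
speed_increase = end_without - end_with
```

Under these assumed costs the phase ends earlier with pre-testing because each GPU no longer spends geometry-processing time on objects it does not draw. The size of the gain depends entirely on the relative costs, which this sketch does not attempt to model realistically.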

The speed increase is realized through the generation and sharing of information during geometry pre-testing. For example, during geometry pre-testing, GPU-A generates information indicating that object 0 needs to be rendered only by GPU-B. GPU-B is thereby informed that it should render object 0, and the other GPUs (e.g., GPU-A, GPU-C, and GPU-D) may skip rendering object 0 entirely, because object 0 does not overlap any region (e.g., in the corresponding set) assigned to those GPUs for object rendering. For example, those GPUs need not execute the geometry processing stage for object 0, whereas without geometry pre-testing that stage was executed, as shown in timing diagram 700C-1, even though those GPUs did not fully render object 0. Also during geometry pre-testing, GPU-B generates information indicating that object 1 should be rendered by GPU-C and GPU-D, and that GPU-A and GPU-B may skip rendering object 1 entirely, because object 1 does not overlap any region (e.g., in the respective corresponding sets) assigned to GPU-A or GPU-B for object rendering. Also during geometry pre-testing, GPU-C generates information indicating that object 2 should be rendered by GPU-A, GPU-B, and GPU-D, and that GPU-C may skip rendering object 2 entirely, because object 2 does not overlap any region (e.g., in the corresponding set) assigned to GPU-C for object rendering. Further, during geometry pre-testing, GPU-D generates information indicating that object 3 should be rendered by GPU-B, GPU-C, and GPU-D, and that GPU-A may skip rendering object 3 entirely, because object 3 does not overlap any region (e.g., in the corresponding set) assigned to GPU-A for object rendering.

Because the information generated from geometry pre-testing is shared among the GPUs, each GPU can determine which objects to render. Thus, after geometry pre-testing has been performed and the results have been shared with all of the GPUs, each GPU has information about which objects or pieces of geometry it needs to render. For example, GPU-A renders object 2; GPU-B renders objects 0, 2, and 3; GPU-C renders objects 1 and 3; and GPU-D renders objects 1, 2, and 3.

In particular, GPU-A performs geometry processing for object 1 and determines that object 1 can be skipped by GPU-B, because object 1 does not overlap any region (e.g., in the corresponding set) assigned to GPU-B for object rendering. In addition, object 1 is not fully rendered by GPU-A, because object 1 does not overlap any region (e.g., in the corresponding set) assigned to GPU-A for object rendering. Because the determination that object 1 does not overlap any region assigned to GPU-B is made before GPU-B begins geometry processing for object 1, GPU-B skips the rendering of object 1.

FIGS. 8A-8B show testing of an object against screen regions 820A and 820B, which may be interleaved (e.g., screen regions 820A and 820B show a portion of a display). In particular, multi-GPU rendering of objects is performed for a single image frame, or for each of one or more image frames in a sequence of image frames, by performing geometry testing before rendering the objects in the screen. As shown, GPU-A is assigned responsibility for rendering objects in screen region 820A, and GPU-B is assigned responsibility for rendering objects in screen region 820B. Information is generated for "pieces of geometry," where a piece of geometry can be an entire object or a portion of an object. For example, a piece of geometry can be object 810 or a portion of object 810.

FIG. 8A is a diagram illustrating testing of an object against screen regions when multiple GPUs cooperate to render a single image, in accordance with one embodiment of the present disclosure. As described above, a piece of geometry can be an object, such that the piece corresponds to the geometry used or generated by a corresponding draw call. During geometry pre-testing, object 810 may be determined to overlap region 820A. That is, portion 810A of object 810 overlaps region 820A. In that case, GPU-A is tasked with rendering object 810. Also during geometry pre-testing, object 810 may be determined to overlap region 820B. That is, portion 810B of object 810 overlaps region 820B. In that case, GPU-B is also tasked with rendering object 810.
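By way of illustration only, the object-level test of FIG. 8A can be sketched with axis-aligned rectangles standing in for the object's screen-space bounds and for the screen regions. The `Rect` type, the coordinate values, and the use of a bounding rectangle in place of the object's actual geometry are assumptions made for this sketch.

```python
from typing import Dict, List, NamedTuple


class Rect(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share interior area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def gpus_that_render(bounds: Rect, regions: Dict[str, List[Rect]]) -> List[str]:
    """GPUs with at least one assigned region overlapped by the object's bounds."""
    return [gpu for gpu, rects in regions.items()
            if any(overlaps(bounds, r) for r in rects)]


# Hypothetical coordinates: region 820A on the left, region 820B on the right.
regions = {"A": [Rect(0, 0, 8, 8)], "B": [Rect(8, 0, 16, 8)]}
object_810 = Rect(5, 2, 11, 6)   # straddles both regions
small_object = Rect(1, 1, 3, 3)  # lies entirely inside region 820A
```

An object that straddles the boundary is assigned to both GPUs, as with object 810 above, while an object contained in one region is assigned only to that region's GPU.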

FIG. 8B is a diagram illustrating testing of portions of an object against screen regions and/or screen sub-regions when multiple GPUs cooperate to render a single image frame, in accordance with one embodiment of the present disclosure. That is, a piece of geometry can be a portion of an object. For example, object 810 may be divided into pieces, such that the geometry used or generated by a draw call is subdivided into smaller pieces of geometry. In one embodiment, each piece of geometry is roughly the size for which the position cache and/or parameter cache are allocated. In that case, information (e.g., a hint or hints) is generated for those smaller pieces of geometry during geometry testing. As described above, the information is used by the rendering GPUs.

For example, object 810 is divided into smaller objects, and the pieces of geometry used for region testing correspond to these smaller objects. As shown, object 810 is divided into pieces of geometry "a", "b", "c", "d", "e", and "f". After geometry pre-testing, GPU-A renders only pieces of geometry "a", "b", "c", "d", and "e". That is, GPU-A can skip rendering piece of geometry "f". Also, after geometry pre-testing, GPU-B renders only pieces of geometry "d", "e", and "f". That is, GPU-B can skip rendering pieces of geometry "a", "b", and "c".
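By way of illustration only, the per-piece outcome described above can be written as a table of hints from which each GPU derives what it renders and what it skips. The dictionary `piece_hints` and the two helper functions are hypothetical names for this sketch; the piece-to-GPU assignments follow the text (GPU-A renders "a" through "e", GPU-B renders "d" through "f").

```python
# Piece of geometry -> GPUs whose regions it overlaps (from the example above).
piece_hints = {
    "a": {"A"}, "b": {"A"}, "c": {"A"},
    "d": {"A", "B"}, "e": {"A", "B"},
    "f": {"B"},
}


def pieces_to_render(gpu: str) -> list:
    """Pieces whose hint says they overlap a region assigned to this GPU."""
    return [p for p in sorted(piece_hints) if gpu in piece_hints[p]]


def pieces_to_skip(gpu: str) -> list:
    """Pieces this GPU can skip entirely."""
    return [p for p in sorted(piece_hints) if gpu not in piece_hints[p]]
```

Testing at the granularity of pieces lets GPU-B skip half of object 810 here, whereas the object-level test of FIG. 8A would have GPU-B process the whole object.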

In one embodiment, because the geometry processing stage is configured to perform both vertex processing and primitive processing, geometry pre-testing of a piece of geometry can be performed using shaders in the geometry processing stage. For example, the geometry processing stage generates the information (e.g., hints) by, for instance, testing a bounding frustum for the geometry against the GPU screen regions, which may be performed by software shader operations. In one embodiment, this test is accelerated through the use of a dedicated instruction or instructions implemented in hardware, thereby implementing a software/hardware solution. That is, the dedicated instruction or instructions are used to accelerate the generation of information about a piece of geometry and its relationship to the screen regions. For example, the homogeneous coordinates of the vertices of a primitive of the piece of geometry are provided as inputs to the instruction for geometry pre-testing in the geometry processing stage. The test may generate, for each GPU, a Boolean return value indicating whether the primitive overlaps any screen region (e.g., in the corresponding set) assigned to that GPU for object rendering. Thus, the information (e.g., hints) generated during geometry pre-testing about a corresponding piece of geometry and its relationship to the screen regions is generated by shaders in the geometry processing stage.
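By way of illustration only, a test that takes the homogeneous coordinates of a primitive's vertices and returns one Boolean per GPU can be sketched in software as follows. This sketch substitutes a conservative screen-space bounding-box test for the bounding-frustum test and dedicated instruction mentioned above. The function name, the viewport mapping, the rectangle encoding, and the handling of vertices with non-positive w are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]  # homogeneous clip-space (x, y, z, w)
Rect = Tuple[float, float, float, float]  # screen region (x0, y0, x1, y1) in pixels


def pretest_primitive(clip_vertices: Sequence[Vec4],
                      regions_by_gpu: Dict[str, List[Rect]],
                      width: int, height: int) -> Dict[str, bool]:
    """Return, per GPU, whether the primitive may overlap any of its regions."""
    # A vertex at or behind the eye plane cannot be projected safely;
    # be conservative and report overlap for every GPU.
    if any(w <= 0.0 for (_, _, _, w) in clip_vertices):
        return {gpu: True for gpu in regions_by_gpu}

    # Perspective divide, then map [-1, 1] to pixel coordinates (assumed viewport).
    xs = [(x / w * 0.5 + 0.5) * width for (x, _, _, w) in clip_vertices]
    ys = [(y / w * 0.5 + 0.5) * height for (_, y, _, w) in clip_vertices]
    bx0, bx1, by0, by1 = min(xs), max(xs), min(ys), max(ys)

    def hit(region: Rect) -> bool:
        rx0, ry0, rx1, ry1 = region
        return bx0 < rx1 and rx0 < bx1 and by0 < ry1 and ry0 < by1

    return {gpu: any(hit(r) for r in rects) for gpu, rects in regions_by_gpu.items()}


# Hypothetical 100x100 screen split into a left region (A) and a right region (B).
example_regions = {"A": [(0, 0, 50, 50)], "B": [(50, 0, 100, 50)]}
example_triangle = [(-0.8, -0.8, 0.0, 1.0), (-0.2, -0.8, 0.0, 1.0), (-0.8, -0.2, 0.0, 1.0)]
```

A bounding-box test can report an overlap for a primitive that does not actually touch a region, but it never reports a miss for one that does, so a GPU acting on the result may do some unnecessary work but will not drop geometry it is responsible for.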

In another embodiment, geometry pre-testing of a piece of geometry can be performed in the hardware rasterization stage. For example, a hardware scan converter may be configured to perform geometry pre-testing, such that the scan converter generates information about all of the screen regions assigned to the multiple GPUs for object rendering of the corresponding image frame.

In yet another embodiment, the piece of geometry may be a primitive. That is, the portion of an object used for the geometry pre-test may be a single primitive. In that case, the information (e.g., a hint) generated by one GPU during the geometry pre-test indicates whether an individual triangle (e.g., representing the primitive) needs to be rendered by another rendering GPU.

In one embodiment, the information generated during the geometry pre-test and shared among the GPUs used for rendering includes the number of primitives (e.g., the surviving primitive count) that overlap any screen region (e.g., in the corresponding set) assigned to the corresponding GPU for object rendering. The information may also include the number of vertices used to build or define those primitives, that is, the surviving vertex count. When rendering, the corresponding rendering GPU may then use the supplied vertex count to allocate space in the position cache and the parameter cache. For example, in one embodiment no space is allocated for vertices that are not needed, which can increase rendering efficiency.

In another embodiment, the information (e.g., hints) generated during the geometry pre-test identifies the specific primitives (e.g., the exact surviving primitives) that overlap any screen region (e.g., in the corresponding set) assigned to the corresponding GPU for object rendering. That is, the information generated for the rendering GPU includes the specific set of primitives to render. The information may also identify the specific vertices used to build or define those primitives, so that the information generated for the rendering GPU includes the specific set of vertices to render. This information can, for example, save the other rendering GPU time in its geometry processing stage when it renders the piece of geometry.

さらに他の実施形態では、ジオメトリテスト中の情報の生成に対応付けられる処理オーバーヘッド（ソフトウェアまたはハードウェアのいずれか）があり得る。その場合、ジオメトリの特定のピースに対しては情報の生成をスキップすることが有用であり得る。すなわち、ヒントとして提供される情報は、特定のオブジェクトに対しては生成されるが、他に対しては生成されない。たとえば、スカイボックスまたは大きな地形ピースを表すジオメトリのピース（たとえば、オブジェクトまたはオブジェクトのピース）には、大きな三角形が含まれていてもよい。その場合、画像フレームまたは画像フレーム列内の１つ以上の各画像フレームのマルチＧＰＵレンダリングのために用いる各ＧＰＵが、ジオメトリのこれらのピースをレンダリングする必要があるという可能性がある。すなわち、対応するジオメトリのピースの特性に応じて、情報を生成してもよいし生成しなくてもよい。 In still other embodiments, there may be processing overhead (either software or hardware) associated with generating information during geometry testing. In that case, it may be useful to skip generating information for certain pieces of geometry. That is, the information provided as hints is generated for certain objects, but not for others. For example, a piece of geometry (eg, an object or a piece of an object) representing a skybox or a large piece of terrain may contain large triangles. In that case, it is possible that each GPU used for multi-GPU rendering of one or more image frames in an image frame or sequence of image frames will need to render these pieces of geometry. That is, information may or may not be generated depending on the characteristics of the corresponding piece of geometry.

FIGS. 9A-9C illustrate various strategies for assigning screen regions to corresponding GPUs when multiple GPUs collaborate to render a single image, in accordance with one embodiment of the present disclosure. Various techniques may be used when dividing the screen into regions to achieve GPU processing efficiency: for example, increasing or decreasing the number of regions (e.g., to choose the right number of regions), interleaving the regions, or increasing or decreasing the number of regions in order to select a particular interleaving pattern. For example, the multiple GPUs are configured to perform multi-GPU rendering of geometry for an image frame generated by an application by pre-testing the geometry against interleaved screen regions before rendering the objects in the corresponding image. The screen-region configurations of FIGS. 9A-9C are designed to reduce even slight imbalances in rendering time between the GPUs. The complexity of the test (e.g., whether a piece overlaps a corresponding screen region) varies with how the screen regions are assigned to the GPUs. In FIGS. 9A-9C, the bold box 910 is the outline of the corresponding screen or display used when rendering the image.

一実施形態では、複数のスクリーン領域または複数の領域はそれぞれ、均一サイズである。一実施形態では、複数のスクリーン領域はそれぞれ、サイズが均一でない。さらなる他の実施形態では、複数のスクリーン領域におけるスクリーン領域の数及びサイジングは動的に変化する。 In one embodiment, each of the plurality of screen regions or regions is of uniform size. In one embodiment, each of the plurality of screen regions are non-uniform in size. In still other embodiments, the number and sizing of screen areas in the plurality of screen areas dynamically change.

In particular, FIG. 9A illustrates a simple pattern 900A for screen 910. The screen regions are of uniform size. For example, each region may be a rectangle whose dimensions in pixels are powers of two, such as 256×256 pixels. As shown, the region assignment is a checkerboard pattern in which a row of A and B regions alternates with a row of C and D regions. Pattern 900A is easy to test during the geometry pre-test, but it may introduce some rendering inefficiency. For example, the screen area assigned to each GPU differs substantially (i.e., screen regions C and D have less coverage within screen 910), which can lead to an imbalance in rendering time between the GPUs.
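The coverage imbalance of a simple checkerboard can be checked numerically. The sketch below is an illustration only: it assumes the two alternating rows are A/B and C/D, and it assumes a 1920×1080 screen, which the text does not specify. The names `region_owner` and `coverage` are hypothetical.

```python
def region_owner(col, row, pattern):
    """Look up the GPU that owns the region at (col, row) in a repeating pattern."""
    pattern_row = pattern[row % len(pattern)]
    return pattern_row[col % len(pattern_row)]


def coverage(width, height, tile, pattern):
    """Count the on-screen pixels assigned to each GPU.

    Regions on the right and bottom edges are clipped to the screen, which is
    what makes the per-GPU totals unequal.
    """
    totals = {}
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            w = min(tile, width - left)   # clip the region to the screen width
            h = min(tile, height - top)   # clip the region to the screen height
            gpu = region_owner(left // tile, top // tile, pattern)
            totals[gpu] = totals.get(gpu, 0) + w * h
    return totals


checkerboard = [["A", "B"], ["C", "D"]]  # assumed layout of the alternating rows
print(coverage(1920, 1080, 256, checkerboard))
```

Under these assumptions the four totals differ noticeably, because the clipped regions at the screen edges are not shared evenly among the GPUs.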

FIG. 9B illustrates a pattern 900B of screen regions for screen 910. The screen regions, or sub-regions, are of uniform size. The screen regions are assigned and distributed so as to reduce the imbalance in rendering time between the GPUs. For example, assigning GPUs to screen regions according to pattern 900B gives each GPU approximately the same number of screen pixels across screen 910. That is, the screen regions are assigned so that each GPU has approximately equal screen area, or coverage, within screen 910. For example, if each region is 256×256 pixels, the set of screen regions A covers an area of 6×256×256 pixels, the set of screen regions B covers 5.75×256×256 pixels, the set of screen regions C covers 5.5×256×256 pixels, and the set of screen regions D covers 5.5×256×256 pixels.

FIG. 9C illustrates a pattern 900C of screen regions for screen 910. The screen regions are not uniform in size. That is, the screen regions for which the GPUs are assigned responsibility for rendering objects may differ in size. In particular, screen 910 is divided so that each GPU is assigned the same number of pixels. For example, if a 4K display (3840×2160) is divided vertically into four equal regions, each region is 540 pixels tall. However, GPUs typically perform many operations on 32×32 blocks of pixels, and 540 is not a multiple of 32. Therefore, in one embodiment, pattern 900C may include some blocks that are 512 pixels tall (a multiple of 32) and other blocks that are 544 pixels tall (also a multiple of 32). Other embodiments may use blocks of different sizes. Pattern 900C shows an equal number of screen pixels being assigned to each GPU through the use of non-uniform screen regions.
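The block-alignment arithmetic can be worked through directly. The sketch below computes the equal-split height from the 2160-pixel display height rather than hard-coding it, and finds the multiples of 32 on either side. The helper names are hypothetical, and `whole_block_heights` is just one possible way to hand out whole 32-pixel blocks, not a layout taken from the disclosure.

```python
def neighbouring_multiples(value, block=32):
    """Return the multiple of `block` at or below `value` and the next one up."""
    lower = int(value // block) * block
    return lower, lower + block


def whole_block_heights(total_height, strips, block=32):
    """Distribute whole blocks over `strips` as evenly as possible.

    Returns the strip heights and the rows left over when `total_height`
    is not itself a multiple of `block`.
    """
    blocks, leftover = divmod(total_height, block)
    base, extra = divmod(blocks, strips)
    heights = [(base + (1 if i < extra else 0)) * block for i in range(strips)]
    return heights, leftover


equal = 2160 / 4                      # height of each of four equal strips
print(equal, equal % 32)              # a non-zero remainder: not block-aligned
print(neighbouring_multiples(equal))  # the aligned heights on either side
print(whole_block_heights(2160, 4))   # one possible block-aligned split
```

Because 2160 is itself not a multiple of 32, any split into block-aligned strips leaves a few rows over; how those rows are handled is outside what the text describes.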

さらなる他の実施形態では、画像のレンダリングを行うときのアプリケーションのニーズが時間とともに変化し、スクリーン領域が動的に選択される。たとえば、レンダリング時間のほとんどがスクリーンの下半分上で費やされることが分かっている場合、ディスプレイの下半分におけるほぼ等しい数量のスクリーンピクセルが、対応する画像をレンダリングするために用いる各ＧＰＵに割り当てられるように、領域を割り当てることが好都合である。すなわち、対応する画像をレンダリングするために用いる各ＧＰＵに割り当てる領域を動的に変えてもよい。たとえば、ゲームモード、異なるゲーム、スクリーンのサイズ、領域に対して選択されるパターンなどに基づいて、変更を適用してもよい。 In yet another embodiment, the screen area is dynamically selected as the application's needs when rendering images change over time. For example, if it is known that most of the rendering time is spent on the bottom half of the screen, then an approximately equal number of screen pixels in the bottom half of the display should be allocated to each GPU used to render the corresponding image. It is convenient to allocate regions to That is, the area allocated to each GPU used to render the corresponding image may be dynamically changed. For example, changes may be applied based on game modes, different games, screen sizes, patterns selected for regions, and the like.

FIG. 10 is a diagram illustrating various distributions of the assignment of GPUs to pieces of geometry for purposes of geometry pre-testing, in accordance with one embodiment of the present disclosure. That is, FIG. 10 shows how responsibility for generating information during the geometry pre-test is distributed among the multiple GPUs. As previously described, each GPU is assigned a corresponding portion of the geometry of an image frame, and that portion may be further divided into objects, portions of objects, geometry, pieces of geometry, and so on. The geometry pre-test includes determining whether a particular piece of geometry overlaps any screen region or regions assigned to a corresponding GPU for object rendering. In embodiments, the GPUs typically perform the geometry pre-test simultaneously on all of the geometry (e.g., all pieces of geometry) of the corresponding image frame. In this way the geometry testing is performed cooperatively by the GPUs, so that, as previously described, each GPU knows which pieces of geometry to render and which pieces of geometry to skip.

As shown in FIG. 10, each piece of geometry may be an object, a portion of an object, and so on. For example, as previously described, a piece of geometry may be a portion of an object (e.g., a piece roughly the size for which position and/or parameter cache space is allocated). Purely for illustration, object 0 (e.g., designated for rendering by command 722 in rendering command buffer 700A) is divided into pieces "a", "b", "c", "d", "e", and "f" (e.g., object 810 in FIG. 8B). Object 1 (e.g., designated for rendering by command 724 in rendering command buffer 700A) is divided into pieces "g", "h", and "i". Object 2 (e.g., designated for rendering by command 724 in rendering command buffer 700A) is divided into pieces "j", "k", "l", "m", "n", and "o". The pieces may be ordered (e.g., a through o) for the purpose of distributing responsibility for geometry testing among the GPUs.

Distribution 1010 (e.g., the ABCDABCDABCD... row) shows an even distribution of responsibility for geometry testing among the multiple GPUs. In particular, rather than having one GPU take the first quarter of the geometry as a block (e.g., GPU-A taking the first four of roughly sixteen pieces, namely "a", "b", "c", and "d", for geometry testing), a second GPU take the second quarter, and so on, the assignments to GPUs are interleaved. That is, consecutive pieces of geometry are assigned to different GPUs. For example, piece "a" is assigned to GPU-A, piece "b" to GPU-B, piece "c" to GPU-C, piece "d" to GPU-D, piece "e" to GPU-A, piece "f" to GPU-B, piece "g" to GPU-C, and so on. As a result, the processing of the geometry testing is roughly balanced among the GPUs (e.g., GPU-A, GPU-B, GPU-C, and GPU-D).
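The even, interleaved distribution amounts to a round-robin over the ordered pieces. A minimal sketch, with the hypothetical helper name `interleave` and the piece labels a through o taken from the example above:

```python
def interleave(pieces, gpus):
    """Assign consecutive pieces to consecutive GPUs, wrapping around."""
    return {piece: gpus[index % len(gpus)] for index, piece in enumerate(pieces)}


assignment = interleave("abcdefghijklmno", ["A", "B", "C", "D"])
print(assignment["a"], assignment["e"], assignment["g"])  # A A C
```

Interleaving, rather than handing each GPU a contiguous quarter, spreads the pieces of every object across the GPUs, so an unusually expensive object does not fall on a single GPU.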

Distribution 1020 (e.g., the ABBCDABBCDABBCD... row) shows an asymmetric distribution of responsibility for geometry testing among the multiple GPUs. An asymmetric distribution may be advantageous when a particular GPU has more time available than the other GPUs to perform geometry testing while rendering the corresponding image frame. For example, one GPU may have finished rendering the objects of the previous frame or frames of the scene earlier than the other GPUs; because it is expected to finish early in this frame as well, that GPU can be assigned more pieces of geometry for geometry testing. Again, the assignments to GPUs are interleaved. As shown, GPU-B is assigned more pieces of geometry for geometry pre-testing than the other GPUs. To illustrate, piece "a" is assigned to GPU-A, piece "b" to GPU-B, piece "c" also to GPU-B, piece "d" to GPU-C, piece "e" to GPU-D, piece "f" to GPU-A, piece "g" to GPU-B, piece "h" also to GPU-B, piece "i" to GPU-C, and so on. Although the assignment of geometry testing to the GPUs may not be balanced, the combined processing of the complete phase (e.g., geometry pre-testing and rendering of geometry) may turn out to be roughly balanced (e.g., each GPU spends approximately the same amount of time performing geometry pre-testing and rendering of geometry).
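The asymmetric ABBCD distribution can be sketched as a repeating pattern built from per-GPU weights. The weights and the helper names `build_pattern` and `assign_by_pattern` are assumptions for illustration; the disclosure does not say how the pattern is derived.

```python
from collections import Counter
from itertools import cycle


def build_pattern(weights):
    """Expand per-GPU weights into one repetition of the assignment pattern."""
    pattern = []
    for gpu, weight in weights.items():
        pattern.extend([gpu] * weight)
    return pattern


def assign_by_pattern(pieces, pattern):
    """Walk the ordered pieces while cycling through the pattern."""
    return dict(zip(pieces, cycle(pattern)))


pattern = build_pattern({"A": 1, "B": 2, "C": 1, "D": 1})  # ['A', 'B', 'B', 'C', 'D']
assignment = assign_by_pattern("abcdefghijklmno", pattern)
print(Counter(assignment.values()))
```

With fifteen pieces, GPU-B receives six and each of the other GPUs receives three, matching the two-to-one ratio of the pattern.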

FIGS. 11A-11B illustrate the use of statistics for one or more image frames when assigning responsibility for geometry testing among multiple GPUs. For example, based on the statistics, some GPUs may process more or fewer pieces of geometry during geometry testing in order to generate the information that is useful when rendering.

In particular, FIG. 11A is a diagram illustrating the pre-testing and rendering of the geometry of a previous image frame by multiple GPUs, and the use of statistics collected during rendering to influence how pre-testing of the geometry of a current image frame is assigned to the multiple GPUs, in accordance with one embodiment of the present disclosure. Purely for illustration, in the second frame 1100B of FIG. 11A, GPU-B processes twice as many pieces of geometry (e.g., during pre-testing) as each of the other GPUs (e.g., GPU-A, GPU-C, and GPU-D). The decision to assign more pieces of geometry to GPU-B for geometry pre-testing in the current image frame is based on statistics collected during rendering of the previous image frame or frames.

たとえば、タイミング図１１００Ａは、以前の画像フレームに対するジオメトリ事前テスト７０１Ａ及びレンダリング７０２Ａを示している。両プロセスに対して、４つのＧＰＵ（たとえば、ＧＰＵ－Ａ、ＧＰＵ－Ｂ、ＧＰＵ－Ｃ、及びＧＰＵ－Ｄ）を用いている。以前の画像フレームのジオメトリ（たとえば、ジオメトリのピース）の割り当ては、ＧＰＵ間で均一に分配されている。これは、各ＧＰＵによるジオメトリ事前テスト７０１Ａの大まかにバランスされた性能によって示される。 For example, timing diagram 1100A shows geometry pretest 701A and rendering 702A for a previous image frame. Four GPUs (eg, GPU-A, GPU-B, GPU-C, and GPU-D) are used for both processes. The previous image frame geometry (eg, piece of geometry) allocation is evenly distributed among the GPUs. This is shown by the roughly balanced performance of the geometry pretest 701A by each GPU.

１つ以上の画像フレームから収集したレンダリング統計値を用いて、現在の画像フレームのジオメトリテスト及びレンダリングをどのように実行するかを決定してもよい。すなわち、統計値を、以後の画像フレーム（たとえば、現在の画像フレーム）のジオメトリテスト及びレンダリングを行うときに用いるための情報として提供してもよい。たとえば、以前の画像フレームのオブジェクト（たとえば、ジオメトリのピース）のレンダリング中に収集した統計値は、ＧＰＵ－Ｂが他のＧＰＵよりも早くレンダリングを終了したことを示す場合がある。詳細には、ＧＰＵ－Ｂは、オブジェクトレンダリングのためにＧＰＵ－Ｂに割り当てられた任意のスクリーン領域（たとえば、対応する組における）とオーバーラップするジオメトリのその一部をレンダリングした後にアイドルタイム１１３０Ａを有する。他のＧＰＵ－Ａ、ＧＰＵ－Ｃ、及びＧＰＵ－Ｄはそれぞれ、以前の画像フレームの対応するフレーム周期のほぼ終了７１０までレンダリングを実行する。 Rendering statistics collected from one or more image frames may be used to determine how to perform geometry testing and rendering of the current image frame. That is, the statistics may be provided as information for use in geometry testing and rendering subsequent image frames (eg, the current image frame). For example, statistics collected while rendering objects (eg, pieces of geometry) in previous image frames may indicate that GPU-B finished rendering earlier than the other GPUs. Specifically, GPU-B spends idle time 1130A after rendering that portion of geometry that overlaps any screen regions (e.g., in corresponding sets) allocated to GPU-B for object rendering. have. The other GPU-A, GPU-C, and GPU-D each perform rendering until approximately the end 710 of the corresponding frame period of the previous image frame.

The previous image frame and the current image frame may be generated for a particular scene when executing the application. The objects may therefore be similar in number and location from one image frame of the scene to the next, in which case the time each GPU needs for geometry pre-testing and rendering is similar across the image frames in a sequence. That is, based on the statistics, it is reasonable to expect that GPU-B would again have idle time when performing geometry testing and rendering for the current image frame. GPU-B may therefore be assigned more pieces of geometry for geometry pre-testing in the current frame. For example, having GPU-B process more pieces of geometry during geometry pre-testing results in GPU-B finishing at approximately the same time as the other GPUs after rendering the objects in the current image frame. That is, GPU-A, GPU-B, GPU-C, and GPU-D each perform rendering until approximately the end 711 of the corresponding frame period of the current image frame. In one embodiment, using the rendering statistics reduces the total time needed to render the current image frame. Thus, statistics for the rendering of the previous frame or frames may be used to tune geometry pre-testing (e.g., the distribution of the assignment of pieces of geometry among the GPUs in the current image frame).

図１１Ｂは、本開示の一実施形態により、グラフィックス処理を行うための方法を例示するフロー図１１００Ｂであり、複数のＧＰＵによる以前の画像フレームのジオメトリの事前テスト及びレンダリングと、レンダリング中に収集した統計値を用いて、現在の画像フレームのジオメトリの事前テストを現在の画像フレームにおける複数のＧＰＵに割り当てることに影響を与えることと、を含む。図１１Ａの図は、フロー図１１００Ｂの方法において統計値を用いて、画像フレームに対するＧＰＵ間でのジオメトリ（たとえば、ジオメトリのピース）の割り当ての分配を決定することを例示する。前述したように、種々のアーキテクチャには、アプリケーションに対するジオメトリのマルチＧＰＵレンダリングを行うことによって複数のＧＰＵが連携して単一画像をレンダリングすることが含まれていてもよい。たとえば、クラウドゲーミングシステムの１つ以上のクラウドゲーミングサーバ内において、またはスタンドアロンシステム（たとえば、パーソナルコンピュータまたはゲーミングコンソールであって、複数のＧＰＵを有するハイエンドグラフィックスカードを含むものなど）内においてである。 FIG. 11B is a flow diagram 1100B illustrating a method for performing graphics processing, pre-testing and rendering geometry of previous image frames with multiple GPUs, and collecting during rendering, according to one embodiment of the present disclosure. using the statistics obtained to influence the assignment of pre-tests of the geometry of the current image frame to multiple GPUs in the current image frame. The diagram of FIG. 11A illustrates using statistics in the method of flow diagram 1100B to determine the allocation distribution of geometry (eg, pieces of geometry) among GPUs for an image frame. As noted above, various architectures may include multiple GPUs working together to render a single image by performing multi-GPU rendering of geometry for an application. For example, within one or more cloud gaming servers of a cloud gaming system, or within a standalone system (eg, a personal computer or gaming console that includes a high-end graphics card with multiple GPUs, etc.).

In particular, at 1110, the method includes rendering graphics for an application using a plurality of GPUs, as previously described. At 1120, the method includes dividing responsibility for rendering the geometry of the graphics among the plurality of GPUs based on a plurality of screen regions. Each GPU has a corresponding division of responsibility that is known to the plurality of GPUs. More specifically, as previously described, each GPU is responsible for rendering geometry in a corresponding set of screen regions of the plurality of screen regions, where the corresponding set includes one or more screen regions. In one embodiment, the screen regions are interleaved (e.g., when the display is divided into sets of screen regions for geometry pre-testing and rendering).

At 1130, the method includes rendering a first plurality of pieces of geometry at the plurality of GPUs for a previous image frame generated by the application. For example, timing diagram 1100A illustrates the timing of geometry testing of the pieces of geometry and of rendering the objects (e.g., pieces of geometry) in the previous image frame. At 1140, the method includes generating statistics for the rendering of the previous image frame. That is, the statistics may be collected while the previous image frame is rendered.

１１５０において、本方法は、統計値に基づいて、アプリケーションによって生成された現在の画像フレームのジオメトリの第２の複数のピースを、ジオメトリテストのために複数のＧＰＵに割り当てることを含む。すなわち、これらの統計値を用いて、ジオメトリテストに対するジオメトリのピースを、次のまたは現在の画像フレームをレンダリングするときに特定のＧＰＵに、同じで、より少なく、またはより多く割り当ててもよい。ある場合には、ジオメトリテストを行うときに、ジオメトリの第２の複数のピース内のピースを複数のＧＰＵに均一に割り当てなければならないことを、統計値は示すことがある。 At 1150, the method includes assigning a second plurality of pieces of geometry of the current image frame generated by the application to a plurality of GPUs for geometry testing based on the statistical values. That is, these statistics may be used to allocate the same, fewer, or more pieces of geometry for a geometry test to a particular GPU when rendering the next or current image frame. In some cases, statistics may indicate that pieces in the second plurality of pieces of geometry should be evenly distributed among GPUs when geometry testing is performed.

In other cases, the statistics may indicate that the pieces in the second plurality of pieces of geometry should be assigned unevenly among the plurality of GPUs for geometry testing. For example, as shown in timing diagram 1100A, the statistics may indicate that GPU-B finished rendering before any of the other GPUs in the previous image frame. In particular, it may be determined that a first GPU (e.g., GPU-B) finished rendering its portion of the first plurality of pieces of geometry before a second GPU (e.g., GPU-A) finished rendering its portion. As previously described, the first GPU (e.g., GPU-B) renders the one or more pieces of the first plurality of pieces of geometry that overlap any screen region assigned to the first GPU for object rendering, and the second GPU (e.g., GPU-A) renders the one or more pieces of the first plurality of pieces of geometry that overlap any screen region assigned to the second GPU for object rendering. Based on the statistics, the first GPU (e.g., GPU-B) is therefore expected to need less time than the second GPU (e.g., GPU-A) to render the second plurality of pieces of geometry, so more pieces of geometry may be assigned to the first GPU for geometry pre-testing when rendering the current image frame. For example, a first number of the second plurality of pieces of geometry may be assigned to the first GPU (e.g., GPU-B) for geometry testing, and a second number of the second plurality of pieces of geometry may be assigned to the second GPU (e.g., GPU-A) for geometry testing, where the first number is greater than the second number (if the time imbalance is large enough, GPU-A may be assigned no pieces at all). In this way, GPU-B processes more pieces of geometry than GPU-A during geometry testing. For example, timing diagram 1100B shows GPU-B being assigned more pieces of geometry and spending more time performing geometry testing than the other GPUs.
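One way to turn previous-frame statistics into per-GPU piece counts is a greedy assignment that always gives the next piece to the GPU projected to finish earliest. This is a sketch under stated assumptions, not the method of the disclosure: it assumes each GPU's render time in the current frame equals its measured time in the previous frame, and that pre-testing any piece costs the same fixed time. The name `assign_counts` and the millisecond figures are hypothetical.

```python
import heapq


def assign_counts(render_ms, n_pieces, pretest_ms_per_piece):
    """Greedily hand each pre-test piece to the GPU with the earliest finish time.

    render_ms: expected render time per GPU, taken from previous-frame statistics.
    Returns the number of pre-test pieces assigned to each GPU.
    """
    heap = [(time_ms, gpu) for gpu, time_ms in render_ms.items()]
    heapq.heapify(heap)
    counts = {gpu: 0 for gpu in render_ms}
    for _ in range(n_pieces):
        time_ms, gpu = heapq.heappop(heap)       # GPU projected to finish first
        counts[gpu] += 1
        heapq.heappush(heap, (time_ms + pretest_ms_per_piece, gpu))
    return counts


# GPU-B rendered the previous frame faster than the others (illustrative numbers).
print(assign_counts({"A": 12.0, "B": 8.0, "C": 12.0, "D": 12.0}, 20, 0.5))
```

With these illustrative numbers GPU-B absorbs pieces until its projected finish time catches up with the others, after which the remaining pieces are shared evenly. When the imbalance is large enough, a slow GPU receives no pieces, which matches the parenthetical remark above.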

At 1160, the method includes performing geometry pre-testing on the second plurality of pieces of geometry in the current image frame to generate information regarding each piece of the second plurality of pieces of geometry and its relationship to each of the plurality of screen regions. The geometry pre-testing is performed at each of the plurality of GPUs based on the assignment. That is, the geometry pre-testing is performed at the pre-testing GPU on the plurality of pieces of geometry of the image frame generated by the application, to generate information regarding each piece of geometry and its relationship to each of the plurality of screen regions.

At 1170, the method includes using the information generated for each of the second plurality of pieces of geometry to render the plurality of pieces of geometry during a rendering phase (e.g., at a corresponding GPU, either fully rendering a piece of geometry or skipping the rendering of that piece). In embodiments, rendering is typically performed simultaneously at each of the GPUs. In particular, the plurality of pieces of geometry of the current image frame is rendered at each of the plurality of GPUs using the information generated for each piece of geometry.
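The render-or-skip decision made from the hints can be sketched as below. The hint representation (a set of GPU names per piece) and the name `render_plan` are assumptions for illustration. Treating a missing hint as "render" is also an assumption; it is the conservative choice for pieces whose hint generation was skipped.

```python
def render_plan(hints, gpu):
    """Decide, for one GPU, whether to render or skip each piece of geometry.

    hints maps a piece to the set of GPUs whose screen regions it overlaps,
    or to None when no hint was generated for that piece.
    """
    plan = {}
    for piece, overlapping_gpus in hints.items():
        if overlapping_gpus is None or gpu in overlapping_gpus:
            plan[piece] = "render"   # overlap reported, or no hint available
        else:
            plan[piece] = "skip"     # hint says this GPU's regions are untouched
    return plan


hints = {"a": {"A"}, "b": {"B", "C"}, "sky": None}
print(render_plan(hints, "A"))
```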

In other embodiments, the distribution of pieces of geometry to the GPUs for generating the information is adjusted dynamically. That is, the assignment of pieces of geometry for geometry pre-testing of the current image frame may be adjusted dynamically during rendering of the current image frame. For example, in the example of timing diagram 1100B, it may be determined that GPU-A is performing geometry pre-testing of its assigned pieces of geometry at a slower rate than expected. Accordingly, pieces of geometry assigned to GPU-A for geometry pre-testing can be reassigned on the fly (e.g., from GPU-A to GPU-B), such that GPU-B is now tasked with performing geometry pre-testing on those pieces during the frame period used to render the current image frame.
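By way of non-limiting illustration only, on-the-fly reassignment might be sketched as follows. The per-GPU pending lists, the observed and expected rates, the tolerance threshold, and the policy of moving half of the pending pieces to the GPU with the shortest queue are all assumptions of this sketch and are not taken from the disclosure.

```python
def rebalance(pending, observed_rate, expected_rate, tolerance=0.8):
    """Move pending pre-test work away from GPUs that are slower than expected.

    pending maps a GPU name to the list of pieces it has not yet pre-tested.
    A GPU counts as slow when its observed rate falls below tolerance times
    its expected rate. At least two GPUs are assumed.
    """
    for gpu in list(pending):
        is_slow = observed_rate[gpu] < tolerance * expected_rate[gpu]
        if not is_slow or len(pending[gpu]) < 2:
            continue
        # Hypothetical policy: give the second half of the queue to the GPU
        # that currently has the least pending work.
        target = min((g for g in pending if g != gpu), key=lambda g: len(pending[g]))
        keep = len(pending[gpu]) // 2
        moved = pending[gpu][keep:]
        del pending[gpu][keep:]
        pending[target].extend(moved)
    return pending


queues = rebalance(
    {"GPU-A": [4, 5, 6, 7], "GPU-B": [12]},
    observed_rate={"GPU-A": 1.0, "GPU-B": 2.0},
    expected_rate={"GPU-A": 2.0, "GPU-B": 2.0},
)
```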

FIGS. 12A-12B illustrate another strategy for processing rendering command buffers. One strategy was previously described in relation to FIGS. 7A-7C, in which a command buffer contains commands for performing geometry pre-testing on objects (e.g., pieces of geometry), followed by commands for rendering the objects. FIGS. 12A-12B show a geometry pre-testing and rendering strategy that uses shaders capable of performing either operation, depending on the GPU configuration.

In particular, FIG. 12A is a diagram illustrating the use of shaders configured to perform both pre-testing and rendering of the geometry of an image frame in two passes through a portion of command buffer 1200A, in accordance with one embodiment of the present disclosure. That is, the shaders used to execute commands in command buffer 1200A may perform geometry pre-testing when configured for pre-testing, or may perform rendering when configured for rendering.

As shown, the portion of command buffer 1200A shown in FIG. 12A is executed twice, and each execution results in a different operation: the first execution results in geometry pre-testing, and the second execution results in rendering of the geometry. This can be accomplished in various ways. For example, the portion of the command buffer shown at 1200A can be explicitly called twice as a subroutine, with state (e.g., a register setting or a value in RAM) explicitly set to a different value before each call. Alternatively, the portion of the command buffer shown at 1200A can be implicitly executed twice, e.g., by using special commands to mark the beginning and end of the portion to be executed twice, and by implicitly setting a different configuration (e.g., a register setting) for the first and second executions of that portion. When a command in the portion of command buffer 1200A is executed (e.g., a command that sets state or a command that executes a shader), the result of the command differs based on the GPU state (e.g., resulting in geometry pre-testing versus rendering). That is, the commands in command buffer 1200A may be configured for either geometry pre-testing or rendering.

In particular, the portion of command buffer 1200A includes commands for configuring the state of the one or more GPUs that execute commands from rendering command buffer 1200A, and commands for executing shaders that perform either geometry pre-testing or rendering depending on that state. For example, commands 1210, 1212, 1214, and 1216 are each used to configure the state of the one or more GPUs for purposes of executing a shader that performs either geometry pre-testing or rendering depending on the state. As shown, command 1210 configures GPU state so that shader 0 may be executed via command 1211 to perform either geometry pre-testing or rendering. Likewise, command 1212 configures GPU state so that shader 1 may be executed via command 1213 to perform either geometry pre-testing or rendering. In addition, command 1214 configures GPU state so that shader 2 may be executed via command 1215 to perform either geometry pre-testing or rendering. Finally, command 1216 configures GPU state so that shader 3 may be executed via command 1217 to perform either geometry pre-testing or rendering.
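By way of non-limiting illustration only, executing the same portion of a command buffer twice under different GPU state might be sketched as follows. The dictionary standing in for GPU state, the `mode` field, and the helper names are hypothetical; the sketch models the explicit variant, in which state is set to a different value before each execution of the portion.

```python
PRETEST, RENDER = 0, 1  # hypothetical values of the externally set GPU state

log = []  # records what each shader invocation did


def set_state(obj_id):
    """Stand-in for a state-configuring command such as 1210, 1212, 1214, or 1216."""
    def command(gpu_state):
        gpu_state["bound_object"] = obj_id
    return command


def run_shader(shader_id):
    """Stand-in for a shader-executing command such as 1211, 1213, 1215, or 1217."""
    def command(gpu_state):
        # The same shader does different work depending on the GPU state.
        action = "pretest" if gpu_state["mode"] == PRETEST else "render"
        log.append((action, shader_id, gpu_state["bound_object"]))
    return command


# The portion of the command buffer: configure state, then run the shader, per object.
portion = []
for i in range(4):
    portion += [set_state(i), run_shader(i)]


def execute(commands, gpu_state):
    for command in commands:
        command(gpu_state)


gpu_state = {"mode": PRETEST, "bound_object": None}
execute(portion, gpu_state)   # first traversal: geometry pre-testing
gpu_state["mode"] = RENDER    # state changed outside the portion
execute(portion, gpu_state)   # second traversal: rendering
```

The command list itself is never modified between the two traversals; only the state consulted by the commands changes.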

In a first traversal 1291 through command buffer 1200A, the corresponding shaders perform geometry pre-testing, based on the GPU state that is set explicitly or implicitly as described above and on the GPU state configured by commands 1210, 1212, 1214, and 1216. For example, shader 0 is configured to perform geometry pre-testing on object 0 (e.g., a piece of geometry, based on the objects shown in FIG. 7B-1), shader 1 is configured to perform geometry pre-testing on object 1, shader 2 is configured to perform geometry pre-testing on object 2, and shader 3 is configured to perform geometry pre-testing on object 3.

In one embodiment, commands may be skipped or interpreted differently based on the GPU state. For example, certain commands that set state (portions of 1210, 1212, 1214, and 1216) may be skipped based on the GPU state that is set explicitly or implicitly as described above. For instance, if configuring shader 0 executed via command 1210 requires less GPU state for geometry pre-testing than for rendering of geometry, it may be beneficial to skip setting the unnecessary portions of the GPU state, because setting GPU state can carry overhead. As another example, certain commands that set state (portions of 1210, 1212, 1214, and 1216) may be interpreted differently based on the GPU state that is set explicitly or implicitly as described above. This applies, for instance, if shader 0 executed via command 1210 requires different GPU state to be configured for geometry pre-testing than for rendering of geometry, or if shader 0 executed via command 1210 requires different inputs for geometry pre-testing than for rendering of geometry.
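By way of non-limiting illustration only, skipping state-setting commands that are unnecessary for geometry pre-testing might be sketched as follows. The `SetState` record, its `needed_for_pretest` flag, and the register names are hypothetical and are used only to show how the amount of state written can depend on the current mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetState:
    """Hypothetical state-setting command (a portion of 1210, 1212, 1214, or 1216)."""
    register: str
    value: int
    needed_for_pretest: bool


def apply_state_commands(commands, mode, registers):
    """Apply state commands for the given mode and return the number of register writes."""
    writes = 0
    for cmd in commands:
        if mode == "pretest" and not cmd.needed_for_pretest:
            continue  # skip state that only rendering requires
        registers[cmd.register] = cmd.value
        writes += 1
    return writes


commands = [
    SetState("vertex_buffer", 1, needed_for_pretest=True),
    SetState("pixel_shader", 2, needed_for_pretest=False),
    SetState("blend_mode", 3, needed_for_pretest=False),
]
pretest_registers, render_registers = {}, {}
pretest_writes = apply_state_commands(commands, "pretest", pretest_registers)
render_writes = apply_state_commands(commands, "render", render_registers)
```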

In one embodiment, a shader configured for geometry pre-testing does not allocate space in the position and parameter caches, as previously described. In another embodiment, a single shader is used to perform either pre-testing or rendering. This can be done in various ways, such as via external hardware state that the shader can check (e.g., set explicitly or implicitly as described above), or via an input to the shader (e.g., set by a command that is interpreted differently in the first and second passes through the command buffer).

In a second traversal 1292 through command buffer 1200A, the corresponding shaders perform rendering of the pieces of geometry for the corresponding image frame, based on the GPU state that is set explicitly or implicitly as described above and on the GPU state configured by commands 1210, 1212, 1214, and 1216. For example, shader 0 is configured to perform rendering of object 0 (e.g., a piece of geometry, based on the objects shown in FIG. 7B-1). Likewise, shader 1 is configured to perform rendering of object 1, shader 2 is configured to perform rendering of object 2, and shader 3 is configured to perform rendering of object 3.

FIG. 12B is a flow diagram 1200B illustrating a method for graphics processing that includes performing both pre-testing and rendering of the geometry of an image frame using the same set of shaders in two passes through a portion of a command buffer, in accordance with one embodiment of the present disclosure. As previously described, various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system (e.g., a personal computer or gaming console that includes a high-end graphics card having multiple GPUs).

In particular, at 1210, the method includes rendering graphics for an application using a plurality of GPUs, as previously described. At 1220, the method includes dividing responsibility for rendering geometry of the graphics between the plurality of GPUs based on a plurality of screen regions. Each GPU has a corresponding division of the responsibility that is known to the plurality of GPUs. More specifically, as previously described, each of the GPUs is responsible for rendering geometry in a corresponding set of screen regions of the plurality of screen regions, and the corresponding set includes one or more screen regions. In one embodiment, the screen regions are interleaved (e.g., when the display is divided into sets of screen regions for geometry pre-testing and rendering).

At 1230, the method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. In particular, each of the plurality of GPUs is assigned a corresponding portion of the geometry associated with the image frame for purposes of performing geometry testing. As previously described, the assignment of pieces of geometry may be distributed evenly or unevenly. In embodiments, each portion includes one or more pieces of geometry, or potentially no pieces of geometry at all.

At 1240, the method includes loading a first GPU state that configures one or more shaders to perform geometry pre-testing. For example, depending on the GPU state, a corresponding shader may be configured to perform different operations. As such, the first GPU state configures the corresponding shaders to perform geometry pre-testing. In the example of FIG. 12A, this can be set in various ways, such as by setting state, explicitly or implicitly, external to the portion of the command buffer shown at 1200A, as previously described. In particular, the GPU state may be set in a variety of ways. For example, a CPU or GPU can set a value in random access memory (RAM), and the GPU checks the value in RAM. In another example, the state may be internal to the GPU, such as when a command buffer is called twice as a subroutine and the internal GPU state differs between the two subroutine calls. Alternatively, command 1210 of FIG. 12A can be interpreted differently or skipped based on the state that is set explicitly or implicitly, as previously described. Based on this first GPU state, shader 0 executed by command 1211 is configured to perform geometry pre-testing.

At 1250, the method includes performing geometry pre-testing at the plurality of GPUs on the plurality of pieces of geometry to generate information regarding each piece of geometry and its relation to each of the plurality of screen regions. As previously described, geometry pre-testing may determine whether a piece of geometry overlaps any screen region (e.g., in a corresponding set) that is assigned to a corresponding GPU for object rendering. In embodiments, geometry pre-testing is typically performed simultaneously by the GPUs on all of the geometry of the corresponding image frame, so that each GPU knows which pieces of geometry to render and which pieces of geometry to skip. This concludes the first traversal through the command buffer. The shaders may be configured to perform geometry pre-testing and/or rendering, depending on the GPU state.
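By way of non-limiting illustration only, a geometry pre-test against interleaved screen regions might be sketched as follows. The 64-pixel region size, the 2x2 interleave across four GPUs, and the use of a screen-space bounding box as a conservative stand-in for a piece of geometry are all assumptions of this sketch.

```python
REGION_SIZE = 64  # hypothetical side length of a square screen region, in pixels


def region_owner(rx, ry):
    """Return the GPU index (0-3) responsible for region (rx, ry), using a 2x2 interleave."""
    return (rx % 2) + 2 * (ry % 2)


def pretest(bbox):
    """Return the set of GPU indices whose screen regions the bounding box overlaps.

    bbox is (x0, y0, x1, y1) in pixels with inclusive corners. A bounding box can
    over-report overlap, but it never misses a region the geometry actually touches.
    """
    x0, y0, x1, y1 = bbox
    owners = set()
    for ry in range(y0 // REGION_SIZE, y1 // REGION_SIZE + 1):
        for rx in range(x0 // REGION_SIZE, x1 // REGION_SIZE + 1):
            owners.add(region_owner(rx, ry))
    return owners


small = pretest((0, 0, 10, 10))        # lies inside a single region
straddle = pretest((60, 0, 70, 10))    # crosses one vertical region boundary
large = pretest((0, 0, 127, 127))      # covers a full 2x2 block of regions
```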

At 1260, the method includes loading a second GPU state that configures the one or more shaders to perform rendering. As previously described, depending on the GPU state, a corresponding shader may be configured to perform different operations. As such, the second GPU state configures the corresponding shaders (the same shaders previously used to perform geometry pre-testing) to perform rendering. In the example of FIG. 12A, based on this second GPU state, shader 0 executed by command 1211 is configured to perform rendering.

At 1270, the method includes using, at each of the plurality of GPUs, the information generated for each of the plurality of pieces of geometry when rendering the plurality of pieces of geometry (e.g., including fully rendering a piece of geometry at a corresponding GPU, or skipping the rendering of that piece of geometry). As previously described, the information may indicate whether a piece of geometry overlaps any screen region (e.g., in a corresponding set) that is assigned to a corresponding GPU for object rendering. The information may be used for rendering each of the plurality of pieces of geometry at each of the plurality of GPUs, such that each GPU can efficiently render only those pieces of geometry that overlap at least one screen region (e.g., in the corresponding set) assigned to that GPU for object rendering. This concludes the second traversal through the command buffer. The shaders may be configured to perform geometry pre-testing and/or rendering, depending on the GPU state.
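By way of non-limiting illustration only, the use of pre-test information during rendering might be sketched as follows. The `overlap_info` mapping from each piece of geometry to the set of GPUs whose screen regions it overlaps is a hypothetical representation of the information generated during geometry pre-testing.

```python
def render_frame(gpu_id, pieces, overlap_info):
    """Split pieces into those this GPU renders and those it skips.

    overlap_info maps a piece to the set of GPU indices whose assigned
    screen regions the piece overlaps, as produced by geometry pre-testing.
    """
    rendered, skipped = [], []
    for piece in pieces:
        if gpu_id in overlap_info[piece]:
            rendered.append(piece)  # overlaps at least one region owned by this GPU
        else:
            skipped.append(piece)   # no overlap: this GPU skips the piece entirely
    return rendered, skipped


overlap_info = {"obj0": {0}, "obj1": {0, 1}, "obj2": {2, 3}, "obj3": {0, 1, 2, 3}}
rendered_by_gpu1, skipped_by_gpu1 = render_frame(1, list(overlap_info), overlap_info)
```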

FIGS. 13A-13B illustrate another strategy for processing rendering command buffers. One strategy was previously described in relation to FIGS. 7A-7C, in which a command buffer contains commands for geometry pre-testing of objects (e.g., pieces of geometry), followed by commands for rendering the objects. Another strategy, which uses shaders capable of performing either operation depending on the GPU configuration, was described in relation to FIGS. 12A-12B. FIGS. 13A-13B show a geometry testing and rendering strategy that uses shaders capable of performing either geometry pre-testing or rendering, in which the processes of geometry pre-testing and rendering are interleaved for different sets of pieces of geometry, in accordance with embodiments of the present disclosure.

In particular, FIG. 13A is a diagram illustrating the use of shaders configured to perform both geometry pre-testing and rendering, in which the geometry pre-testing and rendering performed on different sets of pieces of geometry are interleaved using separate portions of a corresponding command buffer 1300A, in accordance with one embodiment of the present disclosure. That is, rather than executing a portion of command buffer 1300A from start to finish, command buffer 1300A is dynamically configured and executed so that geometry pre-testing and rendering are interleaved for different sets of pieces of geometry. For example, in the command buffer, some shaders (e.g., executed via commands 1311 and 1313) are configured to perform geometry pre-testing on a first set of pieces of geometry. After the geometry testing is performed, those same shaders (e.g., executed via commands 1311 and 1313) are then configured to perform rendering. After rendering is performed on the first set of pieces of geometry, other shaders in the command buffer (e.g., executed via commands 1315 and 1317) are configured to perform geometry pre-testing on a second set of pieces of geometry. After the geometry pre-testing is performed, those same shaders (e.g., executed via commands 1315 and 1317) are then configured to perform rendering, and rendering is performed on the second set of pieces of geometry using those commands. An advantage of this strategy is that imbalance between GPUs can be addressed dynamically, e.g., by using asymmetric interleaving of geometry testing throughout rendering. An example of asymmetric interleaving of geometry testing was previously introduced in distribution 102 of FIG. 10.

Because the interleaving of geometry pre-testing and rendering is performed dynamically, the configuration of the GPU (e.g., via a register setting or a value in RAM) is performed implicitly. That is, an aspect of the GPU configuration occurs external to the command buffer. For example, a GPU register may be set to 0 (indicating that geometry pre-testing should be performed) or 1 (indicating that rendering should be performed). The interleaved traversal of the command buffer and the setting of this register may be controlled by the GPU based on the number of objects processed, the primitives processed, the imbalance between GPUs, and so on. Alternatively, a value in RAM can be used. As a result of this external configuration (meaning configuration set external to the command buffer), when a command in a portion of command buffer 1300A is executed (e.g., a command that sets state or a command that executes a shader), the result of the command differs based on the GPU state (e.g., resulting in geometry pre-testing versus rendering). That is, the commands in command buffer 1300A may be configured for geometry pre-testing 1391 or for rendering 1392.

In particular, the portion of command buffer 1300A includes commands for configuring the state of the one or more GPUs that execute commands from rendering command buffer 1300A, and commands for executing shaders that perform either geometry pre-testing or rendering depending on that state. For example, commands 1310, 1312, 1314, and 1316 are each used to configure the state of the GPU for purposes of executing a shader that performs either geometry pre-testing or rendering depending on the state. As shown, command 1310 configures GPU state so that shader 0 may be executed via command 1311 to perform either geometry pre-testing or rendering of object 0. Likewise, command 1312 configures GPU state so that shader 1 may be executed via command 1313 to perform either geometry pre-testing or rendering of object 1. Command 1314 configures GPU state so that shader 2 may be executed via command 1315 to perform either geometry pre-testing or rendering of object 2. Further, command 1316 configures GPU state so that shader 3 may be executed via command 1317 to perform either geometry pre-testing or rendering of object 3.

Geometry pre-testing and rendering may be interleaved for different sets of pieces of geometry. For purposes of illustration only, command buffer 1300A may be configured to first perform geometry pre-testing and rendering of objects 0 and 1, and then to perform geometry pre-testing and rendering of objects 2 and 3. It is understood that different numbers of pieces of geometry may be interleaved in different sections. For example, section 1 shows a first traversal through command buffer 1300A. Based on the GPU state that is set implicitly as described above and on the GPU state configured by commands 1310 and 1312, the corresponding shaders perform geometry pre-testing. For example, shader 0 is configured to perform geometry pre-testing on object 0 (e.g., a piece of geometry, based on the objects shown in FIG. 7B-1), and shader 1 is configured to perform geometry pre-testing on object 1. Section 2 shows a second traversal through command buffer 1300A. Based on the GPU state that is set implicitly as described above and on the GPU state configured by commands 1310 and 1312, the corresponding shaders perform rendering. For example, shader 0 is now configured to perform rendering of object 0, and shader 1 is now configured to perform rendering of object 1.

FIG. 13A shows the interleaving of the performance of geometry pre-testing and rendering for different sets of pieces of geometry. In particular, section 3 shows a third, partial traversal through command buffer 1300A. Based on the GPU state that is set implicitly as described above and on the GPU state configured by commands 1314 and 1316, the corresponding shaders perform geometry pre-testing. For example, shader 2 (executed via command 1315) performs geometry testing on object 2 (e.g., a piece of geometry, based on the objects shown in FIG. 7B-1), and shader 3 (executed via command 1317) performs geometry testing on object 3. Section 4 shows a fourth, partial traversal through command buffer 1300A. Based on the GPU state that is set implicitly as described above and on the GPU state configured by commands 1314 and 1316, the corresponding shaders perform rendering. For example, shader 2 (executed via command 1315) performs rendering of object 2, and shader 3 (executed via command 1317) performs rendering of object 3.
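By way of non-limiting illustration only, the order of work in sections 1 through 4 might be sketched as follows. The function name and the tuple layout are hypothetical; the grouping of objects 0 and 1 into a first set and objects 2 and 3 into a second set follows the example given for FIG. 13A.

```python
def interleaved_schedule(object_groups):
    """Return (section, operation, object) tuples in execution order.

    Each group of objects is pre-tested in one section and rendered in the
    next section before the following group is started.
    """
    schedule = []
    section = 1
    for group in object_groups:
        for operation in ("pretest", "render"):
            for obj in group:
                schedule.append((section, operation, obj))
            section += 1
    return schedule


schedule = interleaved_schedule([[0, 1], [2, 3]])
```

Groups of different sizes can be passed to model different numbers of pieces of geometry being interleaved in different sections.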

Note that hardware contexts are either preserved, or saved and restored. For example, the geometry pre-testing GPU context at the end of section 1 is needed at the beginning of section 3 in order to perform geometry pre-testing. Likewise, the rendering GPU context at the end of section 2 is needed at the beginning of section 4 in order to perform rendering.
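By way of non-limiting illustration only, saving and restoring the two contexts across sections might be sketched as follows. The class name and the use of dictionaries to stand in for hardware context are hypothetical; actual hardware may instead preserve both contexts concurrently.

```python
class GpuContextStore:
    """Keeps one saved context per operation and swaps them on a mode switch."""

    def __init__(self):
        self._saved = {"pretest": {}, "render": {}}
        self.mode = None
        self.active = {}

    def switch(self, mode):
        if self.mode is not None:
            self._saved[self.mode] = dict(self.active)  # save the outgoing context
        self.active = dict(self._saved[mode])           # restore the incoming context
        self.mode = mode


ctx = GpuContextStore()
ctx.switch("pretest")                 # section 1
ctx.active["pieces_tested"] = 2
ctx.switch("render")                  # section 2
ctx.active["pieces_rendered"] = 2
ctx.switch("pretest")                 # section 3 resumes the section 1 context
context_at_section_3 = dict(ctx.active)
ctx.switch("render")                  # section 4 resumes the section 2 context
context_at_section_4 = dict(ctx.active)
```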

In one embodiment, commands may be skipped or interpreted differently based on the GPU state. For example, certain commands that set state (portions of 1310, 1312, 1314, and 1316) may be skipped based on the GPU state that is set implicitly as described above. For instance, if configuring shader 0 executed via command 1310 requires less GPU state for geometry testing than for rendering of geometry, it may be beneficial to skip setting the unnecessary portions of the GPU state, because setting GPU state can carry overhead. As another example, certain commands that set state (portions of 1310, 1312, 1314, and 1316) may be interpreted differently based on the GPU state that is set implicitly as described above. This applies, for instance, if shader 0 executed via command 1310 requires different GPU state to be configured for geometry pre-testing than for rendering of geometry, or if shader 0 executed via command 1310 requires different inputs for geometry testing than for rendering of geometry.

In one embodiment, a shader configured for geometry pre-testing does not allocate space in the position and parameter caches, as previously described. In another embodiment, a single shader is used to perform either pre-testing or rendering. This can be done in various ways, such as via external hardware state that the shader can check (e.g., set implicitly as described above), or via an input to the shader (e.g., set by a command that is interpreted differently in the first and second passes through the command buffer).

FIG. 13B is a flow diagram illustrating a method for graphics processing that includes interleaving the pre-testing and rendering of the geometry of an image frame for different sets of pieces of geometry using separate portions of a corresponding command buffer, in accordance with one embodiment of the present disclosure. As previously described, various architectures may include multiple GPUs collaborating to render a single image by performing multi-GPU rendering of geometry for an application, such as within one or more cloud gaming servers of a cloud gaming system, or within a stand-alone system (e.g., a personal computer or gaming console that includes a high-end graphics card having multiple GPUs).

Specifically, at 1310, the method includes rendering graphics for an application using a plurality of GPUs, as previously described. At 1320, the method includes dividing responsibility for rendering geometry of the graphics among the plurality of GPUs based on a plurality of screen regions. Each GPU has a corresponding division of the responsibility that is known to the plurality of GPUs. More specifically, as described above, each GPU is responsible for rendering geometry within a corresponding set of screen regions of the plurality of screen regions, where the corresponding set includes one or more screen regions. In one embodiment, the screen regions are interleaved (e.g., when the display is divided into sets of screen regions for geometry pre-testing and rendering).
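One way to picture an interleaved division of responsibility is a diagonal pattern over a grid of screen regions. The sketch below assumes a regular grid and the mapping `(col + row) % num_gpus`; both are illustrative assumptions, as the disclosure does not prescribe a particular pattern, and `owner_of` and `build_region_sets` are hypothetical names.

```python
def owner_of(col, row, num_gpus):
    """Return the index of the GPU responsible for region (col, row).

    The diagonal pattern is an assumed example of interleaving.
    """
    return (col + row) % num_gpus


def build_region_sets(cols, rows, num_gpus):
    """Group every region of a cols x rows grid into one set per GPU."""
    sets = {gpu: [] for gpu in range(num_gpus)}
    for row in range(rows):
        for col in range(cols):
            sets[owner_of(col, row, num_gpus)].append((col, row))
    return sets


# Because the mapping is a pure function of the region coordinates,
# every GPU can compute every other GPU's set of regions.
region_sets = build_region_sets(cols=4, rows=2, num_gpus=4)
```

With four GPUs and a 4 x 2 grid, each GPU ends up with two regions that are not adjacent to one another, which is the point of interleaving.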

At 1330, the method includes assigning a plurality of pieces of geometry of an image frame to the plurality of GPUs for geometry testing. Specifically, each of the plurality of GPUs is assigned a corresponding portion of the geometry associated with the image frame for purposes of geometry testing. As described above, the pieces of geometry may be distributed evenly or unevenly, and each portion includes one or more pieces of geometry, or potentially no pieces of geometry at all.
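The even and uneven distributions mentioned above can be illustrated with two small helpers. Both are hypothetical sketches: `assign_round_robin` hands successive pieces to different GPUs, and `assign_weighted` gives each GPU a contiguous share proportional to an assumed per-GPU weight (the weights must sum to more than zero).

```python
def assign_round_robin(num_pieces, num_gpus):
    """Even distribution: successive pieces go to different GPUs."""
    portions = [[] for _ in range(num_gpus)]
    for piece in range(num_pieces):
        portions[piece % num_gpus].append(piece)
    return portions


def assign_weighted(num_pieces, weights):
    """Uneven distribution: contiguous shares proportional to `weights`."""
    total = sum(weights)
    portions, start, accumulated = [], 0, 0
    for weight in weights:
        accumulated += weight
        end = num_pieces * accumulated // total
        portions.append(list(range(start, end)))
        start = end
    return portions


even = assign_round_robin(5, 2)
uneven = assign_weighted(8, [3, 1, 0])  # the third GPU tests no pieces
```

A weight of zero yields an empty portion, matching the case in which a GPU is assigned no pieces of geometry for testing.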

At 1340, the method includes interleaving a first set of shaders in a command buffer with a second set of shaders. The shaders are configured to perform both geometry pre-testing and rendering. Specifically, the first set of shaders is configured to perform geometry pre-testing and rendering on a first set of pieces of geometry. The second set of shaders is then configured to perform geometry pre-testing and rendering on a second set of pieces of geometry. As described above, the geometry pre-test generates corresponding information about each piece of geometry in the first set or the second set and its relationship to each of the plurality of screen regions. The plurality of GPUs uses the corresponding information to render each piece of geometry in the first set or the second set. As described above, the GPU state may be set in various ways to perform either geometry pre-testing or rendering. For example, the CPU or a GPU can set a value in random access memory (RAM), and the GPU checks that value. In another example, the state may be internal to the GPU, such as when the command buffer is called twice as a subroutine and the internal GPU state differs between the two calls.

The interleaving process is described further. Specifically, as described above, the first set of shaders in the command buffer is configured to perform geometry pre-testing on the first set of pieces of geometry. The geometry pre-testing is performed at the plurality of GPUs on the first set of pieces of geometry to generate first information about each piece of geometry in the first set and its relationship to each of the plurality of screen regions. Then, as described above, the first set of shaders is configured to perform rendering of the first set of pieces of geometry. The first information is subsequently used when rendering the pieces of geometry at each of the plurality of GPUs (e.g., to fully render pieces in the first set at a corresponding GPU, or to skip rendering them). As described above, the information indicates which pieces of geometry overlap the screen regions assigned to the corresponding GPU for object rendering. For example, the information may be used to skip rendering a piece of geometry at a GPU when the information indicates that the piece does not overlap any screen region (e.g., in the corresponding set) assigned to that GPU for object rendering.
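The relationship between a piece of geometry and each GPU's screen regions can be modeled with a conservative bounding-box test. The sketch below is an assumption made for illustration: it treats each piece as a screen-space rectangle and records one overlap flag per GPU, whereas the actual form of the information is not limited to this. The names `overlaps`, `pretest`, and `render_or_skip` are hypothetical.

```python
def overlaps(a, b):
    """Test two half-open rectangles given as (x0, y0, x1, y1)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def pretest(piece_bounds, regions_by_gpu):
    """Return, per GPU, whether the piece touches any of its regions."""
    return {
        gpu: any(overlaps(piece_bounds, region) for region in regions)
        for gpu, regions in regions_by_gpu.items()
    }


def render_or_skip(gpu, piece_name, info):
    """Decide what a GPU does with a piece, given the pre-test result."""
    return f"render {piece_name}" if info[gpu] else f"skip {piece_name}"


# Two GPUs with interleaved 64 x 64 regions on a 128 x 128 screen.
regions_by_gpu = {
    0: [(0, 0, 64, 64), (64, 64, 128, 128)],
    1: [(64, 0, 128, 64), (0, 64, 64, 128)],
}
info = pretest((10, 10, 40, 40), regions_by_gpu)
decisions = [render_or_skip(gpu, "piece0", info) for gpu in (0, 1)]
```

Here the piece lies entirely inside a region owned by GPU 0, so GPU 1 can skip it without doing any rendering work for that piece.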

The second set of shaders is then used for geometry testing and rendering of the second set of pieces of geometry. Specifically, as described above, the second set of shaders in the command buffer is configured to perform geometry pre-testing on the second set of pieces of geometry. The geometry testing is performed at the plurality of GPUs on the second set of pieces of geometry to generate second information about each piece of geometry in the second set and its relationship to each of the plurality of screen regions. Then, as described above, the second set of shaders is configured to perform rendering of the second set of pieces of geometry. The second set of pieces of geometry is subsequently rendered at each of the plurality of GPUs using the second information. As described above, the information indicates which pieces of geometry overlap the screen regions (e.g., of the corresponding set) assigned to the corresponding GPU for object rendering.
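The ordering produced by interleaving, in which each set of pieces is pre-tested and then rendered before the next set is pre-tested, can be written out as a simple schedule. The `interleaved_schedule` function and the tuple layout are hypothetical and serve only to make the ordering explicit.

```python
def interleaved_schedule(piece_sets):
    """List the phases in order: pre-test set 0, render set 0, and so on."""
    schedule = []
    for index, pieces in enumerate(piece_sets):
        schedule.append(("pretest", index, tuple(pieces)))
        schedule.append(("render", index, tuple(pieces)))
    return schedule


schedule = interleaved_schedule([["a", "b"], ["c"]])
```

Compared with pre-testing the whole frame before any rendering starts, this ordering lets the information for one set be consumed soon after it is produced.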

Although the foregoing describes the plurality of GPUs as processing geometry in lockstep (i.e., the GPUs perform geometry pre-testing, and then the GPUs perform rendering), in some embodiments the GPUs are not explicitly synchronized with one another. For example, one GPU may be rendering the first set of pieces of geometry while a second GPU is performing geometry pre-testing on the second set of pieces of geometry.

FIG. 14 illustrates components of an example device 1400 that can be used to perform aspects of the various embodiments of the present disclosure. For example, FIG. 14 illustrates an exemplary hardware system suitable for multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions (which may be interleaved) before rendering objects for an image frame, in accordance with embodiments of the present disclosure. This block diagram illustrates a device 1400 that can incorporate or can be a personal computer, a server computer, a gaming console, a mobile device, or other digital device, each of which is suitable for practicing an embodiment of the invention. Device 1400 includes a central processing unit (CPU) 1402 for running software applications and, optionally, an operating system. CPU 1402 may comprise one or more homogeneous or heterogeneous processing cores.

In accordance with various embodiments, CPU 1402 is one or more general-purpose microprocessors having one or more processing cores. Further embodiments can be implemented using one or more CPUs with microprocessor architectures specifically adapted for highly parallel and computationally intensive applications (e.g., media and interactive entertainment applications), including applications configured to perform graphics processing while a game is running.

Memory 1404 stores applications and data for use by CPU 1402 and GPU 1416. Storage 1406 provides non-volatile storage and other computer-readable media for applications and data, and may include fixed disk drives, removable disk drives, flash memory devices, CD-ROM, DVD-ROM, Blu-ray, HD-DVD, or other optical storage devices, as well as signal transmission and storage media. User input devices 1408 communicate user inputs from one or more users to device 1400; examples include keyboards, mice, joysticks, touch pads, touch screens, still or video recorders/cameras, and/or microphones. Network interface 1409 allows device 1400 to communicate with other computer systems via an electronic communications network, and may include wired or wireless communication over local area networks and wide area networks such as the Internet. Audio processor 1412 is adapted to generate analog or digital audio output from instructions and/or data provided by CPU 1402, memory 1404, and/or storage 1406. The components of device 1400, including CPU 1402, the graphics subsystem including GPU 1416, memory 1404, data storage 1406, user input devices 1408, network interface 1409, and audio processor 1412, are connected via one or more data buses 1422.

Graphics subsystem 1414 is further connected with data bus 1422 and the components of device 1400. Graphics subsystem 1414 includes at least one graphics processing unit (GPU) 1416 and graphics memory 1418. Graphics memory 1418 includes a display memory (e.g., a frame buffer) used for storing pixel data for each pixel of an output image. Graphics memory 1418 can be integrated in the same device as GPU 1416, connected as a separate device with GPU 1416, and/or implemented within memory 1404. Pixel data can be provided to graphics memory 1418 directly from CPU 1402. Alternatively, CPU 1402 provides GPU 1416 with data and/or instructions defining the desired output images, from which GPU 1416 generates the pixel data of one or more output images. The data and/or instructions defining the desired output images can be stored in memory 1404 and/or graphics memory 1418. In one embodiment, GPU 1416 includes 3D rendering capabilities for generating pixel data for output images from instructions and data defining the geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. GPU 1416 can further include one or more programmable execution units capable of executing shader programs.

Graphics subsystem 1414 periodically outputs pixel data for an image from graphics memory 1418 to be displayed on display device 1410, or to be projected by a projection system (not shown). Display device 1410 can be any device capable of displaying visual information in response to a signal from device 1400, including CRT, LCD, plasma, and OLED displays. Device 1400 can provide display device 1410 with, for example, an analog or digital signal.

Other embodiments for optimizing graphics subsystem 1414 can include multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions (which may be interleaved) before rendering objects for an image frame. Graphics subsystem 1414 may be configured as one or more processing devices.

For example, in one embodiment, graphics subsystem 1414 may be configured to perform multi-GPU rendering of geometry for an application, in which multiple graphics subsystems may be implementing graphics and/or rendering pipelines for a single application. That is, graphics subsystem 1414 includes multiple GPUs used for rendering an image, or each of one or more images of a sequence of images, when executing an application.

In other embodiments, graphics subsystem 1414 includes multiple GPU devices, which are combined to perform graphics processing for a single application executing on a corresponding CPU. For example, the multiple GPUs can perform multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions (which may be interleaved) before rendering objects for an image frame. In another example, the multiple GPUs can perform alternate frame rendering, in which GPU1 renders a first frame and GPU2 renders a second frame in sequential frame periods, and so on until the last GPU is reached, whereupon the initial GPU renders the next video frame (e.g., if there are only two GPUs, GPU1 renders the third frame). That is, the GPUs rotate when rendering frames. The rendering operations can overlap: GPU2 may begin rendering the second frame before GPU1 finishes rendering the first frame. In another implementation, the multiple GPU devices can be assigned different shader operations in the rendering and/or graphics pipeline, with a master GPU performing the main rendering and compositing. For example, in a group of three GPUs, master GPU1 can perform the main rendering (e.g., a first shader operation) and the compositing of outputs from slave GPU2 and slave GPU3. Slave GPU2 can perform a second shader operation (e.g., fluid effects such as a river), and slave GPU3 can perform a third shader operation (e.g., particle smoke). Master GPU1 composites the results from each of GPU1, GPU2, and GPU3. In this manner, different GPUs can be assigned to perform different shader operations (e.g., flag waving, wind, smoke generation, fire) to render a video frame. In still another embodiment, each of the three GPUs can be assigned to different objects and/or parts of a scene corresponding to a video frame. In the above embodiments and implementations, these operations can be performed in the same frame period (simultaneously in parallel) or in different frame periods (sequentially in parallel).
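The rotation used in alternate frame rendering reduces to a modulo over the frame index. The sketch below uses zero-based indices, so index 0 corresponds to GPU1 in the text; `afr_gpu_for_frame` is a hypothetical helper name.

```python
def afr_gpu_for_frame(frame_index, num_gpus):
    """Return the zero-based GPU index that renders the given frame."""
    return frame_index % num_gpus


# With two GPUs, frames alternate, and the third frame (index 2)
# returns to the first GPU.
two_gpu_rotation = [afr_gpu_for_frame(frame, 2) for frame in range(5)]
three_gpu_rotation = [afr_gpu_for_frame(frame, 3) for frame in range(5)]
```

This differs from the region-based scheme in that each GPU renders whole frames, so no per-piece overlap information is needed.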

Accordingly, the present disclosure describes methods and systems configured for multi-GPU rendering of geometry for an application by pre-testing the geometry against screen regions (which may be interleaved) before rendering objects for an image frame, or for each of one or more image frames in a sequence of image frames, when executing the application.

It should be understood that the various embodiments defined herein may be combined or assembled into specific implementations using the various features disclosed herein. Thus, the examples provided are just some possible examples, without limitation to the various implementations that are possible by combining the various elements to define many more implementations. In some examples, some implementations may include fewer elements, without departing from the spirit of the disclosed or equivalent implementations.

Embodiments of the present disclosure may be practiced with various computer system configurations, including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. Embodiments of the present disclosure can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a wire-based or wireless network.

With the above embodiments in mind, it should be understood that embodiments of the present disclosure can employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Any of the operations described herein that form part of embodiments of the present disclosure are useful machine operations. Embodiments of the disclosure also relate to a device or an apparatus for performing these operations. The apparatus can be specially constructed for the required purpose, or it can be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The disclosure can also be embodied as computer-readable code on a computer-readable medium. The computer-readable medium is any data storage device that can store data, which can thereafter be read by a computer system. Examples of the computer-readable medium include hard drives, network-attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer-readable medium can include computer-readable tangible media distributed over a network-coupled computer system so that the computer-readable code is stored and executed in a distributed fashion.

Although the method operations were described in a specific order, it should be understood that other housekeeping operations may be performed between operations, or operations may be adjusted so that they occur at slightly different times, or may be distributed in a system that allows the processing operations to occur at various intervals associated with the processing, as long as the processing of the overlay operations is performed in the desired way.

Although the foregoing disclosure has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered illustrative and not restrictive, and embodiments of the present disclosure are not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

Claims

1. A method for performing graphics processing, comprising:
rendering graphics for an application using a plurality of graphics processing units (GPUs);
dividing responsibility for the rendering of geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs, wherein the screen regions in the plurality of screen regions are interleaved;
assigning to a GPU a piece of geometry of an image frame generated by the application for geometry pre-testing;
performing the geometry pre-test at the GPU to generate information about the piece of geometry and its relationship to each of the plurality of screen regions;
using the information at each of the plurality of GPUs when rendering the image frame; and
providing the information as a hint to a rendering GPU, the rendering GPU being one of the plurality of GPUs,
wherein the information is considered by the rendering GPU if received before rendering of the piece of geometry begins at the rendering GPU, and
wherein the piece of geometry is fully rendered at the rendering GPU when the information is received after rendering of a first piece of geometry has begun.

2. The method of claim 1, further comprising skipping rendering of the piece of geometry at the rendering GPU when the information indicates that the piece of geometry does not overlap any screen region assigned to the rendering GPU for object rendering, wherein the rendering GPU is one of the plurality of GPUs.

3. The method of claim 1, further comprising assigning a plurality of pieces of geometry of the image frame to the plurality of GPUs for the geometry pre-test,
wherein the pieces of geometry in the plurality of pieces of geometry are distributed uniformly or non-uniformly across the plurality of GPUs.

4. The method of claim 3, further comprising assigning the plurality of pieces of geometry such that successive pieces of geometry are processed by different GPUs.

5. The method of claim 4, wherein a first GPU performs the geometry pre-test on more pieces of geometry than a second GPU, or the first GPU performs the geometry pre-test while the second GPU does not perform any geometry pre-test.

6. The method of claim 1, wherein the plurality of screen regions is configured to reduce imbalance in rendering time between the plurality of GPUs.

7. The method of claim 1, wherein the screen regions of the plurality of screen regions are non-uniform in size.

8. The method of claim 1, wherein the plurality of screen regions changes dynamically.

9. The method of claim 1, wherein the piece of geometry corresponds to geometry used or generated by a draw call.

10. The method of claim 1, wherein geometry used or generated by a draw call of the application is subdivided into a plurality of pieces of geometry, including the piece of geometry for which the GPU generates the information.

11. The method of claim 1, wherein the piece of geometry is an individual primitive.

12. The method of claim 1, wherein the information about the piece of geometry includes a vertex count or a primitive count.

13. The method of claim 1, wherein the information about the piece of geometry includes a specific set of primitives for rendering or a specific set of vertices for rendering.

14. The method of claim 1, further comprising:
using a common rendering command buffer for the plurality of GPUs; and
restricting execution of a command in the common rendering command buffer to one or more of the plurality of GPUs.

15. The method of claim 1, wherein the information may or may not be generated depending on characteristics of the piece of geometry.

16. The method of claim 1, further comprising generating the information using a scan converter in a rasterization stage.

17. The method of claim 1, further comprising generating the information using one or more shaders in a geometry processing stage.

18. The method of claim 17, wherein the one or more shaders use one or more dedicated instructions to accelerate the generation of the information.

19. The method of claim 17, wherein the one or more shaders do not perform allocations to a position cache or a parameter cache.

20. The method of claim 1, further comprising dividing a plurality of pieces of geometry among the plurality of GPUs for the geometry pre-test,
wherein successive pieces of geometry are processed by different GPUs.

21. The method of claim 20, further comprising dynamically adjusting the division of the plurality of pieces of geometry based on respective performance of the plurality of GPUs during the geometry pre-test.

22. The method of claim 1, wherein one or more of the plurality of GPUs are portions of a larger GPU configured as a plurality of virtual GPUs.

23. A computer system, comprising:
a processor; and
a memory coupled to the processor and storing instructions which, when executed by the computer system, cause the computer system to perform a method for executing a graphics pipeline,
the method including:
rendering graphics for an application using a plurality of graphics processing units (GPUs);
dividing responsibility for the rendering of geometry of the graphics among the plurality of GPUs based on a plurality of screen regions, each GPU having a corresponding division of the responsibility that is known to the plurality of GPUs, wherein the screen regions in the plurality of screen regions are interleaved;
assigning to a GPU a piece of geometry of an image frame generated by the application for geometry pre-testing;
performing the geometry pre-test at the GPU to generate information about the piece of geometry and its relationship to each of the plurality of screen regions;
using the information at each of the plurality of GPUs when rendering the image frame; and
providing the information as a hint to a rendering GPU, the rendering GPU being one of the plurality of GPUs,
wherein the information is considered by the rendering GPU if received before rendering of the piece of geometry begins at the rendering GPU, and
wherein the piece of geometry is fully rendered at the rendering GPU when the information is received after rendering of a first piece of geometry has begun.

24. The computer system of claim 23, wherein the method further includes skipping rendering of the piece of geometry at the rendering GPU when the information indicates that the piece of geometry does not overlap any screen region assigned to the rendering GPU for object rendering, wherein the rendering GPU is one of the plurality of GPUs.