JP2022553844A

JP2022553844A - Generating Arbitrary Views

Info

Publication number: JP2022553844A
Application number: JP2022525977A
Authority: JP
Inventors: チュイ・クラレンス; パーマー・マヌ; シートン・ブルック・アーロン; ジェイン・ヒマーンシュ
Original assignee: Outward Inc
Current assignee: Outward Inc
Priority date: 2019-11-08
Filing date: 2020-11-05
Publication date: 2022-12-26
Also published as: WO2021092229A1; EP4055567A1; KR20220076514A; EP4055567A4

Abstract

[Solution] Techniques for generating an arbitrary view or perspective of an ensemble scene are disclosed. In some embodiments, in response to receiving a request for a prescribed perspective of an ensemble scene comprising a plurality of assets, an output image of the ensemble scene for the requested prescribed perspective is generated based at least in part on combining at least a portion of the existing images of each of at least a subset of the plurality of assets. [Selected drawing] FIG. 5

Description

CROSS REFERENCE TO OTHER APPLICATIONS
This application is a continuation-in-part of U.S. Patent Application No. 16/171,221, filed October 25, 2018, entitled "ARBITRARY VIEW GENERATION", which is a continuation of U.S. Patent Application No. 15/721,421 (now U.S. Patent No. 10,163,249), filed September 29, 2017, entitled "ARBITRARY VIEW GENERATION", which is a continuation-in-part of U.S. Patent Application No. 15/081,553 (now U.S. Patent No. 9,996,914), filed March 25, 2016, entitled "ARBITRARY VIEW GENERATION", all of which are incorporated herein by reference for all purposes. U.S. Patent Application No. 15/721,421 (now U.S. Patent No. 10,163,249) further claims priority to U.S. Provisional Patent Application No. 62/541,607, filed August 4, 2017, entitled "FAST RENDERING OF ASSEMBLED SCENES", which is incorporated herein by reference for all purposes.

This application claims priority to U.S. Provisional Patent Application No. 62/933,254, filed November 8, 2019, entitled "QUANTIZED PERSPECTIVE CAMERA VIEWS", which is incorporated herein by reference for all purposes.

Existing rendering techniques face a trade-off between the competing objectives of quality and speed. A high quality rendering requires significant processing resources and time. However, slow rendering techniques are not acceptable in many applications, such as interactive, real-time applications. Lower quality but faster rendering techniques are typically favored in such applications. For example, rasterization is commonly employed by real-time graphics applications for relatively fast renderings, at the expense of quality. Thus, improved techniques that do not significantly compromise either quality or speed are needed.

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 is a high-level block diagram illustrating an embodiment of a system for generating an arbitrary view of a scene.

FIG. 2 illustrates an example of a database asset.

FIG. 3 is a flowchart illustrating an embodiment of a process for generating an arbitrary perspective.

Fourteen diagrams, each illustrating an example of an embodiment of an application in which independent objects are combined to generate an ensemble or composite object.

A flowchart illustrating an embodiment of a process for generating an arbitrary ensemble view.

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or as a specific component that is manufactured to perform the task. As used herein, the term "processor" refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention encompasses numerous alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

Techniques for generating an arbitrary view of a scene are disclosed. The examples described herein entail very low processing or computational overhead while still providing a high definition output, effectively eliminating the challenging trade-off between rendering speed and quality. The disclosed techniques are especially useful for very quickly generating a high quality output for interactive, real-time graphics applications. Such applications rely on substantially immediately presenting a preferably high quality output in response to and in accordance with user manipulations of a presented interactive view or scene.

FIG. 1 is a high-level block diagram illustrating an embodiment of a system 100 for generating an arbitrary view of a scene. As depicted, arbitrary view generator 102 receives a request for an arbitrary view as input 104, generates the requested view based on existing database assets 106, and provides the generated view as output 108 in response to the input request. In various embodiments, arbitrary view generator 102 may comprise a processor such as a central processing unit (CPU) or a graphics processing unit (GPU). The configuration of system 100 depicted in FIG. 1 is provided for the purpose of explanation. Generally, system 100 may comprise any other appropriate number and/or configuration of interconnected components that provide the described functionality. For example, in other embodiments, arbitrary view generator 102 may comprise a different configuration of internal components 110-116, arbitrary view generator 102 may comprise a plurality of parallel physical and/or virtual processors, database 106 may comprise a plurality of networked databases or a cloud of assets, and so on.

Arbitrary view request 104 comprises a request for an arbitrary perspective of a scene. In some embodiments, the requested perspective of the scene does not already exist in asset database 106, which includes other perspectives or viewpoints of the scene. In various embodiments, arbitrary view request 104 may be received from a process or a user. For example, input 104 may be received from a user interface in response to user manipulation of a presented scene or a portion thereof, such as user manipulation of the camera viewpoint of a presented scene. As another example, arbitrary view request 104 may be received in response to a specification of a path of movement or travel within a virtual environment, such as a fly-through of a scene. In some embodiments, the possible arbitrary views of a scene that may be requested are at least in part constrained. For example, a user may not be able to manipulate the camera viewpoint of a presented interactive scene to any random position but rather is constrained to certain positions or perspectives of the scene.

Database 106 stores a plurality of views of each stored asset. In the given context, an asset refers to a specific scene whose specification is stored in database 106 as a plurality of views. In various embodiments, a scene may comprise a single object, a plurality of objects, or a rich virtual environment. Specifically, database 106 stores a plurality of images corresponding to different perspectives or viewpoints of each asset. The images stored in database 106 comprise high quality photographs or photorealistic renderings. Such high definition, high resolution images that populate database 106 may be captured or rendered during offline processes or obtained from external sources. In some embodiments, corresponding camera characteristics are stored with each image stored in database 106. That is, camera attributes such as relative location or position, orientation, rotation, depth information, focal length, aperture, and zoom level are stored with each image. Furthermore, camera optical information such as shutter speed and exposure may also be stored with each image stored in database 106.
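The per-image camera metadata described above can be pictured as a small record type. The following is a minimal sketch only: the class names `CameraInfo`, `StoredView`, and `AssetDatabase`, the field names, and the units are hypothetical choices for illustration and are not specified in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CameraInfo:
    """Camera attributes stored alongside one image (hypothetical field names)."""
    position: Tuple[float, float, float]      # location relative to the asset
    rotation_deg: Tuple[float, float, float]  # orientation as (yaw, pitch, roll)
    focal_length_mm: float
    aperture_f: float
    zoom: float = 1.0
    shutter_speed_s: float = 0.0              # optional optical information
    exposure_ev: float = 0.0


@dataclass
class StoredView:
    """One existing view: an image reference plus its camera information."""
    image_path: str        # hypothetical storage location of the image
    camera: CameraInfo
    depth_path: str = ""   # hypothetical location of per-pixel depth data


@dataclass
class AssetDatabase:
    """Maps an asset identifier to the views stored for that asset."""
    views: Dict[str, List[StoredView]] = field(default_factory=dict)

    def add_view(self, asset_id: str, view: StoredView) -> None:
        self.views.setdefault(asset_id, []).append(view)

    def views_of(self, asset_id: str) -> List[StoredView]:
        return list(self.views.get(asset_id, []))
```

Keeping the camera record immutable (`frozen=True`) reflects the assumption that stored camera information is exact and does not change after capture.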

In various embodiments, any number of different perspectives of an asset may be stored in database 106. FIG. 2 illustrates an example of a database asset. In the given example, seventy-three views corresponding to different angles around a chair object are captured or rendered and stored in database 106. The views may be captured, for example, by rotating a camera around the chair or rotating the chair in front of a camera. Relative object and camera location and orientation information is stored with each generated image. FIG. 2 specifically illustrates views of a scene comprising a single object. Database 106 may also store a specification of a scene comprising a plurality of objects or a rich virtual environment. In such cases, multiple views corresponding to different locations or positions in the scene or three-dimensional space are captured or rendered and stored in database 106 along with corresponding camera information. Generally, images stored in database 106 may be two-dimensional or three-dimensional and may comprise stills or frames of an animation or video sequence.

In response to a request 104 for an arbitrary view of a scene that does not already exist in database 106, arbitrary view generator 102 generates the requested arbitrary view from a plurality of other existing views of the scene stored in database 106. In the example configuration of FIG. 1, asset management engine 110 of arbitrary view generator 102 manages database 106. For example, asset management engine 110 may facilitate storage and retrieval of data in database 106. In response to the request 104 for an arbitrary view of a scene, asset management engine 110 identifies and obtains a plurality of other existing views of the scene from database 106. In some embodiments, asset management engine 110 retrieves all existing views of the scene from database 106. Alternatively, asset management engine 110 may select and retrieve a subset of the existing views, for example, those that are closest to the requested arbitrary view. In such cases, asset management engine 110 is configured to intelligently select a subset of existing views from which pixels may be harvested to generate the requested arbitrary view. In various embodiments, multiple existing views may be retrieved by asset management engine 110 together, or may be retrieved as and when they are needed by other components of arbitrary view generator 102.
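Selecting the existing views closest to a requested view can be sketched as follows. This is an illustration under simplifying assumptions, not the selection logic of the described system: each view is reduced to a single azimuth angle around the object, and the function names `angular_distance` and `nearest_views` are hypothetical.

```python
from typing import List


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def nearest_views(stored_azimuths: List[float], requested_azimuth: float, k: int = 2) -> List[float]:
    """Return the k stored azimuths closest to the requested one, nearest first."""
    ranked = sorted(stored_azimuths, key=lambda a: angular_distance(a, requested_azimuth))
    return ranked[:k]


# Hypothetical example: four stored views, request at 100 degrees.
print(nearest_views([0.0, 90.0, 180.0, 270.0], 100.0))  # [90.0, 180.0]
```

A fuller selection criterion would presumably compare the complete stored camera information (position, orientation, focal length) rather than one angle.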

The perspective of each existing view retrieved by asset management engine 110 is transformed into the perspective of the requested arbitrary view by perspective transformation engine 112 of arbitrary view generator 102. As previously described, precise camera information is known and stored with each image stored in database 106. Thus, a perspective change from an existing view to the requested arbitrary view comprises a simple geometric mapping or transformation. In various embodiments, perspective transformation engine 112 may employ any one or more appropriate mathematical techniques to transform the perspective of an existing view into the perspective of an arbitrary view. In cases in which the requested view comprises an arbitrary view that is not identical to any existing view, the transformation of an existing view into the perspective of the arbitrary view will comprise at least some unmapped or missing pixels, i.e., pixels at angles or positions introduced in the arbitrary view that are not present in the existing view.
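One common way to realize such a geometric mapping, when per-pixel depth and camera parameters are known, is to unproject a pixel to a 3D point, move it into the target camera's frame, and project it again. The sketch below assumes an ideal pinhole camera with identical intrinsics for both views; the pinhole model, the function names, and the parameters `f`, `cx`, `cy` are assumptions for illustration, since the document leaves the mathematical technique open.

```python
from typing import Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


def unproject(u: float, v: float, depth: float, f: float, cx: float, cy: float) -> Point3:
    """Pixel (u, v) with known depth -> 3D point in the source camera frame."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)


def transform(point: Point3, rotation: Sequence[Sequence[float]], translation: Sequence[float]) -> Point3:
    """Apply a rigid transform: multiply by a 3x3 rotation, then add a translation."""
    x, y, z = (sum(r * p for r, p in zip(row, point)) + t
               for row, t in zip(rotation, translation))
    return (x, y, z)


def project(point: Point3, f: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """3D point in the target camera frame -> pixel, or None if behind the camera."""
    x, y, z = point
    if z <= 0.0:
        return None
    return (f * x / z + cx, f * y / z + cy)


def reproject_pixel(u, v, depth, f, cx, cy, rotation, translation):
    """Map one source pixel into the target view (source-to-target rigid transform)."""
    return project(transform(unproject(u, v, depth, f, cx, cy), rotation, translation), f, cx, cy)
```

Target pixels that no source pixel maps to under this transform are the unmapped or missing pixels mentioned above.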

Pixel information from a single perspective-transformed existing view will not be able to populate all pixels of a different view. However, in many cases, most, if not all, pixels of a requested arbitrary view may be harvested from a plurality of perspective-transformed existing views. Merging engine 114 of arbitrary view generator 102 combines pixels from a plurality of perspective-transformed existing views to generate the requested arbitrary view. Ideally, all pixels comprising the arbitrary view are harvested from existing views. This may be possible, for example, if a sufficiently diverse set of existing views or perspectives of the asset under consideration is available and/or if the requested perspective is not too dissimilar from the existing perspectives.

Any appropriate technique may be employed to combine or merge pixels from a plurality of perspective-transformed existing views to generate the requested arbitrary view. In one embodiment, a first existing view that is closest to the requested arbitrary view is selected and retrieved from database 106 and transformed into the perspective of the requested arbitrary view. Pixels are then harvested from this perspective-transformed first existing view and used to populate corresponding pixels in the requested arbitrary view. In order to populate pixels of the requested arbitrary view that were not available from the first existing view, a second existing view that includes at least some of these remaining pixels is selected and retrieved from database 106 and transformed into the perspective of the requested arbitrary view. Pixels that were not available from the first existing view are then harvested from this perspective-transformed second existing view and used to populate corresponding pixels in the requested arbitrary view. This process may be repeated for any number of additional existing views until all pixels of the requested arbitrary view have been populated, until all existing views have been exhausted, and/or until a prescribed threshold number of existing views has been used.
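The iterative harvesting described in this embodiment can be sketched as a loop over views that are already perspective-transformed and ordered nearest first. The representation is an assumption made for illustration: each view is a 2D list in which `None` marks a pixel that the view could not supply, and the name `merge_views` and its `max_views` parameter are hypothetical.

```python
from typing import Any, List, Optional, Tuple

Image = List[List[Optional[Any]]]


def merge_views(transformed_views: List[Image], width: int, height: int,
                max_views: Optional[int] = None) -> Tuple[Image, int]:
    """Populate the output from successive views; earlier views take priority.

    Returns the merged image and the number of pixels still unpopulated.
    """
    output: Image = [[None] * width for _ in range(height)]
    remaining = width * height
    used = 0
    for view in transformed_views:
        # Stop when everything is populated or the view budget is spent.
        if remaining == 0 or (max_views is not None and used >= max_views):
            break
        for y in range(height):
            for x in range(width):
                if output[y][x] is None and view[y][x] is not None:
                    output[y][x] = view[y][x]
                    remaining -= 1
        used += 1
    return output, remaining


# Hypothetical 1x3 example: the second view only fills what the first left open.
merged, missing = merge_views([[[1, None, None]], [[9, 2, None]]], width=3, height=1)
print(merged, missing)  # [[1, 2, None]] 1
```

Any pixels still counted as missing after the loop would be left for a later interpolation step.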

In some embodiments, a requested arbitrary view may include some pixels that were not available from any existing view. In such cases, interpolation engine 116 is configured to populate any remaining pixels of the requested arbitrary view. In various embodiments, any one or more appropriate interpolation techniques may be employed by interpolation engine 116 to generate these unpopulated pixels in the requested arbitrary view. Examples of interpolation techniques that may be employed include linear interpolation, nearest neighbor interpolation, and so on. Interpolation of pixels introduces averaging or smoothing. Overall image quality may not be significantly affected by some interpolation, but excessive interpolation may introduce unacceptable blurriness. Thus, it may be desirable to use interpolation sparingly. As previously described, interpolation is completely avoided if all pixels of the requested arbitrary view can be obtained from existing views. However, interpolation is introduced if the requested arbitrary view includes some pixels that are not available from any existing view. Generally, the amount of interpolation needed depends on the number of existing views available, the diversity of perspectives of the existing views, and/or how different the perspective of the arbitrary view is relative to the perspectives of the existing views.
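Nearest neighbor interpolation, one of the techniques named above, can be sketched as a brute-force fill in which each unpopulated pixel takes the value of the closest populated pixel. The function name `fill_nearest` and the convention that `None` marks an unpopulated pixel are assumptions for illustration; a practical implementation would likely use a faster spatial search.

```python
from typing import Any, List, Optional

Image = List[List[Optional[Any]]]


def fill_nearest(image: Image) -> Image:
    """Return a copy in which each None pixel takes its nearest populated value."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Collect every populated pixel as (x, y, value).
    known = [(x, y, image[y][x])
             for y in range(height) for x in range(width)
             if image[y][x] is not None]
    filled = [row[:] for row in image]
    if not known:
        return filled  # nothing to interpolate from
    for y in range(height):
        for x in range(width):
            if image[y][x] is None:
                # Squared Euclidean distance; ties resolve to the first candidate.
                _, _, value = min(known, key=lambda k: (k[0] - x) ** 2 + (k[1] - y) ** 2)
                filled[y][x] = value
    return filled


print(fill_nearest([[1, None, None, 4]]))  # [[1, 1, 4, 4]]
```

Because only original pixels serve as sources, a filled value never propagates from another filled value, which keeps the amount of smoothing bounded by the size of the gaps.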

With respect to the example depicted in FIG. 2, seventy-three views around a chair object are stored as existing views of the chair. An arbitrary view around the chair object that differs from every stored view may be generated using a plurality of these existing views, preferably with minimal, if any, interpolation. However, generating and storing such an exhaustive set of existing views may not be efficient or desirable. In some cases, a significantly smaller number of existing views covering a sufficiently diverse set of perspectives may instead be generated and stored. For example, the seventy-three views of the chair object may be reduced to a small set of a few views around the chair object.

As previously mentioned, in some embodiments, the possible arbitrary views that may be requested are at least in part constrained. For example, a user may be restricted from moving a virtual camera associated with an interactive scene to certain positions. With respect to the example given in FIG. 2, the possible arbitrary views that may be requested may be limited to arbitrary positions around the chair object and may not include, for example, arbitrary positions under the chair object, since insufficient pixel data exists for the bottom of the chair object. Such constraints on allowed arbitrary views ensure that a requested arbitrary view can be generated from existing data by arbitrary view generator 102.
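A constraint of this kind can be enforced before a request reaches the generator. The sketch below assumes the viewpoint is parameterized by azimuth and elevation angles and that views from below the object (negative elevation) are disallowed; the parameterization, the default limits, and the name `constrain_viewpoint` are hypothetical and serve only to illustrate the idea.

```python
from typing import Tuple


def constrain_viewpoint(azimuth_deg: float, elevation_deg: float,
                        min_elevation_deg: float = 0.0,
                        max_elevation_deg: float = 90.0) -> Tuple[float, float]:
    """Wrap azimuth into [0, 360) and clamp elevation to the allowed range."""
    azimuth = azimuth_deg % 360.0
    elevation = max(min_elevation_deg, min(max_elevation_deg, elevation_deg))
    return azimuth, elevation


# A request from beneath the object is moved to the lowest allowed elevation.
print(constrain_viewpoint(370.0, -30.0))  # (10.0, 0.0)
```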

Arbitrary view generator 102 generates and outputs the requested arbitrary view 108 in response to input arbitrary view request 104. The resolution or quality of the generated arbitrary view 108 is the same as or similar to the quality of the existing views used to generate it, since pixels from those views are used to generate the arbitrary view. Thus, using high definition existing views in most cases results in a high definition output. In some embodiments, the generated arbitrary view 108 is stored in database 106 with the other existing views of the associated scene and may subsequently be employed to generate other arbitrary views of the scene in response to future requests for arbitrary views. In cases in which input 104 comprises a request for an existing view in database 106, the requested view does not need to be generated from other views as described; instead, the requested view is retrieved via a simple database lookup and directly presented as output 108.
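The lookup-or-generate behavior, together with the optional storing of generated views for future requests, can be sketched as follows. The dictionary-backed store, the hashable `perspective` key, and the name `get_view` are assumptions for illustration; the `generate` callback stands in for the transform, merge, and interpolate steps.

```python
from typing import Any, Callable, Dict, Hashable


def get_view(store: Dict[Hashable, Any], perspective: Hashable,
             generate: Callable[[Hashable], Any]) -> Any:
    """Return an existing view directly, or generate it and store it for reuse."""
    if perspective in store:
        return store[perspective]      # simple lookup, no generation needed
    view = generate(perspective)       # stand-in for the generation pipeline
    store[perspective] = view          # keep it as a new existing view
    return view
```

With this arrangement, a second request for the same perspective is served by the lookup path and the generation callback is not invoked again.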

Arbitrary view generator 102 may furthermore be configured to generate an arbitrary ensemble view using the described techniques. That is, input 104 may comprise a request to combine a plurality of objects into a single custom view. In such cases, the aforementioned techniques are performed for each of the plurality of objects and combined to generate a single consolidated or ensemble view comprising the plurality of objects. Specifically, existing views of each of the plurality of objects are selected and retrieved from database 106 by asset management engine 110, the existing views are transformed into the perspective of the requested view by perspective transformation engine 112, pixels from the perspective-transformed existing views are used to populate corresponding pixels of the requested ensemble view by merging engine 114, and any remaining unpopulated pixels in the ensemble view are interpolated by interpolation engine 116. In some embodiments, the requested ensemble view may comprise a perspective that already exists for one or more of the objects comprising the ensemble. In such cases, the existing view of an object asset corresponding to the requested perspective is employed to directly populate pixels corresponding to the object in the ensemble view, instead of first generating the requested perspective from other existing views of the object.

As an example of an arbitrary ensemble view comprising a plurality of objects, consider the chair object of FIG. 2 and an independently photographed or rendered table object. The chair object and the table object may be combined using the disclosed techniques to generate a single ensemble view of both objects. Thus, using the disclosed techniques, independently captured or rendered images or views of each of a plurality of objects can be consistently combined to generate a scene comprising the plurality of objects and having a desired perspective. As previously described, depth information of each existing view is known. The perspective transformation of each existing view includes a depth transformation, allowing the plurality of objects to be appropriately positioned relative to one another in the ensemble view.
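One plausible way to use per-pixel depth to position objects relative to one another is a depth-buffer composite, in which the sample nearest the camera wins at each pixel. The document does not state that this specific method is used; the sketch below names depth buffering as an assumed technique, and the layer format (a 2D list of `(color, depth)` tuples or `None`) and the name `composite_by_depth` are hypothetical.

```python
from typing import Any, List, Optional, Tuple

Layer = List[List[Optional[Tuple[Any, float]]]]


def composite_by_depth(layers: List[Layer], width: int, height: int) -> List[List[Optional[Any]]]:
    """Combine per-object layers, keeping the sample closest to the camera per pixel."""
    color: List[List[Optional[Any]]] = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for layer in layers:
        for y in range(height):
            for x in range(width):
                sample = layer[y][x]
                if sample is not None and sample[1] < depth[y][x]:
                    color[y][x], depth[y][x] = sample
    return color


# Hypothetical 1x3 example: the table is nearer than the chair where they overlap.
chair = [[("chair", 2.0), ("chair", 2.0), None]]
table = [[None, ("table", 1.5), ("table", 1.5)]]
print(composite_by_depth([chair, table], width=3, height=1))  # [['chair', 'table', 'table']]
```

The result does not depend on the order of the layers, except where two samples have exactly equal depth.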

Generating an arbitrary ensemble view is not limited to combining a plurality of single objects into a custom view. Rather, a plurality of scenes having multiple objects or a plurality of rich virtual environments may be similarly combined into a custom ensemble view. For example, a plurality of separately and independently generated virtual environments, possibly from different content generation sources and possibly having different existing individual perspectives, may be combined into an ensemble view having a desired perspective. Thus, generally, arbitrary view generator 102 may be configured to consistently combine or reconcile a plurality of independent assets comprising possibly different existing views into an ensemble view having a desired, possibly arbitrary, perspective. A harmonious resulting ensemble view is generated because all combined assets are normalized to the same perspective. The possible arbitrary perspectives of the ensemble view may be constrained based on the existing views of the individual assets available to generate the ensemble view.

FIG. 3 is a flow chart illustrating an embodiment of a process for generating an arbitrary perspective. Process 300 may be employed, for example, by arbitrary view generator 102 of FIG. 1. In various embodiments, process 300 may be employed to generate an arbitrary view of a given asset or an arbitrary ensemble view.

Process 300 starts at step 302, at which a request for an arbitrary perspective is received. In some embodiments, the request received at step 302 may comprise a request for an arbitrary perspective of a given scene that differs from any existing available perspective of the scene. In such cases, for example, the arbitrary perspective request may be received in response to a requested change in the perspective of a presented view of the scene. Such a change in perspective may be prompted by changing or manipulating a virtual camera associated with the scene, such as by panning the camera, changing the focal length, or changing the zoom level. Alternatively, in some embodiments, the request received at step 302 may comprise a request for an arbitrary ensemble view. As one example, such an arbitrary ensemble view request may be received with respect to an application that allows multiple independent objects to be selected and that provides a consolidated, perspective-corrected ensemble view of the selected objects.

At step 304, a plurality of existing images from which to generate at least a portion of the requested arbitrary perspective is retrieved from one or more associated asset databases. The retrieved images may be associated with a given asset if the request received at step 302 comprises a request for an arbitrary perspective of that asset, or with multiple assets if the request received at step 302 comprises a request for an arbitrary ensemble view.

At step 306, each of the plurality of existing images retrieved at step 304 that has a different perspective is transformed into the arbitrary perspective requested at step 302. Each of the existing images retrieved at step 304 includes associated perspective information. The perspective of each image is defined by the camera characteristics associated with generating that image, such as relative position, orientation, rotation, angle, depth, focal length, aperture, zoom level, and lighting information. Because complete camera information is known for each image, the perspective transformation of step 306 comprises a simple mathematical operation. In some embodiments, step 306 optionally also includes a lighting transformation so that all images are consistently normalized to the same desired lighting conditions.
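The "simple mathematical operation" of step 306 can be illustrated with a pinhole camera model. The following is a minimal sketch, not the patented implementation: it assumes a camera described only by a position, a yaw angle about the vertical axis, and a focal length in pixels, and the function name `project` and all parameters are hypothetical. It maps a known three-dimensional point into the image plane of the requested camera.

```python
import math

def project(point, cam_pos, yaw_deg, focal_px, cx, cy):
    """Project a world-space point into a pinhole camera (illustrative only).

    The camera sits at cam_pos and looks along +z rotated by yaw_deg about
    the vertical (y) axis. Returns (u, v, depth) or None if the point is
    behind the camera.
    """
    # Translate into camera-centered coordinates.
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    # Rotate by -yaw so that the viewing direction becomes the +z axis.
    t = math.radians(-yaw_deg)
    xc = math.cos(t) * x + math.sin(t) * z
    zc = -math.sin(t) * x + math.cos(t) * z
    if zc <= 0:
        return None
    # Perspective divide; image v grows downward.
    u = cx + focal_px * xc / zc
    v = cy - focal_px * y / zc
    return (u, v, zc)
```

Applying such a projection to every pixel of an existing image whose three-dimensional position is known re-expresses that image in the requested perspective.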

At step 308, at least a portion of an image having the arbitrary perspective requested at step 302 is populated with pixels harvested from the perspective-transformed existing images. That is, pixels from a plurality of perspective-corrected existing images are employed to generate an image having the requested arbitrary perspective.

At step 310, it is determined whether the generated image having the requested arbitrary perspective is complete. If it is determined at step 310 that the generated image is not complete, it is determined at step 312 whether any more existing images are available from which any remaining unpopulated pixels of the generated image may be obtained. If it is determined at step 312 that more existing images are available, one or more additional existing images are retrieved at step 314, and process 300 continues at step 306.

If it is determined at step 310 that the generated image having the requested arbitrary perspective is not complete, and it is determined at step 312 that no more existing images are available, any remaining unpopulated pixels of the generated image are interpolated at step 316. Any one or more appropriate interpolation techniques may be employed at step 316.

If it is determined at step 310 that the generated image having the requested arbitrary perspective is complete, or after any remaining unpopulated pixels have been interpolated at step 316, the generated image having the requested arbitrary perspective is output at step 318. Process 300 subsequently ends.
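The control flow of steps 304 through 318 can be sketched as a short loop. This is a simplified illustration under stated assumptions rather than the disclosed implementation: images are modeled as dictionaries mapping pixel coordinates to values, `warp` is a hypothetical callable standing in for the perspective transformation of step 306, and nearest-neighbor filling is just one possible choice for the interpolation of step 316.

```python
def nearest_known(known, xy):
    """Return the value of the populated pixel closest to xy."""
    key = min(known, key=lambda k: (k[0] - xy[0]) ** 2 + (k[1] - xy[1]) ** 2)
    return known[key]

def generate_view(width, height, sources, warp, interpolate=nearest_known):
    """Populate a width x height image from perspective-transformed sources."""
    out = {}
    for image in sources:                          # steps 304 / 314: retrieve
        for (x, y), value in warp(image).items():  # step 306: transform
            inside = 0 <= x < width and 0 <= y < height
            if inside and (x, y) not in out:
                out[(x, y)] = value                # step 308: populate
        if len(out) == width * height:             # step 310: complete?
            return out                             # step 318: output
    # Step 312 found no further images: interpolate what is left (step 316).
    known = dict(out)
    if known:
        for y in range(height):
            for x in range(width):
                if (x, y) not in out:
                    out[(x, y)] = interpolate(known, (x, y))
    return out                                     # step 318: output
```

Pixels already populated by an earlier image are not overwritten in this sketch; a real system might instead blend or prioritize sources.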

As described, the disclosed techniques may be used to generate an arbitrary perspective based on other existing perspectives. Because camera information is preserved with each existing perspective, different existing perspectives can be normalized into a common, desired perspective. A resulting image having the desired perspective can be constructed by harvesting pixels from the perspective-transformed existing images. The processing associated with generating an arbitrary perspective using the disclosed techniques is not only fast and nearly instantaneous but also produces high-quality output, which makes the disclosed techniques particularly powerful for interactive, real-time graphics applications.

The disclosed techniques further describe the generation of an arbitrary ensemble view comprising multiple objects using available images or views of each of the multiple objects. As described, perspective transformation and/or normalization allow pixels comprising separately captured or rendered images or views of the multiple objects to be consistently combined into a desired arbitrary ensemble view.

In some embodiments, it may be desirable to first build or assemble a scene or ensemble view by selecting and positioning the content that is to be included in it. In some such cases, multiple objects may be stacked or combined like building blocks to create a composite object comprising the scene or ensemble view. As an example, consider an interactive application in which multiple independent objects are selected and appropriately placed, e.g., on a canvas, to create a scene or ensemble view. The interactive application may comprise, for example, a visualization or modeling application. In such applications, arbitrary views of objects cannot be used to construct the scene or ensemble view because of the perspective distortion arising from the associated focal lengths. Rather, prescribed object views that are substantially free of perspective distortion are employed, as described next.

In some embodiments, orthographic views of objects are employed to model or define a scene or ensemble view comprising multiple independent objects. An orthographic view comprises a parallel projection approximated by a (virtual) camera that is positioned far from the subject relative to the subject's size and that has a relatively long focal length, so that rays or projection lines are substantially parallel. An orthographic view has no depth, or has a fixed depth, and therefore exhibits no or little perspective distortion. Orthographic views of objects may thus be used like building blocks when specifying an ensemble scene or composite object. After an ensemble scene comprising an arbitrary combination of objects has been specified or defined using such orthographic views, the scene or its objects may be transformed into any desired camera perspective using the arbitrary view generation techniques described above with respect to FIGS. 1-3.
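The difference between a perspective view and an orthographic view can be shown numerically. The sketch below is illustrative only, with hypothetical function names and units: under a pinhole projection the image-space size of an object depends on its depth, whereas under a parallel projection it does not, and a very distant camera with a proportionally long focal length approaches the parallel case.

```python
def perspective_x(x, z, focal):
    """Image-space x of a point at lateral offset x and depth z (pinhole)."""
    return focal * x / z

def orthographic_x(x, z, scale):
    """Image-space x under a parallel projection; depth z is ignored."""
    return scale * x

# A 1-unit lateral offset at two different depths.
near = perspective_x(1.0, 2.0, 100.0)    # larger in the image
far = perspective_x(1.0, 4.0, 100.0)     # smaller in the image
flat_a = orthographic_x(1.0, 2.0, 100.0)
flat_b = orthographic_x(1.0, 4.0, 100.0)  # same size regardless of depth

# A camera 1e6 units away with focal length scaled by the same factor
# yields nearly the orthographic result.
distant = perspective_x(1.0, 1e6 + 1.0, 100.0 * 1e6)
```

Because the orthographic size does not vary with depth, components drawn this way can be abutted edge to edge without distortion mismatches.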

In some embodiments, the multiple views of an asset stored in database 106 of system 100 of FIG. 1 include one or more orthographic views of the asset. Such orthographic views may be captured (e.g., photographed or scanned) or rendered from a three-dimensional polygon mesh model. Alternatively, an orthographic view may be generated from other views of the asset available in database 106 according to the arbitrary view generation techniques described above with respect to FIGS. 1-3.

FIGS. 4A-4N illustrate examples of an embodiment of an application in which independent objects are combined to generate an ensemble or composite object or scene. Specifically, FIGS. 4A-4N illustrate an example of a furniture-building application in which various independent sofa components are combined to generate different unit sofa configurations.

FIG. 4A is an example of perspective views of three independent sofa components, namely a left-arm single seat, an armless loveseat, and a right-arm chaise longue. Each of the perspective views in the example of FIG. 4A has a focal length of 25 mm. As can be seen in the figure, the resulting perspective distortion prevents stacking of the components adjacent to one another, i.e., side-by-side placement of the components, which may be desired when assembling a unit sofa configuration comprising the components.

FIG. 4B illustrates an example of orthographic views of the same three components as FIG. 4A. As shown, the orthographic views of the objects are modular or block-like and are well suited to being stacked or placed side by side. However, depth information is substantially lost in the orthographic views. A difference in depth is visible in FIG. 4A, particularly for the chaise longue, whereas in the orthographic views all three components appear to have the same depth.

FIG. 4C illustrates an example of combining the orthographic views of the three components of FIG. 4B to specify a composite object. That is, FIG. 4C shows the generation of an orthographic view of a unit sofa by placing the orthographic views of the three components of FIG. 4B side by side. As shown in FIG. 4C, the bounding boxes of the orthographic views of the three sofa components fit snugly next to one another to create the orthographic view of the unit sofa. Orthographic views of the components thus facilitate user-friendly manipulation and precise placement of the components within the scene.
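Placing orthographic bounding boxes side by side reduces to accumulating widths. The following minimal sketch assumes each component is described only by the width of its orthographic bounding box; the function name and the example widths are hypothetical and are not taken from the figures.

```python
def place_side_by_side(widths, start_x=0.0):
    """Return the left edge of each bounding box and the total width."""
    offsets = []
    x = start_x
    for w in widths:
        offsets.append(x)  # this component starts where the previous ended
        x += w
    return offsets, x - start_x

# Hypothetical widths: single seat, loveseat, chaise longue.
offsets, total = place_side_by_side([1.0, 1.5, 1.0])
```

Such a one-dimensional layout is valid only because orthographic views carry no depth-dependent scaling.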

FIGS. 4D and 4E each illustrate an example of transforming the orthographic view of the composite object of FIG. 4C into an arbitrary camera perspective using the arbitrary view generation techniques described above with respect to FIGS. 1-3. That is, in each of the examples of FIGS. 4D and 4E, the orthographic view of the composite object is transformed into a normal camera perspective that accurately portrays depth. As shown, the depth of the chaise longue relative to the single seat and the loveseat, which was lost in the orthographic view, is visible in the perspective views of FIGS. 4D and 4E.

FIGS. 4F, 4G, and 4H illustrate examples of multiple orthographic views of the left-arm single seat, the armless loveseat, and the right-arm chaise longue, respectively. As noted above, any number of different views or perspectives of an asset may be stored in database 106 of system 100 of FIG. 1. The sets of FIGS. 4F-4H include 25 orthographic views, corresponding to different angles around each asset, that are separately captured or rendered and stored in database 106 and from which any arbitrary view of any combination of the objects may be generated. In a furniture-building application, for example, top views may be useful for floor placement, while front views may be useful for wall placement. In some embodiments, to maintain a more compact reference data set, only a prescribed number of orthographic views is stored for an asset in database 106, from which any arbitrary view of the asset may be generated.

FIGS. 4I-4N illustrate various examples of generating arbitrary views or perspectives of arbitrary combinations of objects. Specifically, each of FIGS. 4I-4N shows the generation of an arbitrary perspective or view of a unit sofa comprising multiple separate sofa objects or components. Each arbitrary view may be generated, for example, using the arbitrary view generation techniques described above with respect to FIGS. 1-3, by transforming one or more orthographic (or other) views of the objects comprising the ensemble view or composite object into the arbitrary view, harvesting pixels to populate the arbitrary view, and possibly interpolating any remaining missing pixels.

As described above, each image or view of an asset in database 106 may be stored with corresponding metadata, such as relative object and camera position and orientation information as well as lighting information. The metadata may be generated when rendering a view from a three-dimensional polygon mesh model of the asset, when imaging or scanning the asset (in which case depth and/or surface normal data may be estimated), or by a combination of both.

A given view or image of an asset comprises pixel intensity values (e.g., RGB values) for each pixel of the image as well as various metadata parameters associated with each pixel. In some embodiments, one or more of the red, green, and blue (RGB) channels or values of a pixel may be employed to encode the pixel metadata. The pixel metadata may include, for example, information about the relative location or position (e.g., x, y, and z coordinate values) of the point in three-dimensional space that projects onto that pixel. The pixel metadata may further include information about the surface normal vector at that position (e.g., the angles made with the x, y, and z axes). The pixel metadata may also include texture mapping coordinates (e.g., u and v coordinate values). In such cases, the actual pixel value at a point is determined by reading the RGB values at the corresponding coordinates of a texture image.

Surface normal vectors facilitate modifying or varying the lighting of a generated arbitrary view or scene. More specifically, relighting a scene comprises scaling pixel values based on how well the surface normal vectors of the pixels match the direction of a newly added, removed, or otherwise altered light source, which may be at least in part quantified by the dot product of the light direction and the normal vector of a pixel. Specifying pixel values via texture mapping coordinates facilitates modifying or varying the texture of a generated arbitrary view or scene, or a portion thereof. More specifically, the texture can be changed by simply swapping or replacing the referenced texture image with another texture image having the same dimensions.
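Both operations can be sketched in a few lines. The example below is a simplified illustration, not the disclosed implementation: it assumes a single directional light, clamps negative dot products to zero (a common Lambertian convention that the source does not specify), and models a texture as a nested list indexed by normalized u and v coordinates. All function names are hypothetical.

```python
import math

def relight(rgb, normal, light_dir):
    """Scale a pixel by the dot product of its unit normal and light direction."""
    n_len = math.sqrt(sum(c * c for c in normal))
    l_len = math.sqrt(sum(c * c for c in light_dir))
    dot = sum(n * l for n, l in zip(normal, light_dir)) / (n_len * l_len)
    scale = max(0.0, dot)  # surfaces facing away from the light go dark
    return tuple(min(255, round(c * scale)) for c in rgb)

def sample(texture, u, v):
    """Look up the texel at normalized coordinates (u, v) in [0, 1]."""
    h, w = len(texture), len(texture[0])
    x = min(w - 1, int(u * w))
    y = min(h - 1, int(v * h))
    return texture[y][x]

# Swapping the texture changes the pixel value without touching (u, v).
wood = [[(120, 80, 40), (130, 90, 50)], [(110, 70, 30), (125, 85, 45)]]
linen = [[(230, 230, 220), (225, 225, 215)], [(235, 235, 225), (228, 228, 218)]]
before = sample(wood, 0.75, 0.25)
after = sample(linen, 0.75, 0.25)
```

Because the stored per-pixel metadata is unchanged, neither operation requires the view to be regenerated from the underlying asset.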

The disclosed arbitrary view generation techniques are effectively based on perspective transformation and/or lookup operations of relatively low computational cost. An arbitrary (ensemble) view may be generated by simply selecting the correct pixels and appropriately populating the arbitrary view being generated with those pixels. In some cases, pixel values may optionally be scaled, e.g., if lighting is being adjusted. The low storage and processing overhead of the disclosed techniques facilitates fast, real-time, or on-demand generation of arbitrary views of complex scenes at a quality comparable to that of the high-definition reference views from which they are generated.

As described above, in some embodiments building an ensemble or composite object or scene comprises using orthographic views to specify the multiple object assets comprising the ensemble. Orthographic views facilitate precise placement and alignment of multiple objects or assets in an ensemble scene. The orthographic view of the ensemble scene may then be transformed into any arbitrary camera perspective, e.g., to generate any desired or requested perspective. Transforming the ensemble view into a given camera perspective may comprise individually transforming each of the multiple objects or assets comprising the ensemble scene into that perspective using the techniques described above. Although the described techniques for generating an arbitrary ensemble view are relatively efficient, even greater efficiency may be desirable in certain applications in which it is advantageous to generate output almost instantly or at least very quickly, with a latency penalty that is mostly undetectable to the end user, such as applications that provide interactive, real-time user experiences.

In some embodiments, a further improvement in efficiency may be facilitated at least in part by eliminating the processing associated with transforming most of the multiple objects or assets comprising an ensemble scene (e.g., their orthographic or other views) into a given arbitrary perspective. Instead, the available existing view of an object or asset that is closest or most similar to the given arbitrary perspective, for the given position and orientation of the object or asset in the ensemble scene, is used for the object or asset when generating an output ensemble view or image representing the given arbitrary perspective. In most cases, the resulting output ensemble view is not completely perspective accurate, but it provides a suitable approximation that is acceptable for many applications and that is generated with significantly less latency than fully perspective-accurate output. The generation of such an approximation of an arbitrary ensemble view for an arbitrary camera pose, using a maximally quantized subset of the already existing reference views of one or more of the objects or assets comprising the ensemble, is described in further detail next.

FIG. 5 is a high-level flow chart illustrating an embodiment of a process for generating an arbitrary ensemble view. In some embodiments, process 500 is employed to efficiently generate an output image of an ensemble scene based at least in part on appropriately combining or compositing a single best-match existing view of each of at least one or more (if not most or all) of the objects or assets comprising the ensemble scene.

Process 500 starts at step 502, at which a request for a given perspective of an ensemble scene is received. The requested perspective comprises a camera perspective selected or otherwise specified with respect to the ensemble scene and may generally comprise any arbitrary view. An arbitrary view in this context comprises any desired view or perspective of a scene whose specification or camera pose is not known in advance of the request. An ensemble scene comprises a composite view of multiple independent objects or assets. In general, the specification of an independent object or asset comprises a set of existing reference images or views of the individual object or asset having different camera perspectives and corresponding metadata, one or more of which may be used to generate or specify the portion of the ensemble scene associated with that object or asset. In some embodiments, the request of step 502 is received from an interactive mobile or web-based application that facilitates manipulation of the camera angle or camera pose in ensemble scene space and/or placement of multiple objects or assets to create a composite or ensemble scene. For example, the request may be received from a visualization application, a modeling application, or an augmented reality (AR) application. In some embodiments, the request of step 502 is received with respect to an orthographic view of the ensemble scene, since orthographic views make it easier to manipulate, place, and align the multiple objects or assets comprising the ensemble scene.

At step 504, a closest or most similar matching existing reference image or view is selected for each of at least a subset of the one or more objects or assets comprising the ensemble scene. Step 504 may be performed serially and/or in parallel for the individual or independent objects or assets comprising the ensemble scene. In some embodiments, only one, i.e., a single, existing reference image or view that best matches the requested perspective for the given pose of the object or asset in ensemble scene space is selected for the object or asset. The ensemble scene space comprises an ensemble scene coordinate system having a prescribed origin defined in an appropriate manner, such as the center (e.g., centroid) of the ensemble scene. At step 504, to select the closest matching existing reference image or view for an object or asset, the position and the orientation or pose of the object or asset with respect to the ensemble scene coordinate system are determined and then converted, transformed, or otherwise correlated to an equivalent pose in the individual coordinate system associated with the existing reference images or views of the object or asset. Thus, simple camera metrics computations having relatively low computational complexity are performed, based on the requested perspective and the relative object or asset poses in the ensemble scene, so that the closest matching existing reference images or views can be selected at step 504.
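The selection of step 504 can be sketched as a nearest-neighbor search over viewing directions. The example below is one possible formulation under stated assumptions, not the disclosed method: each reference view is identified by an (azimuth, elevation) pair in the asset's own coordinate system, the asset's pose in the scene is reduced to a single yaw angle, and the angle between unit viewing directions serves as the similarity metric. All names are hypothetical.

```python
import math

def direction(az_deg, el_deg):
    """Unit viewing direction for an azimuth/elevation pair in degrees."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.sin(az), math.sin(el), math.cos(el) * math.cos(az))

def angle_between(a, b):
    """Angle in degrees between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def closest_view(views, cam_az, cam_el, asset_yaw):
    """Pick the reference view nearest to the requested camera direction.

    The requested camera azimuth is expressed in scene coordinates, so the
    asset's yaw is subtracted to obtain the equivalent pose in the asset's
    individual coordinate system.
    """
    local = direction(cam_az - asset_yaw, cam_el)
    return min(views, key=lambda v: angle_between(direction(v[0], v[1]), local))

# Eight hypothetical reference views spaced 45 degrees apart at eye level.
views = [(az, 0) for az in range(0, 360, 45)]
```

The search is a handful of trigonometric evaluations per stored view, which is consistent with the low computational cost described for this step.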

One or more criteria and/or thresholds may be defined for determining or identifying the closest matching existing reference image or view for an object or asset. In some cases, an existing reference image or view is selected at step 504 only if one or more such thresholds are satisfied. In the ideal case, an exact match is found and selected at step 504. In some cases, however, the one or more selection criteria and/or thresholds may not be satisfied, for example if the available existing reference image data set is too incomplete (such as when the available existing reference images or views of the object or asset deviate considerably from the requested perspective) or if no reference image or view is available for the object or asset. In some such cases, a closest matching placeholder or ghost image or view of the object or asset is selected at step 504 instead. Such a placeholder image or view represents the shape of the object or asset but lacks other attributes, such as textures and optical properties. In some embodiments, a set of placeholder images spanning a sufficiently dense set of possible views around the object or asset (e.g., including angles covering 360° around the object or asset) is generated and stored for each unique object shape using rendering techniques of relatively low computational complexity. Placeholders are used when a fully rendered version of an object or asset is unavailable or exhibits an unacceptable deviation from the requested perspective.
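The threshold test and placeholder fallback can be sketched as follows. This is an illustrative simplification that considers azimuth only; the threshold value `max_error_deg` and the function name are hypothetical and do not come from the source.

```python
def pick_image(reference_views, placeholder_views, requested_az, max_error_deg=15.0):
    """Return ("reference", az) if a stored view is close enough, else a placeholder.

    Views are identified by azimuth in degrees around the asset.
    """
    def error(az):
        d = abs(az - requested_az) % 360
        return min(d, 360 - d)  # shortest way around the circle

    if reference_views:
        best = min(reference_views, key=error)
        if error(best) <= max_error_deg:
            return ("reference", best)
    # No reference view exists or none satisfies the threshold:
    # fall back to the densely sampled, shape-only placeholder set.
    return ("placeholder", min(placeholder_views, key=error))

# Sparse full renders, dense low-cost placeholders every 10 degrees.
references = [0, 90]
placeholders = list(range(0, 360, 10))
```

A denser placeholder set lowers the worst-case angular error of the fallback, at the cost of storing more low-fidelity images per object shape.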

At step 506, an output image of the ensemble scene is generated for the requested predetermined perspective, at least in part, by appropriately combining or compositing the closest matching existing reference images or views, selected at step 504, of the objects or assets that make up the ensemble scene. Step 506 may include appropriately scaling or resizing the closest matching existing reference image or view selected for an object or asset, and/or determining the location or position within the ensemble view at which to paste or composite the closest matching existing reference image or view selected for the object or asset. In most cases, the generated output image of the ensemble scene closely approximates the requested predetermined perspective. Because most objects or assets that make up the ensemble scene are represented in the output image by their closest or most similar available existing poses rather than being strictly rendered or generated, their perspectives are not perfectly accurate. That is, in most cases these objects or assets do not have the requested predetermined perspective in the output image unless an exact match is found among the available existing images or views. Although the vanishing points of such objects or assets do not all converge to the same point in the output image, the objects or assets are in most cases offset or skewed by an amount small enough (e.g., a few degrees) to trick the human visual system into perceiving the output image as having, for the most part, the correct perspective.
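The scaling and placement operations of step 506 can be sketched as follows. This is an illustrative sketch only: it assumes images are two-dimensional lists of pixel values with `None` marking transparent pixels, uses nearest-neighbor resampling, and paints assets in list order (back to front). The function names `scale_nearest`, `paste`, and `compose` are hypothetical, and the disclosure does not prescribe any particular resampling or layering method.

```python
def scale_nearest(image, new_w, new_h):
    """Resize a 2D pixel grid to new_w x new_h with nearest-neighbor sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def paste(canvas, sprite, left, top):
    """Copy non-transparent sprite pixels onto the canvas, clipping at its edges."""
    for r, row in enumerate(sprite):
        for c, px in enumerate(row):
            y, x = top + r, left + c
            if px is not None and 0 <= y < len(canvas) and 0 <= x < len(canvas[0]):
                canvas[y][x] = px


def compose(width, height, placements, background=0):
    """Build an output image from (image, (left, top), (w, h)) placements.

    Assumption (not from the source): placements are already sorted back to
    front, so later entries overwrite earlier ones where they overlap.
    """
    canvas = [[background] * width for _ in range(height)]
    for image, (left, top), (w, h) in placements:
        paste(canvas, scale_nearest(image, w, h), left, top)
    return canvas
```

Each placement pairs the closest matching existing image selected for an asset with the position and size it should occupy in the ensemble view. How those positions and sizes are derived from the requested camera pose is outside the scope of this sketch.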

Consistency in the output image of the ensemble scene is further promoted by generating at least some portions of the ensemble scene in a globally consistent or similar manner, which further helps a human viewer interpret the output image as substantially visually accurate. For example, one or more objects or assets that make up the ensemble scene, and/or surfaces (flat or otherwise), structural elements, global features, etc., that make up the ensemble scene may be strictly rendered or generated so that their perspective is accurate, i.e., so that they have the requested predetermined perspective rather than an approximation of it. For example, if the ensemble scene comprises a space such as a room, the walls, ceiling, floor, rugs, wall hangings, etc., may be generated using the camera pose of the requested perspective and thus may be accurately represented in the output image of the ensemble scene generated at step 506. In addition, the output image of the ensemble scene may have a global lighting position that affects all parts of the scene in a similar, consistent manner, e.g., when relighting using available metadata (such as surface normal vectors). Thus, generating some portions of the ensemble view in a global or perspective-correct manner, while representing most of the independent objects that make up the ensemble view as best approximations, in many cases produces an output that is nearly indistinguishable from a fully perspective-accurate version. In some cases some skew may be visible, but it may still be acceptable in certain applications that do not require perfectly accurate views, such as mood board applications or space/room planning applications in which a designer or user benefits from seeing an ensemble of objects or assets regardless of perspective accuracy. Nevertheless, as the repository or database of available existing images or views grows over time, the disclosed techniques continue to produce outputs that represent the requested predetermined perspective with increasing accuracy. In the best case, exact matches are found for all objects or assets and are used to generate an output image that actually has the requested predetermined perspective rather than an approximation of it.
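One way the global relighting mentioned above might use surface normal metadata is sketched below. The disclosure does not specify a shading model. This sketch assumes simple Lambertian (diffuse) shading with a single directional light applied uniformly to every pixel, and it assumes a per-pixel base intensity (`albedo`) is available alongside the normals. The names `normalize`, `relight`, `albedo`, and `light_dir` are hypothetical.

```python
import math


def normalize(v):
    """Return the vector v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def relight(albedo, normals, light_dir):
    """Shade every pixel with one global directional light.

    Assumption (not from the source): Lambertian shading, i.e. the output is
    albedo * max(0, n . l), where n is the pixel's unit surface normal and l
    is the unit vector pointing from the surface toward the light.
    albedo is a 2D list of floats; normals is a 2D list of (x, y, z) tuples.
    """
    l = normalize(light_dir)
    shaded = []
    for albedo_row, normal_row in zip(albedo, normals):
        row = []
        for a, n in zip(albedo_row, normal_row):
            n = normalize(n)
            cos_theta = max(0.0, sum(nc * lc for nc, lc in zip(n, l)))
            row.append(a * cos_theta)
        shaded.append(row)
    return shaded
```

Because the same light direction is applied to every pixel regardless of which asset it came from, independently sourced asset images end up lit consistently, which is the effect the passage describes.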

Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

Claims

1. A method, comprising:
receiving a request for a predetermined perspective of an ensemble scene comprising a plurality of assets; and
generating an output image of the ensemble scene that approximates the requested predetermined perspective based at least in part on combining a single existing image of each of at least a portion of the plurality of assets.

2. The method of claim 1, wherein the request is received for an orthographic view of the ensemble scene.

3. The method of claim 2, wherein the orthographic view of the ensemble scene comprises a combined orthographic view of the plurality of assets.

4. The method of claim 1, further comprising selecting the single existing image of each of the at least a portion of the plurality of assets.

5. The method of claim 4, wherein said selecting comprises selecting an exact match with said requested predetermined perspective.

6. The method of claim 4, wherein the selecting comprises selecting an available match that is closest or most similar to the requested predetermined perspective.

7. The method of claim 4, wherein the selecting comprises selecting based on poses of related assets within the ensemble scene.

8. The method of claim 4, wherein the selecting comprises selecting a rotated existing image of a related asset.

9. The method of claim 4, wherein the selecting comprises selecting, based on poses of related assets within the ensemble scene, an available match that is closest or most similar to the requested predetermined perspective.

10. The method of claim 1, wherein generating the output image of the ensemble scene comprises scaling the single existing image of one or more assets in the at least a portion of the plurality of assets.

11. The method of claim 1, wherein generating the output image of the ensemble scene comprises resizing the single existing image of one or more assets in the at least a portion of the plurality of assets.

12. The method of claim 1, wherein generating the output image of the ensemble scene comprises determining a location in the ensemble scene at which to include the single existing image of each of the at least a portion of the plurality of assets.

13. The method of claim 1, wherein the combining comprises compositing.

14. The method of claim 1, wherein generating the output image of the ensemble scene comprises generating a view of at least one of the plurality of assets that has the requested predetermined perspective.

15. The method of claim 14, wherein the view is generated using a plurality of existing images of the at least one asset.

16. The method of claim 1, wherein generating the output image of the ensemble scene comprises generating at least one portion of the ensemble scene to have the requested predetermined perspective.

17. The method of claim 16, wherein the at least one portion comprises a surface of the ensemble scene.

18. The method of claim 16, wherein the at least one portion comprises structural elements of the ensemble scene.

19. The method of claim 16, wherein the at least one portion comprises global features of the ensemble scene.

20. The method of claim 1, further comprising globally lighting the generated output image of the ensemble scene.

21. The method of claim 1, wherein the output image comprises a frame of a video sequence.

22. A system, comprising:
a processor configured to:
receive a request for a predetermined perspective of an ensemble scene comprising a plurality of assets; and
generate an output image of the ensemble scene that approximates the requested predetermined perspective based at least in part on combining a single existing image of each of at least a portion of the plurality of assets; and
a memory coupled to the processor and configured to provide instructions to the processor.

23. A computer program product embodied in a non-transitory computer-readable storage medium, comprising:
computer instructions for receiving a request for a predetermined perspective of an ensemble scene comprising a plurality of assets; and
computer instructions for generating an output image of the ensemble scene that approximates the requested predetermined perspective based at least in part on combining a single existing image of each of at least a portion of the plurality of assets.