JP2024062935A

JP2024062935A - Method and apparatus for generating stereoscopic display content

Info

Publication number: JP2024062935A
Application number: JP2023134464A
Authority: JP
Inventors: シエシン; Xing Xie; シュウナン; Nan Xu; チェンシュウ; Xu Chen
Original assignee: Orbbec 3d Technology International Inc
Current assignee: Orbbec 3d Technology International Inc
Priority date: 2022-10-25
Filing date: 2023-08-22
Publication date: 2024-05-10
Also published as: KR20240057994A; US20240137481A1; US20240236288A9

Abstract

To provide a method and a device for generating high-quality stereoscopic images and video. SOLUTION: A method of generating stereoscopic display content includes the steps of: obtaining, using a processor, a first RGB image and a depth image from an RGB plus distance (RGB-D) image; determining a first disparity map according to the RGB-D image on the basis of depth values in the depth image; determining a second disparity map and a third disparity map by transforming the first disparity map using a disparity distribution ratio; and generating, by the processor, a pair of stereoscopic images including a second RGB image and a third RGB image. The second RGB image is generated by shifting a first set of pixels in the first RGB image on the basis of the second disparity map, and the third RGB image is generated by shifting a second set of pixels in the first RGB image on the basis of the third disparity map. SELECTED DRAWING: Figure 3

Description

The present disclosure relates to stereoscopic vision, and in particular to generating stereoscopic display content.

Virtual reality (VR), augmented reality (AR), and mixed reality (MR), as next-generation methods of human-computer interaction, are highly immersive and intuitive. Producing high-quality stereoscopic images and video is necessary to provide the most immersive VR, AR, and MR viewing experience.

Currently, the perception of three-dimensional depth can be achieved by using two or more cameras to generate two slightly different images, one for each eye. However, this can be a complex and computationally intensive process. Furthermore, without accurate depth information, the generated VR, AR, and MR environments cannot provide a good viewing experience.

This specification discloses implementations of methods, apparatuses, and systems for generating stereoscopic display content.

In one aspect, a method of generating stereoscopic display content is disclosed. The method includes: obtaining, using a processor, a first red-green-blue (RGB) image and a depth image from a red-green-blue plus distance (RGB-D) image; determining a first disparity map according to the RGB-D image based on depth values in the depth image, the first disparity map including a plurality of disparity values for the first RGB image, which is to be converted into a pair of stereoscopic images; determining a second disparity map and a third disparity map by transforming the first disparity map using a disparity distribution ratio; and generating, by the processor, the pair of stereoscopic images including a second RGB image and a third RGB image, the second RGB image being generated by shifting a first set of pixels in the first RGB image based on the second disparity map, and the third RGB image being generated by shifting a second set of pixels in the first RGB image based on the third disparity map.

In another aspect, an apparatus for generating stereoscopic display content is disclosed. The apparatus includes a non-transitory memory and a processor, the non-transitory memory including instructions executable by the processor to: obtain a first red-green-blue (RGB) image and a depth image from a red-green-blue plus distance (RGB-D) image; determine a first disparity map according to the RGB-D image based on depth values in the depth image, the first disparity map including a plurality of disparity values for the first RGB image, which is to be converted into a pair of stereoscopic images; determine a second disparity map and a third disparity map by transforming the first disparity map using a disparity distribution ratio; and generate the pair of stereoscopic images including a second RGB image and a third RGB image, the second RGB image being generated by shifting a first set of pixels in the first RGB image based on the second disparity map, and the third RGB image being generated by shifting a second set of pixels in the first RGB image based on the third disparity map.

In another aspect, a non-transitory computer-readable storage medium configured to store a computer program for generating stereoscopic display content is disclosed. The computer program includes instructions executable by a processor to: obtain a first red-green-blue (RGB) image and a depth image from a red-green-blue plus distance (RGB-D) image; determine a first disparity map according to the RGB-D image based on depth values in the depth image, the first disparity map including a plurality of disparity values for the first RGB image, which is to be converted into a pair of stereoscopic images; determine a second disparity map and a third disparity map by transforming the first disparity map using a disparity distribution ratio; and generate, by the processor, the pair of stereoscopic images including a second RGB image and a third RGB image, the second RGB image being generated by shifting a first set of pixels in the first RGB image based on the second disparity map, and the third RGB image being generated by shifting a second set of pixels in the first RGB image based on the third disparity map.

The present disclosure is best understood from the following detailed description when read in conjunction with the accompanying drawings. It is emphasized that, according to common practice, the various features of the drawings are not drawn to scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity.

FIG. 1 is an exemplary block diagram of an apparatus for computing and communication.

FIG. 2 is an exemplary diagram for explaining the principle of binocular stereoscopic vision.

FIG. 3 is a flowchart of an exemplary process for generating stereoscopic display content according to some implementations of the present disclosure.

FIG. 4 is an example of determining disparity values for a human left eye and right eye according to some implementations of the present disclosure.

FIG. 5 is an exemplary flow diagram for generating a pair of stereoscopic images according to some implementations of the present disclosure.

DETAILED DESCRIPTION
Virtual reality (VR), augmented reality (AR), and mixed reality (MR) technologies have been developed for several application areas, such as virtual tourism and travel, digital virtual entertainment (e.g., VR games and VR movies), virtual training and education, and VR exposure therapy. Meanwhile, VR/AR/MR devices, such as VR headsets, VR helmets, and AR/MR apps and glasses, are used to simulate 3D immersive environments in which people can participate. When a user wearing a VR/AR/MR headset moves their head, the simulated 3D environment follows the user's movements and is displayed in front of the user.

A simulated 3D immersive environment can be achieved through binocular vision. A person's left and right eyes see objects from slightly different viewpoints. The different two-dimensional (2D) images observed are processed by the brain to produce the perception of 3D depth. Based on binocular vision, VR/AR/MR stereoscopic vision is generated by using two 2D images (e.g., one image for the left eye and one image for the right eye) as inputs for the left and right eyes, respectively. The two 2D images are acquired by two cameras from different viewpoints of the same scene. Traditionally, the stereoscopic image pairs (e.g., one image for the left eye and one image for the right eye) used in VR/AR/MR helmets and glasses are generated using an inverse rectification process. Because the 2D images do not contain distance/depth information, the 3D VR/AR/MR display content generated by such processing may cause discomfort and 3D dizziness due to inaccurate distance estimation.

According to implementations of the present disclosure, a method is used to generate 3D display content for VR/AR/MR using three-dimensional red-green-blue plus distance (RGB-D) images with accurate distance/depth information recorded by an RGB-D sensor. The RGB-D sensor can include, for example, a structured-light-based RGB-D sensor, an active/passive stereo-vision-based RGB-D sensor, a time-of-flight RGB-D sensor, or any combination thereof. A conventional red-green-blue (RGB) image is a function of x and y coordinates and represents only the distribution of RGB color values in a 2D image. For example, a pixel at coordinates (x, y) with display colors red=1, green=1, and blue=1 can be represented as Pixel(x, y) = (1, 1, 1), which displays a black pixel at the x and y coordinates of the image. An RGB-D image recorded by the RGB-D sensor provides additional depth information for each pixel of the RGB image. For example, a pixel at coordinates (x, y, z) with display colors red=1, green=1, and blue=1 can be represented as Pixel(x, y) = (1, 1, 1, z), which displays a black pixel at the x and y coordinates of the image, at a distance of z units (e.g., millimeters).
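The Pixel(x, y) = (1, 1, 1, z) representation above can be illustrated with a small array example. The following is only a sketch: it assumes that an RGB-D image is stored as an H x W x 4 NumPy array whose last channel holds depth in millimeters. That storage layout and the helper name `split_rgbd` are illustrative assumptions, not something the disclosure specifies.

```python
import numpy as np


def split_rgbd(rgbd: np.ndarray):
    """Split an (H, W, 4) RGB-D array into an (H, W, 3) RGB image and an (H, W) depth image.

    Assumed layout (hypothetical): channels 0..2 are R, G, B and channel 3 is depth in mm.
    """
    if rgbd.ndim != 3 or rgbd.shape[2] != 4:
        raise ValueError("expected an array of shape (H, W, 4)")
    rgb = rgbd[..., :3].astype(np.uint8)      # color channels
    depth = rgbd[..., 3].astype(np.float32)   # per-pixel distance z
    return rgb, depth


# A 2 x 2 RGB-D image in which every pixel is (1, 1, 1, z).
rgbd = np.ones((2, 2, 4), dtype=np.float32)
rgbd[..., 3] = [[500.0, 750.0], [1000.0, 1500.0]]  # depth z in millimeters
rgb, depth = split_rgbd(rgbd)
```

Here `rgb` plays the role of the first RGB image and `depth` the role of the depth image obtained from the RGB-D image.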

According to implementations of the present disclosure, an RGB-D sensor can be used to generate an RGB-D image in order to generate stereoscopic display content. Based on the RGB-D image, a corresponding RGB image and a depth image can be obtained. The depth image indicates distance information for the objects corresponding to the pixels in the RGB image. Based on the triangulation relationship, a global disparity map can be generated for the RGB image using the distance of each pixel in the RGB image, the focal length, and the interpupillary distance. The global disparity map is a 2D matrix in which each element indicates the disparity value of a pixel in the RGB image. A left disparity map can be determined from the disparity distribution ratio k and the global disparity map. A right disparity map can likewise be determined from the disparity distribution ratio k and the global disparity map. A pair of stereoscopic images can thus be generated from the RGB image based on the left disparity map and the right disparity map. The pair of stereoscopic images includes a left-eye image and a right-eye image. The left-eye image and the right-eye image can be zoomed, cropped, or resized to generate a left display image and a right display image according to the display requirements of an augmented reality (AR), virtual reality (VR), or mixed reality (MR) device.
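The flow described above (depth to global disparity, split by the ratio k, pixel shifting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as claimed: the triangulation relation is taken to be the usual d = f * b / Z, the split is assumed to be k * d for the left map and (1 - k) * d for the right map, and the shift directions (content moves right in the left-eye image and left in the right-eye image) are an assumed sign convention. All function names and default parameter values are hypothetical.

```python
import numpy as np


def global_disparity(depth_mm, focal_px, ipd_mm):
    """Per-pixel disparity in pixels, assuming d = f * b / Z; zero where depth is invalid."""
    depth = np.asarray(depth_mm, dtype=np.float64)
    disp = np.zeros_like(depth)
    valid = depth > 0
    disp[valid] = focal_px * ipd_mm / depth[valid]
    return disp


def split_disparity(disp, k):
    """Assumed split: the left map is k * d and the right map is (1 - k) * d."""
    return k * disp, (1.0 - k) * disp


def shift_pixels(rgb, disp, direction):
    """Forward-warp pixels horizontally by direction * disparity (rounded to whole pixels).

    Far pixels (small disparity) are written first so nearer pixels overwrite them.
    Positions that receive no pixel stay zero (holes are not filled in this sketch).
    """
    h, w = disp.shape
    out = np.zeros_like(rgb)
    shift = np.rint(direction * disp).astype(int)
    for y in range(h):
        for x in np.argsort(disp[y]):
            nx = x + shift[y, x]
            if 0 <= nx < w:
                out[y, nx] = rgb[y, x]
    return out


def make_stereo_pair(rgb, depth_mm, focal_px=500.0, ipd_mm=64.0, k=0.5):
    """Return (left_eye, right_eye) images generated from one RGB image and its depth image."""
    disp = global_disparity(depth_mm, focal_px, ipd_mm)
    left_disp, right_disp = split_disparity(disp, k)
    left_eye = shift_pixels(rgb, left_disp, +1)    # assumed: content moves right
    right_eye = shift_pixels(rgb, right_disp, -1)  # assumed: content moves left
    return left_eye, right_eye


# One image row at a constant depth of 8000 mm: d = 500 * 64 / 8000 = 4 pixels in total.
rgb = (np.arange(8, dtype=np.uint8) + 10).reshape(1, 8, 1).repeat(3, axis=2)
depth = np.full((1, 8), 8000.0)
left_eye, right_eye = make_stereo_pair(rgb, depth)
```

With k = 0.5 the total disparity of 4 pixels is shared equally, so each view is shifted by 2 pixels in opposite directions. A practical implementation would also need to fill the holes left by the shift and apply the zoom, crop, or resize step required by the target display.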

It should be noted that the applications and implementations of the present disclosure are not limited to the examples, and that alternatives, variations, or modifications of the implementations of the present disclosure can be achieved in any computing environment. Details of the disclosed methods, apparatuses, and systems are described below, after an overview of the system and coding structure.

FIG. 1 is an exemplary block diagram illustrating internal components of an apparatus 100 for computing and communication according to an implementation of the present disclosure. As shown in FIG. 1, the apparatus 100 can include a memory 104, a processor 106, a communication unit 108, an input/output (I/O) component 110, a sensor 112, a power supply 114, and a bus 102. The bus 102 can be used to distribute internal signals and may represent one or more buses (such as an address bus, a data bus, or a combination thereof). The apparatus 100 can be implemented by any configuration of one or more computing devices, such as a red-green-blue plus distance (RGB-D) camera, a bridge camera, a film camera, a smartphone camera, a fisheye camera, a microcomputer, a mainframe computer, a general-purpose computer, a database computer, a special-purpose/dedicated computer, a remote server computer, a personal computer, a tablet computer, a laptop computer, a mobile phone, an embedded computing/edge computing device, a single-board computer, an ASIC (application-specific integrated circuit) chip, an FPGA (field-programmable gate array) chip, an SoC (system on chip) chip, a cloud computing device/service, or a wearable computing device. In some implementations, different apparatuses can be implemented in the form of multiple groups of RGB-D cameras that are located in different geographic locations and can communicate with each other, for example via a network. In some implementations, different apparatuses are configured for different operations. In some implementations, the apparatus for computing and communication can perform one or more aspects of the methods and systems described herein. For example, a special-purpose processor in an RGB-D camera that includes a specialized chip can be used to implement one or more aspects or elements of the methods and systems described herein.

FIG. 1 shows that the apparatus 100 for computing and communication includes a memory 104, a processor 106, a communication unit 108, an input/output (I/O) component 110, a sensor 112, a power supply 114, and a bus 102. In some implementations, the apparatus 100 can include any number of memory units, processor units, communication units, input/output (I/O) components, sensor units, power supply units, and bus units.

The memory 104 includes, but is not limited to, non-transitory computer-readable media that store program code and/or data for long periods, such as secondary or persistent long-term storage. The memory 104 can retrieve data, store data, or both. The memory 104 can be a read-only memory (ROM) device, a hard drive, a random-access memory (RAM), a flash drive, a solid-state drive (SSD), an embedded MultiMediaCard (eMMC), an optical/magnetic disk, a Secure Digital (SD) card, or any combination of suitable types of storage devices.

The processor 106 can be used to manipulate or process information received from the memory 104, the communication unit 108, the I/O component 110, the sensor 112, or a combination thereof. In some implementations, the processor 106 can include a digital signal processor (DSP), a central processor (e.g., a central processing unit or CPU), an application-specific instruction set processor (ASIP), an embedded computing/edge computing device, a single-board computer, an ASIC (application-specific integrated circuit) chip, an FPGA (field-programmable gate array) chip, an SoC (system on chip) chip, a cloud computing service, or a graphics processor (e.g., a graphics processing unit or GPU). The processor 106 can access computer instructions stored in the memory 104 via the bus 102. In some implementations, one or more processors can be used to speed up data processing, including executing or processing computer instructions to perform one or more aspects of the methods and systems described herein. Output data from the processor 106 can be distributed via the bus 102 to the memory 104, the communication unit 108, the I/O component 110, and the sensor 112. The processor 106 can be any type of device, or multiple devices, operable to control the apparatus 100 for computing and communication to perform one or more configured or embedded operations.

In addition to the processor 106 and the memory 104, the apparatus 100 can include a sensor 112. For example, one or more conditions of the operating environment of the apparatus 100 can be detected, captured, or determined by the sensor 112. In some implementations, the sensor 112 can include one or more charge-coupled devices (CCDs), active-pixel sensors (CMOS sensors), or other visible or non-visible light detection and capture units. Data captured about the sensed aspects of the operating environment of the apparatus 100 can be transmitted from the sensor 112 to the memory 104, the processor 106, the communication unit 108, the input/output (I/O) component 110, the power supply 114, and the bus 102. In some implementations, multiple sensors can be included in the apparatus 100, such as a lidar unit, a microphone, an RGB-D sensing device, an ultrasonic unit, or a pressure sensor. These sensors can capture, detect, or determine one or more conditions of the operating environment of the apparatus 100.

In addition to the processor 106 and the memory 104, the apparatus 100 can include an I/O component 110. The I/O component 110 can receive user input and transmit it to the bus 102, the power supply 114, the memory 104, the communication unit 108, the sensor 112, the processor 106, or a combination thereof. The I/O component 110 can provide visual output or display output to an individual. In some implementations, the I/O component 110 can be formed from a communication device for transmitting signals and/or data.

In addition to the processor 106 and the memory 104, the apparatus 100 can include a communication unit 108. Using the communication unit 108, the apparatus 100 can communicate with another device using wired or wireless communication protocols over one or more communication networks, such as a cellular data network, a wide area network (WAN), a virtual private network (VPN), or the Internet.

In addition to the processor 106 and the memory 104, the apparatus 100 can include a power supply 114. The power supply 114 can supply power to other components in the apparatus 100, such as the bus 102 and the memory 104. In some implementations, the power supply 114 can be a battery, such as a rechargeable battery. In some implementations, the power supply 114 can include a power input connection capable of receiving energy from an external power source.

In addition to the processor 106 and the memory 104, the apparatus 100 can include a bus 102. Power signals from the power supply 114 and internal data signals can be distributed via the bus 102 to the memory 104, the communication unit 108, the sensor 112, the processor 106, the I/O component 110, and the power supply 114.

It should be noted that the parts or components of the apparatuses and systems for generating stereoscopic display content can include elements that are not limited to those shown in FIG. 1. Without departing from the scope of the present disclosure, the apparatuses and systems for generating stereoscopic display content can include more or fewer parts, components, and hardware or software modules for performing various functions in addition to, or related to, generating stereoscopic display content.

FIG. 2 shows an exemplary diagram 200 for explaining the principle of binocular stereoscopic vision. The diagram 200 includes a left image 230, a right image 240, a left optical center O'(0, 0), a right optical center O''(0, 0), a left focal point L = (X_L, Y_L, Z_L), a right focal point R = (X_R, Y_R, Z_R), and a target point P = (X_C, Y_C, Z_C). The left optical center O' is the pixel point at the center of the left image 230. The right optical center O'' is the pixel point at the center of the right image 240. The pixel coordinates of the left optical center O' are (0, 0) in the left image 230. The pixel coordinates of the right optical center O'' are (0, 0) in the right image 240. The target point P, a world coordinate point (e.g., a 3D point), can be transformed and projected through the left focal point L as a 2D coordinate point P' = (X_left, Y) in the left image 230. Through the right focal point R, the target point P can be transformed and projected as another 2D coordinate point P'' = (X_right, Y) in the right image 240. The distance between the left focal point L and the right focal point R is the baseline b.

The 2D coordinate points P' and P'' are the two points projected into the left image 230 and the right image 240, respectively, for the same target point P. The difference between the horizontal coordinates of P' and P'' in the left image 230 and the right image 240 (e.g., the disparity d = X_left − X_right) can be used to estimate the distance between the target point P and the two focal points (e.g., the left focal point L and the right focal point R). In some implementations, the target point P is a 3D world coordinate point in a 3D object. Each 3D world coordinate point in the 3D object can be projected into both the left image 230 and the right image 240. The corresponding pixels of the 3D object can be found and matched between the left image 230 and the right image 240. The disparity of each pixel (e.g., the disparity for the target point P: d = X_left − X_right) can be calculated, and a disparity map for the 3D object can be generated based on the calculated disparities. The disparity map can be used to reconstruct the 3D object in the world coordinate system.
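As a small numeric illustration of how a disparity can be turned into a distance estimate, the sketch below assumes the standard rectified-stereo relation Z = f * b / d. The focal length f (in pixels) and the function names are assumptions introduced for this example; the paragraph above defines only the disparity d and the baseline b.

```python
def disparity(x_left: float, x_right: float) -> float:
    """Disparity d = X_left - X_right between the two projections of the same target point."""
    return x_left - x_right


def depth_from_disparity(d: float, focal_px: float, baseline_mm: float) -> float:
    """Distance to the target point, assuming the rectified-stereo relation Z = f * b / d."""
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_mm / d


# A point projected at X_left = 12 and X_right = 4 has a disparity of 8 pixels.
d = disparity(12.0, 4.0)
# With an assumed focal length of 500 pixels and a 64 mm baseline, Z = 500 * 64 / 8 mm.
z_mm = depth_from_disparity(d, focal_px=500.0, baseline_mm=64.0)
```

Under these assumptions, larger disparities correspond to nearer points, and the estimate becomes unreliable as d approaches zero.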

In some implementations, a person's left eye can be the left focal point L, and the person's right eye can be the right focal point R. The left and right eyes have slightly different views of the surrounding world. In that case, the baseline b is the interpupillary distance between the left and right eyes (e.g., 50 to 75 mm). The target point P can be any world coordinate point that the person observes. The target point P can be projected into both the left-eye image and the right-eye image. The disparity of corresponding pixels between the left-eye image and the right-eye image can be used to calculate the distance between the target point P and the person. The left-eye image and the right-eye image can then be used by the human brain as a pair of stereoscopic images to generate stereoscopic vision of the surrounding world.

In some implementations, two cameras at different positions (e.g., a left camera and a right camera) can generate a left image 230 and a right image 240 that contain different 2D pixels for the same 3D object. The focal point of the left camera can be the left focal point L. The focal point of the right camera can be the right focal point R. The distance between the two focal points of the left and right cameras can be the baseline b. In some cases, if the left and right cameras are not horizontally aligned, the left image 230 and the right image 240 can be calibrated so that the disparity map is correct for all pixels in both images. The disparity map for the left image 230 and the right image 240 can be used to generate depth information for each pixel in order to reconstruct the 3D environment captured by the left and right cameras.

In some implementations, a stereo camera with two or more image sensors can be used to generate a left image 230 and a right image 240 that contain different 2D pixels for the same 3D object. For example, if the stereo camera includes two image sensors (e.g., a left image sensor and a right image sensor), the stereo camera can be used to reconstruct a 3D object with depth information. The left image sensor can be used to generate the left image 230. The right image sensor can be used to generate the right image 240. The horizontal distance between the left image sensor and the right image sensor can be the baseline b. A disparity map can be calculated based on the left image 230 and the right image 240, which represent slightly different views of the surrounding world.

In general, binocular stereoscopic vision is realized based on the principle of parallax (i.e., disparity). For example, in FIG. 2, the two images (e.g., the left image 230 and the right image 240) are row-aligned, which means that the left image 230 and the right image 240 lie in the same plane. A target point P is projected onto the left image 230 and the right image 240 at different pixel coordinates. The difference in pixel coordinates (e.g., the disparity d = X_left - X_right) can be used to calculate the distance between the target point P and the two images (e.g., the left image 230 and the right image 240). The calculated distance information can be used to reconstruct 3D objects in the world.
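The relationship between the pixel-coordinate difference and distance can be sketched in a few lines. This is a minimal illustration, assuming the standard pinhole relation Z = f*b/d that the later discussion of FIG. 4 uses; the function name, parameter names, and the numeric values are hypothetical and do not come from the disclosure.

```python
def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline: float) -> float:
    """Estimate the distance Z to a point from its columns in a rectified image pair.

    focal_px is the focal length in pixels and baseline is the camera spacing b;
    Z is returned in the same unit as the baseline.
    """
    disparity = x_left - x_right  # d = X_left - X_right, in pixels
    if disparity <= 0:
        # Assumption: only points with positive disparity are handled here.
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity


# Hypothetical values: columns 420 and 400, f = 600 px, b = 0.06 m.
print(depth_from_disparity(420.0, 400.0, 600.0, 0.06))
```

A larger disparity yields a smaller distance, which is why nearby objects appear to move more between the two views.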

FIG. 3 is a flowchart of an exemplary process 300 for generating stereoscopic display content according to some implementations of the present disclosure. The process 300 can be implemented as software and/or hardware modules in the device 100 of FIG. 1. For example, the process 300 can be implemented as a software module stored in the memory 104 as instructions and/or data executable by the processor 106 of a camera, such as the device 100 of FIG. 1. In another example, the process 300 can be implemented in hardware as a specialized chip that stores instructions executable by that chip. Some or all of the operations of the process 300 can be implemented using a disparity map as described below in connection with FIG. 4. As mentioned above, all or some of the aspects of the disclosure described herein can be implemented using a general-purpose computer/processor with a computer program that, when executed, carries out any of the respective techniques, algorithms, and/or instructions described herein. Additionally or alternatively, a special-purpose computer/processor can be utilized, which may include, for example, specialized hardware for carrying out any of the techniques, algorithms, or instructions described herein.

In operation 302, a first red-green-blue (RGB) image and a depth image can be obtained from a red-green-blue plus distance (RGB-D) image using a processor. For example, the processor can be the processor 106 of FIG. 1. In some cases, the sensor 112 of the device 100 of FIG. 1 can be used to obtain an RGB-D image in the operating environment of the device 100. The RGB-D image can be transmitted to the processor 106 via the bus 102 to obtain the RGB image and the depth image. The depth image indicates distance information for a corresponding object (or multiple corresponding objects) in the RGB image.

Using FIG. 5 as an example, the RGB-D image can be acquired by an RGB-D sensor 502. The RGB-D image can be processed by any technique to obtain an RGB image 512 and a depth image 514. In some implementations, the RGB-D image can be captured by an RGB-D sensor. For example, the RGB-D sensor can be the sensor 112 of FIG. 1. The RGB image 512 can include various objects, such as humans, animals, sofas, desks, and other objects. In the depth image 514 of FIG. 5, different shading is used to indicate different distances, with darker shading indicating closer distances. The depth image 514 indicates the distance of the corresponding object in the RGB image 512.

In some implementations, a pixel in the depth image indicates the distance between the RGB-D sensor and a corresponding object captured in the RGB-D image. For example, a pixel in the RGB-D image can correspond to a pixel in the depth image. The pixel in the RGB-D image indicates a point that belongs to an object. The corresponding pixel at the same location in the depth image can indicate the distance between that object and the RGB-D sensor.
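The pixel-to-pixel correspondence described above can be illustrated with a short sketch that separates an RGB-D frame into an RGB image and a depth image of the same size. The in-memory layout, a grid of (R, G, B, depth) tuples, and the function name `split_rgbd` are assumptions made for illustration; the disclosure states that any technique can be used for this step.

```python
from typing import List, Tuple

# Hypothetical layout: each pixel is (R, G, B, depth).
RGBDPixel = Tuple[int, int, int, float]
RGBPixel = Tuple[int, int, int]


def split_rgbd(frame: List[List[RGBDPixel]]
               ) -> Tuple[List[List[RGBPixel]], List[List[float]]]:
    """Split an RGB-D frame into an RGB image and a depth image.

    Both outputs keep the input grid shape, so the depth at (x, y)
    belongs to the RGB pixel at the same (x, y).
    """
    rgb = [[(r, g, b) for (r, g, b, _z) in row] for row in frame]
    depth = [[z for (_r, _g, _b, z) in row] for row in frame]
    return rgb, depth


# A 1x2 frame: a red pixel 1.5 units away and a blue pixel 3.0 units away.
rgb, depth = split_rgbd([[(255, 0, 0, 1.5), (0, 0, 255, 3.0)]])
print(rgb, depth)
```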

In the example of FIG. 5, a pixel in the depth image 514 indicates the distance between the RGB-D sensor 502 and a corresponding object captured in the RGB image 512. The corresponding object may include, for example, an object 516 (e.g., a toy bear). Each pixel in the RGB image 512 may be associated with an object (e.g., the object 516). For each pixel in the RGB image 512, the corresponding pixel in the depth image 514 indicates the distance between the RGB-D sensor 502 and the corresponding object.

Returning to FIG. 3, in operation 304, a first disparity map for the RGB-D image can be determined based on the depth values in the depth image. The first disparity map includes a plurality of disparity values for the first RGB image, which is to be converted into a pair of stereoscopic images. In some cases, the disparity values for the first RGB image can be used to generate the pair of stereoscopic images.

The disparity value of each pixel can be determined based on the depth value in the depth image, using FIG. 4 as an example. FIG. 4 is a diagram illustrating an example of determining disparity values for the left and right eyes of a human according to some implementations of the present disclosure. For example, in FIG. 4, the distance of the target point O is the distance Z, and the disparity value for the target point O is f*b/Z, where f is the focal length, b is the interpupillary distance between the left eye E₁ and the right eye E₂, and Z is the distance between the target point O and the RGB-D sensor. From the triangulation relationship in FIG. 4, a corresponding disparity value (e.g., f*b/Z) can be determined for each pixel in the first RGB image. In general, based on the triangulation relationship, the depth value of each pixel in the depth image, the focal length, and the interpupillary distance can be used to determine the disparity value of each pixel in the first RGB image (e.g., the RGB image). According to FIG. 4, the disparity values in the first disparity map can be determined, for example, using equation (5) described below.

In the example of FIG. 5, the RGB-D image can be acquired by the RGB-D sensor 502. A pixel in the depth image 514 indicates the distance (i.e., the depth) between a corresponding object in the RGB image 512 and the RGB-D sensor. For example, the distance for the object 516 in the RGB image 512 is shown in the depth image 514. Based on the depth of each pixel in the depth image 514, a global disparity map 522 for the RGB image 512 can be determined. In some implementations, the global disparity map 522 can be determined using the interpupillary distance between the left and right eyes, the depth value of each pixel, and the focal length of the RGB-D sensor. For example, the global disparity map 522 can be determined using equation (5), as described below. The object 516, for example, appears in the global disparity map 522 of FIG. 5 as disparity values represented in grayscale. As described below, the global disparity map 522 can then be used to convert the RGB image into a pair of stereoscopic images (e.g., a left-eye image 542 and a right-eye image 544).

In some implementations, the first disparity map is a two-dimensional (2D) matrix in which each element indicates a disparity value. Using FIG. 5 as an example, the first disparity map (e.g., the global disparity map 522) can be determined based on the depth image 514 and the RGB image 512. The global disparity map 522 can be a 2D matrix in which each element indicates the disparity value for a pixel in the RGB image 512.

In some implementations, the first disparity map can be determined using at least one of the focal length f or the interpupillary distance b. Using FIG. 4 as an example, a disparity value in the first disparity map for the target point O (e.g., f*b/z(x,y)) can be determined based on the focal length f, the interpupillary distance b between the left eye E₁ and the right eye E₂, and the distance Z. For example, the disparity value can be determined using equation (5) described below.

In the example of FIG. 5, each pixel in the RGB image 512 is associated with a distance in the depth image 514. The focal length f and the interpupillary distance b can each be predefined from public data or set by manual input. The focal length f and the interpupillary distance b, together with these distances, can be used to determine the global disparity map 522 for the RGB image 512.
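The per-pixel relation f*b/z(x,y) described for operation 304 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name, the choice to express f in pixels and b and z in the same length unit, and the handling of non-positive depth values (mapped to zero disparity) are all assumptions.

```python
from typing import List


def disparity_map(depth: List[List[float]], focal_px: float,
                  ipd: float) -> List[List[float]]:
    """Build a 2D disparity matrix d(x, y) = f * b / z(x, y) from a depth image.

    focal_px: focal length f in pixels.
    ipd: interpupillary distance b, in the same unit as the depth values.
    Assumption: a depth of zero or less marks an invalid pixel and gets disparity 0.
    """
    return [[focal_px * ipd / z if z > 0 else 0.0 for z in row]
            for row in depth]


# Hypothetical values: f = 500 px, b = 0.064, depths of 2.0 and 4.0.
print(disparity_map([[2.0, 4.0]], focal_px=500.0, ipd=0.064))
```

The output has the same shape as the depth image, so each element is the disparity value for the RGB pixel at the same position, and nearer pixels receive larger values.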

Returning to FIG. 3, in operation 306, a second disparity map and a third disparity map can be determined by transforming the first disparity map using a disparity distribution ratio. In other words, the second disparity map and the third disparity map can both be determined from the same original disparity map using the disparity distribution ratio. In some implementations, based on the disparity distribution ratio k, the first disparity map can be transformed into the second disparity map, for example using equation (1), and into the third disparity map, for example using equation (2).

In equations (1) and (2), d_L(x,y) is the disparity value in the second disparity map, d_R(x,y) is the disparity value in the third disparity map, d(x,y) is the disparity value in the first disparity map, z(x,y) indicates the distance between the RGB-D sensor and the corresponding object associated with pixel (x,y) in the RGB image, and k is the disparity distribution ratio. The disparity distribution ratio may be a constant value indicating the position of the observation point between the left eye and the right eye. In some implementations, the disparity distribution ratio k may be a preset constant value.

In some implementations, the second and third disparity maps can be determined from the first disparity map in other ways, without using equations (1) and (2). For example, the second and third disparity maps can be determined using an offset in addition to the disparity distribution ratio k.

The disparity value d(x,y) of the first disparity map can be determined, for example, using equation (5) below, where f is the focal length and b is the interpupillary distance between the left and right eyes.

Using FIG. 4 as an example, the disparity value d(x,y) in the first disparity map for the target point O can be determined based on the focal length f (e.g., f = f₁ = f₂), the interpupillary distance b, and the distance Z. Based on the disparity distribution ratio k, the disparity value d_L(x,y) in the second disparity map and the disparity value d_R(x,y) in the third disparity map for the target point O can be determined as described above using equations (1) and (2).

In the example of FIG. 5, the global disparity map 522 can be determined based on the RGB image 512 and the depth image 514. Based on the disparity distribution ratio k, a left disparity map 534 and a right disparity map 536 can be determined using equations (1) and (2), respectively. The left disparity map 534 and the right disparity map 536 can be used to convert the RGB image into a pair of stereoscopic images.
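The split of one disparity map into a left map and a right map can be sketched as below. Equations (1) and (2) are not reproduced in this text, so the formulas used here are an assumption: the left map takes the share -k*d and the right map takes (1 - k)*d, so that the two shifts differ by the full disparity d. The function name and the restriction of k to the range 0 to 1 are likewise hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_disparity(disparity: Matrix, k: float) -> Tuple[Matrix, Matrix]:
    """Distribute a global disparity map between a left and a right map.

    Assumed split (not taken from the source equations):
        d_L(x, y) = -k * d(x, y)
        d_R(x, y) = (1 - k) * d(x, y)
    so that d_R - d_L equals the original disparity d.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k is assumed to lie between 0 and 1")
    left = [[-k * d for d in row] for row in disparity]
    right = [[(1.0 - k) * d for d in row] for row in disparity]
    return left, right


# k = 0.5 places the observation point midway between the two eyes.
left, right = split_disparity([[16.0, 8.0]], k=0.5)
print(left, right)
```

Under this assumption, k = 0 leaves the left view identical to the source image and assigns the entire disparity to the right view, while k = 1 does the opposite.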

Returning to FIG. 3, in operation 308, a pair of stereoscopic images including a second RGB image and a third RGB image can be generated by the processor. The second RGB image is generated by shifting a first set of pixels in the first RGB image based on the second disparity map. The third RGB image is generated by shifting a second set of pixels in the first RGB image based on the third disparity map.

The disparity values in the second and third disparity maps can be used to shift pixels in the first RGB image horizontally to the left or right to generate the second and third RGB images. In some implementations, a processor (e.g., the processor 106) can generate the second RGB image (e.g., the left-eye image 542 of FIG. 5) by shifting a first set of pixels in the first RGB image (e.g., the RGB image 532 of FIG. 5) based on the second disparity map (e.g., the left disparity map 534 of FIG. 5) using equation (3). The processor can generate the third RGB image (e.g., the right-eye image 544 of FIG. 5) by shifting a second set of pixels in the first RGB image (e.g., the RGB image 532 of FIG. 5) based on the third disparity map (e.g., the right disparity map 536 of FIG. 5) using equation (4).

In equations (3) and (4), Pixel_L(x,y) is the pixel (x,y) in the second RGB image, Pixel_R(x,y) is the pixel (x,y) in the third RGB image, Pixel(x,y) is the pixel (x,y) in the first RGB image, and (R(x,y), G(x,y), B(x,y)) is the RGB color of the pixel (x,y). The term d_L, which refers to d_L(x,y) in equation (1), indicates the disparity value in the second disparity map, and the term d_R, which refers to d_R(x,y), indicates the disparity value in the third disparity map.
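The horizontal pixel shift can be sketched as follows. Equations (3) and (4) are not reproduced in this text, so the mapping used here is an assumption: each output pixel (x, y) copies the color of the source pixel at (x + d, y), with d rounded to the nearest integer and the source column clamped to the image border. The function name and the clamping policy are hypothetical, and the sketch does not attempt occlusion handling or hole filling.

```python
from typing import List, Tuple

RGBPixel = Tuple[int, int, int]


def shift_image(rgb: List[List[RGBPixel]],
                disparity: List[List[float]]) -> List[List[RGBPixel]]:
    """Generate one view by shifting pixels horizontally, row by row.

    Assumed mapping: out(x, y) = rgb(x + d(x, y), y), where d is rounded
    and the source column is clamped to stay inside the image.
    """
    height = len(rgb)
    width = len(rgb[0]) if height else 0
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            src_x = x + int(round(disparity[y][x]))
            src_x = min(max(src_x, 0), width - 1)  # clamp to the image border
            row.append(rgb[y][src_x])
        out.append(row)
    return out


# A single row of four gray levels, shifted by one pixel in each direction.
row = [[(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)]]
print(shift_image(row, [[-1.0] * 4]))  # stands in for a left-eye shift
print(shift_image(row, [[1.0] * 4]))   # stands in for a right-eye shift
```

Calling the function once with a left disparity map and once with a right disparity map produces the two images of a stereoscopic pair from the same source image.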

In some implementations, the disparity values in the second and third disparity maps can be determined in other ways, without using equations (3) and (4). In some implementations, in addition to the horizontal shift described above, an additional pixel or additional pixels can be added at the top or bottom, for example, to determine the disparity values. In some implementations, in addition to the horizontal shift, one or more additional pixels can be added on the left or right.

Using FIG. 5 as an example, the RGB image 532 may be the first RGB image, the left disparity map 534 may be the second disparity map, and the right disparity map 536 may be the third disparity map. The left disparity map 534 and the right disparity map 536 may be determined by transforming the global disparity map 522 based on the disparity distribution ratio k, as described above. The left-eye image 542 may be generated by transforming a first set of pixels in the RGB image 532 based on the left disparity map 534. For example, equation (3) may be used with the left disparity map 534 to generate the left-eye image 542, and equation (4) may be used with the right disparity map 536 to generate the right-eye image 544. The left-eye image 542 and the right-eye image 544 may form a pair of stereoscopic images.

In some implementations, a pair of adjusted display images, resized to the display requirements of an augmented reality (AR), virtual reality (VR), or mixed reality (MR) device, can be generated by a processor (e.g., the processor 106) based on the pair of stereoscopic images. Using FIG. 5 as an example, the pair of stereoscopic images includes the left-eye image 542 and the right-eye image 544. The pair of adjusted display images can include, for example, a left display image 552 and a right display image 554, which can be generated based on the left-eye image 542 and the right-eye image 544.

FIG. 4 is a diagram of an example disparity calculation 400 for the left and right eyes of a human according to some implementations of the present disclosure. FIG. 4 may include a left eye E₁, a right eye E₂, a target point O, an interpupillary distance b between the left eye E₁ and the right eye E₂, a distance Z between the target point O and the RGB-D sensor, a focal length f₁ for the left eye E₁, a focal length f₂ for the right eye E₂, a projection point O₁' of the target point O in the image plane of the left eye E₁, a projection point O₂' of the target point O in the image plane of the right eye E₂, an origin C₁' in the image plane of the left eye E₁, and an origin C₂' in the image plane of the right eye E₂. Without loss of generality, the focal length f₁ of the left eye is equal to the focal length f₂ of the right eye, and both f₁ and f₂ are equal to f.

The left eye E₁ and the right eye E₂ of a human are separated horizontally by the interpupillary distance b. As a result, the target point O is projected to different positions (e.g., the projection point O₁' and the projection point O₂') in the image of the left eye E₁ and the image of the right eye E₂, respectively. The projection point O₁' is projected to the left of the origin C₁' in the image plane of the left eye E₁. The pixel distance between the projection point O₁' and the origin C₁' in the image plane of the left eye E₁ is U1. The projection point O₂' is projected to the right of the origin C₂' in the image plane of the right eye E₂. The pixel distance between the projection point O₂' and the origin C₂' in the image plane of the right eye E₂ is U2. The difference in pixel positions is the disparity value of the target point O. Every pixel in the image plane of the left eye E₁ can be matched with a pixel at the same position in the image plane of the right eye E₂. A disparity map can be generated based on the differences in pixel positions between the image plane of the left eye E₁ and the image plane of the right eye E₂.

In some implementations, each pixel in the depth image indicates the distance between the RGB-D sensor and the corresponding object. For example, in FIG. 4, the distance for the target point O is the distance Z. The pixel distance difference between the projection point O₁' and the projection point O₂' is |U1|+|U2|. From the triangulation relationship in FIG. 4, |U1|+|U2| is equal to (b*f)/Z, where b is the interpupillary distance between the left eye E₁ and the right eye E₂, f is the focal length for the left eye E₁ and the right eye E₂, and Z is the distance between the target point O and the RGB-D sensor. Thus, (b*f)/Z is the disparity value for the target point O. The disparity value for each pixel in the RGB image can be determined from the triangulation relationship using the depth value of each pixel in the depth image, the focal length, and the interpupillary distance. A disparity map can be obtained for all pixels in the image planes of the left eye E₁ and the right eye E₂, for example, using the following equation:

$$d(x,y) = \frac{f \cdot b}{z(x,y)} \tag{5}$$

In equation (5), z(x,y) denotes the distance between the RGB-D sensor and the corresponding object associated with pixel (x,y) in the RGB image. The value z(x,y) can be obtained from the depth image generated by the RGB-D sensor. The term f (e.g., f = f₁ = f₂) in equation (5) is the focal length for the left eye E₁ and the right eye E₂, and d(x,y) denotes each element of the disparity map. In some implementations, the calculation of the disparity map can be performed in operation 304 of FIG. 3.
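The triangulation relationship behind equation (5) can be spelled out with similar triangles. The derivation below is a sketch that introduces one symbol not used in the original, X, the horizontal distance of the target point O from the left eye E₁, and it assumes that O lies horizontally between the two eyes at depth Z.

```latex
\begin{align*}
|U_1| &= \frac{f\,X}{Z}, \qquad |U_2| = \frac{f\,(b - X)}{Z},\\
|U_1| + |U_2| &= \frac{f\,X + f\,(b - X)}{Z} = \frac{f\,b}{Z}.
\end{align*}
```

The auxiliary X cancels, so the disparity depends only on f, b, and Z. With hypothetical values f = 500 pixels, b = 0.064 m, and Z = 2 m, the disparity is 500 × 0.064 / 2 = 16 pixels.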

FIG. 5 is an example workflow for generating a pair of stereoscopic images according to some implementations of the present disclosure. One or more RGB-D sensors (e.g., the RGB-D sensor 502) can be used to acquire an RGB-D image. The RGB image 512 and the depth image 514 can be obtained from the acquired RGB-D image. The depth image 514 indicates the distance of a corresponding object in the RGB image 512. For example, the object 516 is displayed in the RGB image 512, and the distance to the object 516 is shown in the depth image 514. In some implementations, the acquisition of the RGB-D image can be performed in operation 302 of FIG. 3.

The global disparity map 522 can be determined for the RGB image 512 based on, for example, the distances in the depth image 514. The disparity values in the global disparity map 522 can be calculated from the distances in the depth image 514, the focal length, and the interpupillary distance (e.g., the focal length f = f₁ = f₂ and the interpupillary distance b in FIG. 4), for example using equation (5), which expresses the triangulation relationship among these quantities. For example, the pixels belonging to the object 516 in the global disparity map 522 indicate the disparity values for the object 516. In some implementations, the determination of the global disparity map 522 can be performed in operation 304 of FIG. 3.

The left disparity map 534 and the right disparity map 536 can each be determined by transforming the global disparity map 522 based on the disparity distribution ratio k. Based on the disparity distribution ratio k, each disparity value in the global disparity map 522 can be apportioned between the left disparity map 534 and the right disparity map 536. As mentioned above, equations (1) and (2) can be used to determine these disparity maps. In some implementations, the determination of the left disparity map 534 and the right disparity map 536 can be performed in operation 306 of FIG. 3.

A pair of stereoscopic images can be generated based on the left disparity map 534 and the right disparity map 536. The left-eye image 542 can be generated based on the left disparity map 534 by transforming a set of pixels in the RGB image 532 (e.g., the RGB image 512). The right-eye image 544 can be generated based on the right disparity map 536 by transforming another set of pixels in the RGB image 532 (e.g., the RGB image 512). The left-eye image 542 and the right-eye image 544 form the pair of stereoscopic images. The left-eye image 542 can be generated using equation (3) to horizontally shift a set of pixels in the RGB image 532, and the right-eye image 544 can be generated using equation (4) to horizontally shift a set of pixels in the RGB image 532. In some implementations, the generation of the pair of stereoscopic images can be performed in operation 308 of FIG. 3.

The left-eye image 542 and the right-eye image 544 can be zoomed and cropped to resize them, generating the left display image 552 and the right display image 554 that meet the display requirements of an augmented reality (AR), virtual reality (VR), or mixed reality (MR) device.
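The zoom-and-crop step can be sketched as follows. The disclosure does not specify the resizing method, so the choices here are assumptions: the image is scaled with nearest-neighbor sampling until it covers the target display in both directions, and a centered window of the display size is then cut out. The function name and parameters are hypothetical.

```python
from typing import List, TypeVar

P = TypeVar("P")  # any pixel representation, e.g. an (R, G, B) tuple


def fit_to_display(image: List[List[P]], disp_w: int,
                   disp_h: int) -> List[List[P]]:
    """Zoom an image to cover a disp_w x disp_h display, then center-crop it."""
    src_h, src_w = len(image), len(image[0])
    # Zoom: one scale factor for both axes, large enough to cover the display.
    scale = max(disp_w / src_w, disp_h / src_h)
    zoom_w = max(disp_w, round(src_w * scale))
    zoom_h = max(disp_h, round(src_h * scale))
    zoomed = [
        [image[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
         for x in range(zoom_w)]
        for y in range(zoom_h)
    ]
    # Crop: keep a centered window of exactly the display size.
    x0 = (zoom_w - disp_w) // 2
    y0 = (zoom_h - disp_h) // 2
    return [row[x0:x0 + disp_w] for row in zoomed[y0:y0 + disp_h]]


# A 4x2 image cropped to a 2x2 display keeps the two middle columns.
print(fit_to_display([[0, 1, 2, 3], [4, 5, 6, 7]], disp_w=2, disp_h=2))
```

Applying the same call to both images of a stereoscopic pair keeps the left and right display images at identical dimensions.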

Aspects of the disclosure described herein can be described in terms of functional block components and various processing operations. The disclosed processes and sequences can be performed alone or in any combination. The functional blocks can be realized by any number of hardware and/or software components that perform the specified functions. For example, the described aspects can use various integrated circuit components, such as memory elements, processing elements, logic elements, and look-up tables, which can carry out a variety of functions under the control of one or more microprocessors or other control devices. Similarly, where elements of the described aspects are implemented using software programming or software elements, the disclosure can be implemented with any programming or scripting language, such as C, C++, Java, or assembler, with the various algorithms being implemented with any combination of data structures, objects, processes, routines, or other programming elements. Functional aspects can be implemented in algorithms that execute on one or more processors. Furthermore, aspects of the disclosure can use any number of conventional techniques for electronic configuration, signal processing and/or control, data processing, and the like. The terms "mechanism" and "element" are used broadly and are not limited to mechanical or physical implementations or aspects, but can include software routines in conjunction with processors and the like.

上記開示の実装または実装の一部は、例えばコンピュータ使用可能またはコンピュータ可読媒体からアクセス可能なコンピュータプログラム製品の形態をとることができる。コンピュータ使用可能またはコンピュータ可読媒体は、例えば、任意のプロセッサによってまたは任意のプロセッサに関連して使用するためのプログラムまたはデータ構造を有形的に含み、記憶し、通信し、または移送できる任意のデバイスであり得る。媒体は、例えば、電子、磁気、光学、電磁、または半導体デバイスであり得る。他の適切な媒体も利用可能である。このようなコンピュータ使用可能またはコンピュータ可読媒体は、非一時的メモリまたは媒体と呼ばれることがあり、時間の経過とともに変化する可能性のあるＲＡＭまたは他の揮発性メモリまたは記憶装置を含むことができる。本明細書に記載される装置のメモリは、特に指定がない限り、装置に物理的に含まれる必要はないが、装置によってリモートにアクセスできるメモリであり、装置内に物理的に含まれ得る他のメモリと連続している必要はない。 Implementations or parts of implementations of the above disclosure may take the form of a computer program product accessible, for example, from a computer usable or computer readable medium. The computer usable or computer readable medium may be, for example, any device that can tangibly contain, store, communicate, or transport a program or data structure for use by or in association with any processor. The medium may be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device. Other suitable media may also be used. Such computer usable or computer readable media may be referred to as non-transitory memory or media, and may include RAM or other volatile memory or storage that may change over time. The memory of the devices described herein need not be physically contained in the device, unless otherwise specified, but may be memory that can be remotely accessed by the device and need not be contiguous with other memories that may be physically contained within the device.

本開示の例として実行されるものとして本明細書で説明される個別の機能または組み合わせられた機能のいずれも、前述のハードウェアの任意のまたは任意の組み合わせを動作させるためのコードの形式で機械可読命令を使用して実装することができる。計算コードは、個別の機能または組み合わせた機能を計算ツールとして実行できる。１つまたは複数のモジュールの形式で実装でき、各モジュールの入出力データは、本明細書に記載の方法およびシステムの動作中に１つまたは複数のさらなるモジュールとの間で受け渡される。 Any of the individual or combined functions described herein as being performed as examples of the present disclosure may be implemented using machine-readable instructions in the form of code for operating any one of, or any combination of, the aforementioned hardware. The computational code may be implemented in the form of one or more modules by which the individual or combined functions are performed as a computational tool, with the input and output data of each module being passed to and from one or more further modules during operation of the methods and systems described herein.

情報、データ、および信号は、さまざまな異なる技術および技法を使用して表現することができる。例えば、本明細書で参照される任意のデータ、命令、コマンド、情報、信号、ビット、シンボル、およびチップは、電圧、電流、電磁波、磁場または粒子、光場または粒子、他の項目、または前述のものの組み合わせによって表すことができる。 Information, data, and signals may be represented using a variety of different technologies and techniques. For example, any data, instructions, commands, information, signals, bits, symbols, and chips referred to herein may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, other items, or combinations of the foregoing.

用語「例」は、本明細書では、例、実例、または図示例を提供することを意味するために使用される。本明細書に「例」として記載されるいかなる態様または設計も、必ずしも他の態様または設計よりも好ましいまたは有利であると解釈されるべきではない。むしろ、「例」という言葉の使用は、概念を具体的に示すことを目的としています。さらに、本開示全体を通じて「ある態様」または「一態様」という用語の使用は、そのように記載されない限り、同じ態様または実装を意味することを意図したものではない。 The term "example" is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as an "example" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word "example" is intended to illustrate a concept. Moreover, use of the terms "an embodiment" or "one embodiment" throughout this disclosure is not intended to refer to the same embodiment or implementation unless so described.

本開示で使用される「または」という用語は、それが結合する２つ以上の要素についての排他的な「または」ではなく、包括的な「または」を意味することを意図している。つまり、別段の指定がない限り、または文脈上別段の明確な指示がない限り、「ＸにＡまたはＢが含まれる」は、その自然な包含的置換のいずれかを意味することを意図している。言い換えれば、ＸにＡが含まれる場合、ＸにＢが含まれる場合、または、ＸにＡとＢの両方が含まれる場合、この場合、「ＸにＡまたはＢが含まれる」は、前述のいずれかの場合にも満たされる。同様に、「ＸにＡおよびＢのいずれか１つが含まれる」は、「ＸにＡまたはＢが含まれる」と同等の意味で使用されることが意図されている。本開示で使用される「および／または」という用語は、「および」または包括的な「または」を意味することを意図している。つまり、別段の指定がない限り、または文脈上別段の明確な指示がない限り、「ＸにＡ、Ｂ、および／またはＣが含まれる」は、ＸがＡ、Ｂ、およびＣの任意の組み合わせを含み得ることを意味することを意図している。言い換えれば、ＸにＡが含まれる場合、ＸにＢが含まれる場合、ＸにＣが含まれる場合、ＸにＡとＢの両方が含まれる場合、ＸにＢとＣの両方が含まれまる場合、ＸにＡとＣの両方が含まれる場合、または、ＸにＡ、Ｂ、およびＣのすべてが含まれる場合、この場合、「ＸにＡ、Ｂ、および／またはＣが含まれる」は、前述のいずれかの場合にも満たされる。同様に、「ＸにＡ、Ｂ、およびＣの少なくとも１つが含まれる」は、「ＸにＡ、Ｂ、および／またはＣが含まれる」と同等の意味で使用されることが意図されている。 The term "or" as used in this disclosure is intended to mean an inclusive "or" rather than an exclusive "or" for the two or more elements it binds. That is, unless otherwise specified or unless the context clearly dictates otherwise, "X includes A or B" is intended to mean any of its natural inclusive permutations. In other words, if X includes A, if X includes B, or if X includes both A and B, then "X includes A or B" is satisfied in any of the foregoing cases. Similarly, "X includes any one of A and B" is intended to be used in the same sense as "X includes A or B". The term "and/or" as used in this disclosure is intended to mean "and" or an inclusive "or". That is, unless otherwise specified or unless the context clearly dictates otherwise, "X includes A, B, and/or C" is intended to mean that X may include any combination of A, B, and C. In other words, if X includes A, if X includes B, if X includes C, if X includes both A and B, if X includes both B and C, if X includes both A and C, or if X includes all of A, B, and C, then "X includes A, B, and/or C" is satisfied in any of the foregoing cases. Similarly, "X includes at least one of A, B, and C" is intended to be used equivalently to "X includes A, B, and/or C."

本明細書における用語「含む」または「有する」およびその変形の使用は、その後に列挙される項目およびその等価物、ならびに追加の項目を包含することを意味する。文脈に応じて、本明細書で使用される「場合」という単語は、「時」、「その間」、または「に応じて」と解釈できる。 The use of the terms "including" or "having" and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof, as well as additional items. Depending on the context, the word "if" as used herein can be interpreted as "when," "while," or "in response to."

本開示を説明する文脈（特に特許請求の範囲の文脈）における用語「ａ」および「ａｎ」および「ｔｈｅ」および類似の指示対象の使用は、単数形および複数形の両方を包含すると解釈されるべきである。さらに、本明細書に別段の記載がない限り、本明細書における値の範囲の記載は、その範囲内にあるそれぞれの個別の値を個別に参照する簡単な方法として機能することのみを意図しており、それぞれの個別の値は、あたかも本明細書に個別に記載されているかのように明細書に組み込まれる。最後に、本明細書に記載されているすべての方法の操作は、本明細書に別段の指示があるか、文脈と明らかに矛盾しない限り、任意の適切な順序で実行可能である。本明細書で提供されるあらゆる例、または例が説明されていることを示す文言（例えば、「など」）の使用は、単に本開示をより良く理解することを目的としており、別段の定めがない限り、本開示の範囲に制限を課すものではない。 The use of the terms "a," "an," and "the" and similar referents in the context of describing this disclosure (especially in the context of the claims) should be interpreted to cover both the singular and the plural. Furthermore, unless otherwise stated herein, the recitation of a range of values herein is intended only to serve as a shorthand method of referring individually to each separate value within the range, and each separate value is incorporated into the specification as if it were individually recited herein. Finally, the operations of any method described herein can be performed in any suitable order unless otherwise indicated herein or clearly contradicted by context. Any examples provided herein, and any language indicating that an example is being described (e.g., "such as"), are intended solely to provide a better understanding of the disclosure and do not impose limitations on the scope of the disclosure unless otherwise specified.

本明細書では、さまざまな見出しおよび小見出しを付けて説明された。これらは、読みやすさを向上させ、仕様内の資料を検索および参照するプロセスを容易にするために含まれている。これらの見出しおよび小見出しは、特許請求の範囲の解釈に影響を与えたり、その範囲をいかなる形でも制限したりすることを意図したものではなく、使用されるべきではない。本明細書に示され説明される特定の実装は、本開示の例示的な例であり、いかなる形でも本開示の範囲を限定することを意図するものではない。 This specification has been described under various headings and subheadings. These have been included to enhance readability and to facilitate the process of locating and referencing material within the specification. These headings and subheadings are not intended, and should not be used, to affect the interpretation of the claims or to limit their scope in any way. The specific implementations shown and described herein are illustrative examples of the disclosure and are not intended to limit the scope of the disclosure in any way.

本明細書に引用される刊行物、特許出願、および特許を含むすべての参考文献は、あたかも各参考文献が個別かつ具体的に参照により組み込まれると示され、その全体が本明細書に記載されるのと同じ程度に、参照により本明細書に組み込まれる。 All references cited in this specification, including publications, patent applications, and patents, are hereby incorporated by reference to the same extent as if each reference was individually and specifically indicated to be incorporated by reference and was set forth in its entirety herein.

本開示は、特定の実施形態および実装に関連して説明されているが、本開示は、開示された実装に限定されるものではなく、逆に、含まれる様々な修正および同等の構成を網羅することを意図していることを理解されたい。添付の特許請求の範囲の範囲内で、その範囲には、そのようなすべての修正および同等の配置を包含するように、法律の下で許可される最も広い解釈が与えられるべきである。
Although the disclosure has been described in connection with particular embodiments and implementations, it should be understood that the disclosure is not limited to the disclosed implementations but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation permitted under the law so as to encompass all such modifications and equivalent arrangements.

Claims

1. A method for generating stereoscopic display content, comprising:
obtaining, using a processor, a first red-green-blue (RGB) image and a depth image from a red-green-blue plus distance (RGB-D) image;
determining a first disparity map according to the RGB-D image based on depth values in the depth image, the first disparity map including a plurality of disparity values for the first RGB image, which is to be converted into a pair of stereoscopic images;
determining a second disparity map and a third disparity map by transforming the first disparity map using a disparity distribution ratio; and
generating, by the processor, the pair of stereoscopic images comprising a second RGB image and a third RGB image, wherein the second RGB image is generated by shifting a first set of pixels in the first RGB image based on the second disparity map, and the third RGB image is generated by shifting a second set of pixels in the first RGB image based on the third disparity map.

2. The method of claim 1, further comprising: generating, by the processor, a pair of adjusted display images based on the pair of stereoscopic images, the pair of adjusted display images being resized to fit display requirements of an augmented reality (AR), virtual reality (VR), or mixed reality (MR) device.

3. The method of claim 1, wherein the first disparity map is a two-dimensional (2D) matrix, each element of which represents a disparity value.

4. The method of claim 1, wherein the RGB-D image is captured by an RGB-D sensor.

5. The method of claim 4, wherein a pixel in the depth image indicates a distance between the RGB-D sensor and a corresponding object captured in the RGB image.

6. The method of claim 5, wherein determining the first disparity map comprises using at least one of a focal length f or an interpupillary distance b to determine the first disparity map.

7. The method of claim 6, wherein determining the second disparity map by transforming the first disparity map using the disparity distribution ratio is based on:

[equation not reproduced in this text]

and determining the third disparity map by transforming the first disparity map using the disparity distribution ratio is based on:

[equation not reproduced in this text]

wherein d_L(x,y) is the disparity value of the second disparity map, d_R(x,y) is the disparity value of the third disparity map, d(x,y) is the disparity value of the first disparity map, z(x,y) is the distance between the RGB-D sensor and an object corresponding to the pixel (x,y) in the RGB-D image, and k is the disparity distribution ratio, which is a constant value indicating the position of an observation point between the left eye and the right eye.

8. The method of claim 7, wherein shifting the first set of pixels in the first RGB image based on the second disparity map is based on:

[equation not reproduced in this text]

and shifting the second set of pixels in the first RGB image based on the third disparity map is based on:

[equation not reproduced in this text]

wherein Pixel_L(x,y) is pixel (x,y) in the second RGB image, Pixel_R(x,y) is pixel (x,y) in the third RGB image, Pixel(x,y) is pixel (x,y) in the first RGB image, and (R(x,y),G(x,y),B(x,y)) is the RGB color for pixel (x,y).
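
The method steps recited in the claims above can be sketched roughly as follows. The equations the claims refer to are not reproduced in this text, so everything numerical here is an assumption: the sketch uses the common stereo relation d = f * b / z, gives a share k of each disparity to the left view and 1 - k to the right view, and shifts the two views in opposite horizontal directions. The function names, the black hole colour, and the rule that the nearer pixel wins a collision are likewise illustrative and hypothetical.

```python
def disparity_map(depth, f, b):
    """First disparity map, ASSUMING d = f * b / z (zero depth -> zero disparity)."""
    return [[(f * b / z) if z > 0 else 0.0 for z in row] for row in depth]


def split_disparity(d, k):
    """ASSUMED split: share k for the left view, 1 - k for the right view."""
    d_left = [[k * v for v in row] for row in d]
    d_right = [[(1.0 - k) * v for v in row] for row in d]
    return d_left, d_right


def shift_pixels(rgb, disp, direction, hole=(0, 0, 0)):
    """Shift each pixel horizontally by its rounded disparity.

    direction is +1 or -1. When two source pixels land on the same target,
    the one with the larger disparity (the nearer object) is kept.
    Targets that receive no pixel keep the hole colour.
    """
    h, w = len(rgb), len(rgb[0])
    out = [[hole] * w for _ in range(h)]
    best = [[-1.0] * w for _ in range(h)]  # largest disparity written so far
    for y in range(h):
        for x in range(w):
            tx = x + direction * round(disp[y][x])
            if 0 <= tx < w and disp[y][x] > best[y][tx]:
                out[y][tx] = rgb[y][x]
                best[y][tx] = disp[y][x]
    return out


def stereo_pair(rgb, depth, f, b, k=0.5):
    """Build a (left, right) image pair from one RGB image and its depth image."""
    d = disparity_map(depth, f, b)
    d_left, d_right = split_disparity(d, k)
    left = shift_pixels(rgb, d_left, +1)    # assumed: left view shifts right
    right = shift_pixels(rgb, d_right, -1)  # assumed: right view shifts left
    return left, right
```

With k = 0.5 the observation point sits midway between the two eyes and both views receive half of the total disparity. The sketch leaves disoccluded pixels as holes; a practical implementation would need some form of hole filling, which is outside what this sketch covers.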

9. An apparatus for generating stereoscopic display content, comprising:
a non-transitory memory; and
a processor, wherein the non-transitory memory comprises instructions executable by the processor to:
obtain a first red-green-blue (RGB) image and a depth image from a red-green-blue plus distance (RGB-D) image;
determine a first disparity map according to the RGB-D image based on depth values in the depth image, the first disparity map including a plurality of disparity values for the first RGB image, which is to be converted into a pair of stereoscopic images;
determine a second disparity map and a third disparity map by transforming the first disparity map using a disparity distribution ratio; and
generate the pair of stereoscopic images including a second RGB image and a third RGB image, the second RGB image being generated by shifting a first set of pixels in the first RGB image based on the second disparity map, and the third RGB image being generated by shifting a second set of pixels in the first RGB image based on the third disparity map.

10. The apparatus of claim 9, wherein the instructions executable by the processor further comprise instructions to generate a pair of adjusted display images based on the pair of stereoscopic images, the pair of adjusted display images being resized to fit display requirements of an augmented reality (AR), virtual reality (VR), or mixed reality (MR) device.

11. The apparatus of claim 9, wherein the first disparity map is a two-dimensional (2D) matrix, each element of which represents a disparity value.

12. The apparatus of claim 9, wherein the RGB-D image is captured by an RGB-D sensor.

13. The apparatus of claim 12, wherein a pixel in the depth image indicates a distance between the RGB-D sensor and a corresponding object captured in the RGB image.

14. The apparatus of claim 13, wherein determining the first disparity map comprises using at least one of a focal length f or an interpupillary distance b to determine the first disparity map.

15. The apparatus of claim 14, wherein d_L(x,y) is the disparity value of the second disparity map, d_R(x,y) is the disparity value of the third disparity map, d(x,y) is the disparity value of the first disparity map, z(x,y) is the distance between the RGB-D sensor and an object corresponding to the pixel (x,y) in the RGB-D image, and k is the disparity distribution ratio, which is a constant value indicating the position of an observation point between the left eye and the right eye.

16. The apparatus of claim 15, wherein Pixel_L(x,y) is pixel (x,y) in the second RGB image, Pixel_R(x,y) is pixel (x,y) in the third RGB image, Pixel(x,y) is pixel (x,y) in the first RGB image, and (R(x,y),G(x,y),B(x,y)) is the RGB color for pixel (x,y).

17. A non-transitory computer-readable storage medium configured to store a computer program for generating stereoscopic display content, the computer program comprising instructions executable by a processor to:
obtain a first red-green-blue (RGB) image and a depth image from a red-green-blue plus distance (RGB-D) image;
determine a first disparity map according to the RGB-D image based on depth values in the depth image, the first disparity map including a plurality of disparity values for the first RGB image, which is to be converted into a pair of stereoscopic images;
determine a second disparity map and a third disparity map by transforming the first disparity map using a disparity distribution ratio; and
generate the pair of stereoscopic images including a second RGB image and a third RGB image, the second RGB image being generated by shifting a first set of pixels in the first RGB image based on the second disparity map, and the third RGB image being generated by shifting a second set of pixels in the first RGB image based on the third disparity map.

18. The non-transitory computer-readable storage medium of claim 17, wherein the instructions executable by the processor further comprise instructions to generate a pair of adjusted display images based on the pair of stereoscopic images, the pair of adjusted display images being resized to fit display requirements of an augmented reality (AR), virtual reality (VR), or mixed reality (MR) device.

19. The non-transitory computer-readable storage medium of claim 17, wherein the first disparity map is a two-dimensional (2D) matrix, with each element indicating a disparity value.

20. The non-transitory computer-readable storage medium of claim 17, wherein the RGB-D image is captured by an RGB-D sensor.