JP2014519132A

JP2014519132A - Controlling objects in a virtual environment

Info

Publication number: JP2014519132A
Application number: JP2014514566A
Authority: JP
Inventors: エバート，ジェフリー・ジーザス; クラーク，ジャスティン・アヴラム; ウィロビー，クリストファー・ハーレー; スカヴェッゼ，マイク; ディーゲロ，ジョエル; マルコヴィッチ，レルジャ; ソラ，ジョー; ヘイリー，デーヴィッド
Original assignee: Microsoft Corp
Current assignee: Microsoft Corp
Priority date: 2011-06-06
Filing date: 2012-06-05
Publication date: 2014-08-07
Anticipated expiration: 2032-06-05
Also published as: US20150379719A1; US10460445B2; KR102121707B1; KR20190118207A; JP5985620B2; US20180225829A1; CN103703489A; EP2718897B1; CN103703489B; WO2012170445A2; KR102031302B1; US9953426B2; WO2012170445A3; KR20140043379A; US20120307010A1; EP2718897A2; EP2718897A4; US9208571B2

Abstract

Digitizing objects in a picture is discussed herein. A user presents an object to a camera, which captures images comprising color data and depth data for the front and back of the object. For both the front and back images, the point closest to the camera is determined by analyzing the depth data. Starting from that closest point, the edges of the object are found by noting large differences in the depth data. The depth data is also used to construct point cloud constructions of the front and back of the object. Various techniques are applied to extrapolate edges, remove seams, intelligently extend color, filter noise, apply a skeletal structure to the object, and further optimize the digitization. Finally, the digital representation is presented to the user and potentially used in various applications (e.g., games, the Web, etc.).

Description

An embodiment of the present invention relates, for example, to controlling an object in a virtual environment.

[0001] Modern gaming and Internet technologies interact with users in far more personal ways than these technologies did in the past. Instead of simply pressing buttons on a controller connected to a game console, today's gaming systems can read the movements of a player standing in front of a camera, or actions the player takes with a wireless controller (e.g., swinging the controller like a baseball bat). This personal interaction opens up an entirely new realm of gaming.

[0002] This Summary is provided to introduce, in a simplified form, a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

[0003] One aspect is directed to creating a digital representation (a "digitization") of an object in an image. A user presents the object to a camera, which captures images comprising color data and depth data for the front and back of the object. For both the front and back images, the point closest to the camera is determined by analyzing the depth data. Starting from that closest point, the edges of the object are found by noting large differences in the depth data. The depth data is also used to construct point cloud constructions of the front and back of the object. Various techniques are applied to extrapolate edges, remove seams, intelligently extend color, filter noise, apply a skeletal structure to the object, and further optimize the digitization. Finally, the digital representation is presented to the user and potentially used in various applications (e.g., games, the Web, etc.).

[0004] Illustrative embodiments of the invention are described in detail below with reference to the attached drawings.

[0005] FIG. 1 is a block diagram of an exemplary computing environment suitable for implementing the embodiments discussed herein.
[0006] FIG. 2 is a diagram of a user presenting an object for digitization, according to one embodiment.
[0007] FIG. 3 is a diagram of a workflow for digitizing an object, according to one embodiment.
[0008] FIGS. 4A and 4B are camera-view images of a user presenting an object for digitization, according to one embodiment.
[0009] FIG. 5 is a segmented depth image usable for digitizing an object, according to one embodiment.
[0010] FIG. 6 is a diagram of depth-to-color offsets, according to one embodiment.
[0011] FIG. 7 is a source color image usable for digitizing an object, according to one embodiment.
[0012] FIG. 8 is a diagram of a color segmentation of a captured object, according to one embodiment.
[0013] FIGS. 9 and 10 are diagrams of user interfaces (UIs) giving guidance for holding an object to be digitized, according to one embodiment.
[0014] FIG. 11 is a diagram of a three-dimensional (3D) point cloud construction of an object, according to one embodiment.
[0015] FIG. 12 is a diagram of two views of aligned point sheets, according to one embodiment.
[0016] FIG. 13 is a diagram of a final point cloud construction, according to one embodiment.
[0017] FIG. 14 is a diagram of a UI displaying a confirmation image of a digitized object to a user, according to one embodiment.
[0018] FIG. 15 is a diagram of a mesh output of a captured image, according to one embodiment.
[0019] FIG. 16 is a smoothed and processed image of an object, according to one embodiment.
[0020] FIG. 17 is an image with UV coordinates, according to one embodiment.
[0021] FIG. 18 is a diagram of front-facing triangle edges drawn into a section of a final texture map, according to one embodiment.
[0022] FIGS. 19A-19E are diagrams illustrating weighting added to the bones of a generated skeletal structure, according to one embodiment.
[0023] FIG. 20A is a diagram of an image before luminance/saturation processing, and FIG. 20B is the image after luminance/saturation processing, according to one embodiment.
[0024] FIG. 21A is a diagram of a source image, and FIG. 21B is the output image after edges are filtered, according to one embodiment.
[0025] FIG. 22A is a diagram of an image where the edge repair filter finds background colors, and FIG. 22B is an image where the edge repair filter finds target colors, according to one embodiment.
[0026] FIG. 23A is an image showing the distance from an edge to a disputed region, and FIG. 23B is an image showing calculated background likelihood values, according to one embodiment.
[0027] FIG. 24 is a diagram of a final composite texture map, according to one embodiment.
[0028] FIG. 25A is a diagram of masked values, and FIG. 25B is a diagram of heavily blurred vertex colors, according to one embodiment.
[0029] FIG. 26A is a diagram of a mesh with texture only, and FIG. 26B is a diagram of a mesh whose texture is blended with vertex colors according to mask values, according to one embodiment.
[0030] FIG. 27 is a diagram of a final rendering of a digitized object, according to one embodiment.
[0031] FIG. 28 is a flow chart detailing a workflow for digitizing an object, according to one embodiment.
[0032] FIG. 29 is a flow chart detailing a workflow for digitizing an object, according to one embodiment.

[0033] The subject matter of embodiments of the invention is described with specificity herein to meet statutory requirements. However, the description itself is not necessarily intended to limit the scope of the claims. Rather, the claimed subject matter might also be embodied in other ways, in conjunction with other present or future technologies, to include different steps or combinations of steps similar to the ones described herein. Terms should not be interpreted as implying any particular order among or between the various steps disclosed herein unless and except when the order of individual steps is explicitly described.

[0034] Embodiments described herein generally relate to creating a digital representation of an object captured by a camera. In one embodiment, a user holds the object in front of the camera, the camera captures images of the object, and a device digitizes the captured images into a 3D rendition that can be displayed digitally, for example as an entity in a video game. To illustrate, consider the following example. A user holds up a toy octopus to a gaming device equipped with a camera. Using the camera, the gaming device takes pictures of the front and back of the object, capturing both color data and depth data for each side. Based on the depth data, a 3D rendition of the octopus is constructed, and the color data is then added to the 3D rendition to create a digital rendition of the octopus (referred to herein as a "digitization"). The digitization can then be used in games or in any other software or web application where a rendition of the octopus is useful.

[0035] At least one embodiment is directed to digitizing an object. A user presents the object to a camera on a computing device (such as a gaming console). The device may guide the user in positioning the object so that the captured image is optimized, for example by placing an outline on a screen that reflects the image seen by the camera and indicating that the user should move the object into the outline. Ultimately, the device captures one or more images of the object. The user may then be instructed to present the back side of the object to the camera for capture. The device may then capture one or more images of the back side of the object. The captured front and back images are processed to construct a 3D digitization of the object.

[0036] In one embodiment, the processing uses depth data of the images captured by the camera. Depth data describes the proximity of the things captured in an image, per pixel or in another spatial representation. Using the depth data, the closest point of an object in the image is located. This embodiment assumes that the closest object in the image is the object the user is looking to capture; for example, a user holding an octopus up to the camera probably means that the octopus is the thing closest to the camera.

[0037] Having briefly described an overview of the invention, an exemplary operating environment in which various aspects of the invention may be implemented is now described. Referring to the drawings in general, and initially to FIG. 1 in particular, an exemplary operating environment for implementing embodiments of the invention is shown and designated generally as computing device 100. Computing device 100 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should computing device 100 be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated.

[0038] Embodiments of the invention may be described in the general context of computer code or machine-usable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. Embodiments of the invention may be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, more specialized computing devices, and the like. Embodiments of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.

[0039] With continued reference to FIG. 1, computing device 100 includes a bus 101 that directly or indirectly couples the following devices: memory 102, one or more processors 103, one or more presentation components 104, input/output (I/O) ports 105, I/O components 106, and an illustrative power supply 107. Bus 101 represents what may be one or more buses (such as an address bus, a data bus, or a combination thereof). Although the various blocks of FIG. 1 are shown with lines for the sake of clarity, in reality, delineating the various components is not so clear; metaphorically, the lines would more accurately be gray and fuzzy. For example, one may consider a presentation component such as a display device to be an I/O component. Additionally, many processors have memory. The inventors recognize that such is the nature of the art, and reiterate that the diagram of FIG. 1 is merely illustrative of an exemplary computing device that can be used in connection with one or more embodiments of the invention. No distinction is made between such categories as "workstation," "server," "laptop," "gaming console," "handheld device," and so on, as all are contemplated within the scope of FIG. 1 and of references to a "computing device."

[0040] Computing device 100 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 100 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical or holographic memory, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to encode the desired information and that can be accessed by computing device 100.

[0041] Memory 102 includes computer storage media in the form of volatile and/or nonvolatile memory. Memory 102 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical disc drives, and the like. Computing device 100 includes one or more processors that read data from various entities such as memory 102 or I/O components 106. Presentation component(s) 104 present data indications to a user or other device. Exemplary presentation components include a display device, a speaker, a printing component, a vibrating component, and the like.

[0042] I/O components 106 may comprise a camera capable of taking still pictures or video. In one embodiment, the camera captures color data (e.g., red, green, blue) and depth data when taking a picture. Depth data indicates the proximity of the objects being captured to the camera itself (in one embodiment, on a per-pixel basis). Depth data may be captured in a number of ways, such as using an infrared (IR) camera to read projected IR light, reading projected laser light, and the like. Depth data may be stored in a per-centimeter, per-meter, or other spatial representation. For example, IR dots may be projected and read by an IR camera, producing an output file that details the depth of the image in the area directly in front of the camera, measured in a per-meter representation. Additionally, depth data may also indicate the orientation of a particular part of a captured object by recording the pixels of the screen area where the depth is measured. Because the color camera and the depth camera may be located separately from one another, conversions may be made to map the retrieved color data to the corresponding depth data.
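The color-to-depth conversion mentioned above can be illustrated with a minimal sketch. The text does not specify how the mapping is computed, so the function name `color_at_depth_pixel` and the constant pixel `offset` below are hypothetical simplifications; a real device would more likely use a calibrated mapping that can vary with depth.

```python
def color_at_depth_pixel(color, x, y, offset=(0, 0)):
    """Look up the color sample that corresponds to depth pixel (x, y).

    color  : 2D list of (r, g, b) tuples, indexed as color[row][column].
    offset : assumed constant (dx, dy) shift between the depth image and
             the color image (a simplification for illustration only).
    Returns the (r, g, b) tuple, or None when the shifted position falls
    outside the color image.
    """
    dx, dy = offset
    cx, cy = x + dx, y + dy
    if 0 <= cy < len(color) and 0 <= cx < len(color[0]):
        return color[cy][cx]
    return None


if __name__ == "__main__":
    color_image = [
        [(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (9, 9, 9)],
    ]
    # Depth pixel (0, 0) maps to color pixel (1, 1) under a (1, 1) offset.
    print(color_at_depth_pixel(color_image, 0, 0, offset=(1, 1)))
```

Returning `None` for out-of-range positions makes it explicit that some depth pixels may have no color sample once the two images are shifted against each other.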

[0043] I/O ports 105 allow computing device 100 to be logically coupled to other devices, including I/O components 106, some of which may be built in. Illustrative I/O components 106 include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, and the like.

[0044] As previously indicated, some embodiments are directed to creating a digital rendition of an object in a virtual environment. FIG. 2 is a diagram of an environment 200 in which a user 204 creates a digital representation of an object 206, according to one embodiment. It should be understood that this and other arrangements described herein are set forth only as examples. Other arrangements and elements (e.g., machines, interfaces, functions, orders, groupings of functions, etc.) can be used in addition to or instead of those shown, and some elements may be omitted altogether. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory.

[0045] Turning to FIG. 2, environment 200 shows a user 204 presenting an object 206, illustrated as an octopus figurine, to a computing device 202 that is equipped with two cameras: a color camera 208 and a depth camera 210. In environment 200, computing device 202 is a game console, such as the Microsoft Kinect(TM) created by Microsoft Corporation(R). The cameras on computing device 202 capture one or more images that include the object 206. Color camera 208 captures color data for the images, and depth camera 210 captures depth data. In alternative embodiments, computing device 202 may have only one camera that captures both color data and depth data.

[0046] Although shown as a standalone device, computing device 202 may be integrated with, or communicatively connected to, other computing devices (e.g., gaming consoles, servers, etc.). The components of computing system 200 may communicate with each other via a network, which may include, without limitation, one or more local area networks (LANs) and/or wide area networks (WANs). Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. It should be understood that some embodiments may include additional computing devices 202. Each may comprise a single device/interface or multiple devices/interfaces cooperating in a distributed environment.

[0047] In some embodiments, one or more of the digitization techniques described herein may be implemented by stand-alone applications. Alternatively, one or more of the digitization techniques may be implemented by disparate computing devices across a network, such as the Internet, or by a module inside a gaming system. It will be understood by those of ordinary skill in the art that the components/modules illustrated in FIG. 2 are exemplary in nature and in number and should not be construed as limiting. Any number of components/modules may be employed to achieve the desired functionality within the scope of embodiments of the invention. Further, components/modules may be located on any number of servers or client computing devices.

[0048] Although FIG. 2 shows user 204 presenting the front side of object 206 to computing device 202, user 204 may also present the back side of object 206 to computing device 202 so that a back-side image of object 206 can be captured. The back-side image can then be combined with the front-side image of object 206 to produce a 3D rendition of object 206. Each captured image may include color data and depth data, both of which allow computing device 202 to accurately create the 3D rendition of object 206.

[0049] In different embodiments, additional image views of object 206 may also be used to aid digitization. Object 206 may be photographed or videoed from any number of different angles. For example, several images may be taken from the right, left, bottom, and top of object 206 in addition to, or instead of, the front and back views in order to generate a more robust 3D digitization. For instance, several side views may be used in digitizing a particular side of object 206. At least in some embodiments, the more views of object 206 that are used, the more complete or accurate the 3D rendition.

[0050] FIG. 3 is a diagram of a workflow 300 for digitizing an object, according to one embodiment. Initially, as shown at 302, a user presents the object to a camera on a computing device so that images can be taken. In some embodiments, the computing device may instruct the user to move the object into a specific area in order to capture an optimal image of the object. For example, the device may place an outline on a display, show a real-time image of the user and the object, and then instruct the user to move the object into the outline. Once an initial image is taken, the computing device may instruct the user to present the back side of the object for capture, as shown at 304. Guidance for capturing the back side may similarly be provided by the computing device. For each image captured, color data and depth data are stored and used to digitize the object being presented. Moreover, multiple images may be captured for the front-side and back-side perspectives of the object. For example, the computing device may be configured to take ten front images and ten back images, and possibly merge the ten front images together and the ten back images together, or use all twenty to digitize the object. Although ten images have been shown to be an ideal number of images for digitizing an object, other embodiments may use different numbers of captured images.

[0051] Once the front and back images of the object have been captured by the camera, one embodiment begins digitizing the object by searching, using the depth data of the images, for the point in the image that is closest to the camera, as shown at 306. The user is probably holding the object to be digitized in front of himself or herself, so the object should be closer to the camera than anything else. Returning briefly to FIG. 2, one may notice that the user 204 holds the object 206 in front of him and that the object is therefore closer to the computing device 202. Locating the closest object in the image may be accomplished using the depth data associated with the image, and some embodiments perform the process on both the front and back images to identify the closest object in each.
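The nearest-point search described above can be sketched as follows. This is a minimal illustration only, not the implementation of any embodiment; the function name `closest_pixel`, the row-major list-of-lists depth layout, and the convention that a depth of 0 marks a missing reading are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def closest_pixel(depth: List[List[float]]) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the valid depth pixel nearest the camera, or None."""
    best = None
    best_depth = float("inf")
    for r, row in enumerate(depth):
        for c, d in enumerate(row):
            # Assumption: a depth of 0 (or less) means the sensor has no reading.
            if 0 < d < best_depth:
                best, best_depth = (r, c), d
    return best
```

The returned pixel can then serve as the seed for the outward edge search.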

[0052] As shown at 308, the closest object identified in the images is then searched for edges to determine where the object ends. The depth data is again used to locate the edges of the object in the image. The edge search starts outward from the closest point, looking for significant differences in the depths of points. For example, the edge of the octopus in FIG. 2 may have a point that is nearly 0.5 meters closer than an adjacent point representing the shoulder of the user 204. Such a significant difference represents a readable signal that the adjacent point is not part of the object and therefore should not be included in further digitization steps. Locating all of the edges of the object in this way allows the computing device to identify the object in the image.
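One way to picture the outward edge search is a flood fill that stops wherever the depth jumps sharply or is missing. The sketch below is illustrative only; the name `segment_from_seed`, the 4-connected neighborhood, and the `max_step` threshold of 0.1 m are assumptions, not values taken from the text.

```python
from collections import deque
from typing import List, Set, Tuple


def segment_from_seed(depth: List[List[float]], seed: Tuple[int, int],
                      max_step: float = 0.1) -> Set[Tuple[int, int]]:
    """Grow a mask outward from seed, stopping at depth edges."""
    rows, cols = len(depth), len(depth[0])
    mask = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in mask:
                continue
            d = depth[nr][nc]
            if d <= 0:
                continue  # no depth data: treat as an edge
            if abs(d - depth[r][c]) > max_step:
                continue  # large depth jump: neighbor is not on the object
            mask.add((nr, nc))
            queue.append((nr, nc))
    return mask
```

With a 0.5 m jump between the object and the shoulder behind it, as in the octopus example, the fill would stop at the object boundary.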

[0053] Once the object has been determined, one embodiment switches off the color data associated with the rest of the image (i.e., the portion of the image not identified as the object). In some embodiments it may be necessary to capture multiple images (e.g., 10 images of the front and 10 images of the back of the object), so a smoothing technique may be needed to blend the found edges between frames, as shown at 310. For example, the object may move between frame 1 and frame 4, so smoothing the edges between frames may be needed to obtain an accurate representation of the object. In addition, noise, low resolution, and imperfections in depth-to-color alignment may also necessitate additional smoothing and/or filtering of the edges.

[0054] In one embodiment, as shown at 312, the resulting smoothed and/or filtered object is presented to the user for confirmation. The user can then accept or reject the resulting object. If it is accepted, additional processing may proceed to digitize the object. If it is rejected, embodiments may ask the user whether to start the whole process over by taking new pictures of the object, or may simply re-smooth or re-filter the object.

[0055] Eventually, the front and back images are used to generate a point cloud configuration of the object in 3D. A "point cloud configuration," shown in detail in FIG. 11, is a mapping of the front and/or back images of the object into 3D space, using the depth of each point or pixel of the identified object. The point cloud configuration is used in further digitization of the object. Alternative embodiments may, however, use other representations or spatial aggregations of the depth and color data to create a construction or another type of representation of the object from the various images.

[0056] FIGS. 4 through 26 show images of various steps in the digitization process and are discussed in further detail below to illustrate the processing used by different embodiments. Specifically, FIGS. 4A and 4B are camera-view images of a user presenting an object for digitization, according to one embodiment. In the illustrated embodiment, two views of the object are captured. The color camera is zoomed in on the center of the frame to obtain a 640×480 color window around the object of interest, and the corners of the color window are then transformed into depth frame coordinates (assuming that the corners are at the front of the object of interest). A matching 160×120 window is then grabbed from the depth frame. Without this pre-frame window adjustment (which depends on the distance of the object of interest from the camera), the depth and color windows may not overlap as fully as possible. Moreover, raw color and raw depth may be captured without performing depth-to-color or color-to-depth alignment. These resolution numbers and windows are given merely for illustration, as various other resolutions may be used instead.

[0057] In one embodiment, the depth image is segmented to the object of interest. To do so, the depth pixel closest to the camera is searched for and found, on the assumption that such a point lies on the object of interest. This embodiment then flood-fills outward from the closest point found until depth edges are hit (i.e., where the depth is too far from the front of the object or where there is no depth data). In addition, points that lie around areas of high gradient and have very few neighbors may be removed. The result is a mask of depth pixels that are on the object of interest (referred to herein as a "segmented depth image"), as shown in FIG. 5. The segmented depth image is stored into a ring buffer of depth frames (BAB/GOE shipped with a ring buffer size of 10), overwriting the oldest depth frame, and all frames are averaged together to obtain a final depth image. In one embodiment, only segmented depth pixels contribute to the final average. As a result, noise is smoothed, yielding more stable object edges and improving scenarios in which parts of the object flicker in and out of the segmentation because of noise or poorly IR-reflective materials.
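The ring-buffer averaging can be illustrated with a short sketch. The class name `DepthAverager` and the use of `None` to mark pixels outside the segmentation are assumptions for the example; only the buffer size of 10 and the rule that segmented pixels alone contribute to the average come from the text.

```python
from collections import deque
from typing import List, Optional

Frame = List[List[Optional[float]]]


class DepthAverager:
    """Keep the most recent segmented depth frames and average them."""

    def __init__(self, size: int = 10) -> None:
        # deque(maxlen=...) discards the oldest frame when a new one arrives.
        self.frames: deque = deque(maxlen=size)

    def push(self, frame: Frame) -> None:
        self.frames.append(frame)

    def average(self) -> Frame:
        rows, cols = len(self.frames[0]), len(self.frames[0][0])
        out: Frame = [[None] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Only pixels that were segmented (not None) contribute.
                vals = [f[r][c] for f in self.frames if f[r][c] is not None]
                if vals:
                    out[r][c] = sum(vals) / len(vals)
        return out
```

A pixel that drops out of the segmentation for a frame or two still receives a value from the other frames, which is what stabilizes flickering edges.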

[0058] FIG. 6 is a diagram of depth-to-color offsets, according to one embodiment. As illustrated, one embodiment builds a depth-to-color offset table, shown with green (in the upper right corner), red (in the lower left corner), and blends of the two in between. The offset between each pixel's depth-space and color-space coordinates is stored in the table for quick lookup during color segmentation and mesh processing; it also helps to reproduce the output mesh completely using only the two captured color images, regardless of the calibration settings of the particular camera. Regions of the table outside the object segmentation may be filled in by copying the offsets at the edge of the segmentation outward. The copied offsets at the edge may later be used to handle cases in which vertices in the output mesh, projected into the depth image, fall outside the bounds of the depth segmentation.

[0059] According to one embodiment, FIG. 7 is a source color image, and FIG. 8 is a diagram of the color segmentation of the captured object. Starting from the segmentation in depth space, one embodiment distributes each segmented depth pixel into a 320×240 color segmentation buffer using a star-shaped distribution pattern. The resulting pattern is then "upsampled" to 640×480, and a "distance-from-ideal" value, describing how far the source depth pixel is from the "ideal" distance, is then computed for each segmented color pixel. The ideal distance represents how close to the camera the user should hold the object of interest in order to obtain as much color/depth data as possible without crossing the front clip plane of the depth camera. These values may be displayed as feedback to the user during the capture process. Pixels farther from the ideal may be blurred and tainted more heavily than pixels closer to the ideal. The distance-from-ideal values are eventually copied into the alpha channel of the color image used for the real-time preview.

[0060] FIGS. 9 and 10 are diagrams of a user interface (UI) giving guidance for holding the object to be digitized, according to one embodiment. FIG. 9 shows that the illustrated embodiment analyzes the number of segmented pixels, the distance to the camera, the distance from the center of the camera view, pixel stability, and object size, and gives the user visual and text feedback on how best to position the object. The feedback may take the form of an outline on the screen. FIG. 10 shows the color and depth data of an image of the back of the object of interest, obtained using the same process as above. One embodiment guides the user to orient the object correctly using an outline of the segmented front capture. The user need not match the outline precisely, because the front and back captures can later be aligned automatically.

[0061] FIG. 11 shows a point cloud configuration, according to one embodiment. At this point, two color and depth data images have been segmented to the object of interest. Using these images, a point cloud configuration of points on the surface of the object of interest can be built and later used to reconstruct a triangle mesh. The segmented pixels in the front depth image are transformed into a "sheet" of 3D points. In one embodiment, positions are unprojected from depth image space into model space using the depth data, and the origin is the back center of the sheet. The edges of the sheet are extruded backward by adding additional points to form the sides of the object. To guess how "deep" the object is, a fixed value for the extrusion distance may be used in BAB/GOE.
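As a rough illustration of turning segmented depth pixels into a sheet of 3D points, the sketch below uses a standard pinhole camera model. The intrinsics `fx`, `fy`, `cx`, `cy`, the function names, and the reading of "back center" as the x/y centroid at the largest depth are all assumptions for the example; the text does not specify the camera model.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Point:
    """Map pixel (u, v) with a depth value to a 3D point (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def recenter_to_back(points: List[Point]) -> List[Point]:
    """Shift points so the origin sits at the back center of the sheet.

    Assumption: 'back center' is the x/y centroid at the largest depth.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    back = max(p[2] for p in points)
    return [(x - mx, y - my, z - back) for x, y, z in points]
```

Applying `unproject` to every pixel in the segmentation mask and then `recenter_to_back` yields one point sheet in a shared model space.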

[0062] Similarly, a sheet of 3D points from the back depth image is created, using the back center of the front capture as the origin. FIG. 12 shows two views of the aligned point sheets, according to one embodiment. To align the sheets, an initial transform is calculated that rotates this sheet 180 degrees around the up axis so that it forms the back of the point cloud. In one embodiment, another transform is calculated that aligns the edges of the front and back sheets as closely as possible. The alignment process may translate the back sheet so that its center of mass matches the center of mass of the front sheet. A brute-force iteration is then used over a range of translations and rotations to minimize an "alignment error" value, computed as the sum of the distances from each front edge point to its closest back edge point. The iteration may be done in multiple passes (with each pass attempting to compute the best value for each translation and rotation axis, one at a time), and the search across each axis is done using a two-tier hierarchical approach for efficiency. Closest-point finding is accelerated using a 3D cell space partition. One embodiment also implements an iterative closest point ("ICP") algorithm for fast fine-grained alignment; alternatively, the need for better control may dictate the use of brute-force iteration only.
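The alignment error and the two-tier search along a single axis can be sketched as below. This is a simplified illustration under stated assumptions: it handles translation only (rotation is omitted), uses a plain nearest-neighbor scan instead of a 3D cell partition, and the names `alignment_error` and `best_offset` along with the `span` and `steps` defaults are invented for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def centroid(points: List[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def translate(points: List[Point], t) -> List[Point]:
    return [tuple(p[i] + t[i] for i in range(3)) for p in points]


def alignment_error(front_edge: List[Point], back_edge: List[Point]) -> float:
    """Sum of distances from each front edge point to its closest back edge point."""
    return sum(min(math.dist(f, b) for b in back_edge) for f in front_edge)


def best_offset(front_edge: List[Point], back_edge: List[Point], axis: int,
                span: float = 0.1, steps: int = 11) -> float:
    """Two-tier brute-force search for the best translation along one axis."""
    best = 0.0
    for _ in range(2):  # tier 1: coarse sweep; tier 2: fine sweep around its best
        def error_at(offset: float) -> float:
            t = [0.0, 0.0, 0.0]
            t[axis] = offset
            return alignment_error(front_edge, translate(back_edge, t))

        candidates = [best + span * (2 * i / (steps - 1) - 1) for i in range(steps)]
        best = min(candidates, key=error_at)
        span = 2 * span / (steps - 1)  # narrow the window to one coarse step
    return best
```

In a fuller version, the same search would be repeated for each translation and rotation axis in turn, over several passes, after first matching the centroids of the two sheets.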

[0063] Points from the front sheet that have no corresponding points in the back sheet may be culled by searching backward from each front point to find the nearest back point. Likewise, points from the back sheet that have no corresponding points in the front sheet may be culled. This removes parts of the sheets that are inconsistent between the front and back captures, as can happen if the user's hand is in the capture but changes position between captures, or if the object changes shape between the front and back captures.

[0064] In one embodiment, the remaining points are merged together into a final point cloud, and the normals for the points are computed using the plane formed by each point and its right and lower neighbors. FIG. 13 shows the final point cloud configuration, according to one embodiment.
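Computing a normal from the plane through a point and its right and lower neighbors amounts to a cross product of the two edge vectors. The sketch below is illustrative; the function name and the choice of orientation (right edge crossed with lower edge) are assumptions, and a real pipeline might flip the sign to face the camera.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float, float]


def normal_from_neighbors(p: Point, right: Point, below: Point) -> Optional[Point]:
    """Unit normal of the plane through p, its right neighbor, and its lower neighbor."""
    u = tuple(right[i] - p[i] for i in range(3))
    v = tuple(below[i] - p[i] for i in range(3))
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if length == 0.0:
        return None  # degenerate: the three points are collinear or coincident
    return (n[0] / length, n[1] / length, n[2] / length)
```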

[0065] A confirmation image may then be presented to the user, as shown in FIG. 14. The confirmation image incorporates the results of sheet alignment and point culling (the removal of unmatched points), allowing the user to detect cases in which capture, alignment, or culling has failed badly and to capture again without going through the remainder of the construction process. The image is created by projecting and distributing the points in the final point cloud into the alpha channel of the front and back color images, rotating the back image based on the alignment transform, and performing some additional image cleanup.

[0066] A surface reconstruction step takes the final point cloud and generates a triangle mesh. FIG. 15 shows a diagram of the mesh output by surface reconstruction. One embodiment uses a hybrid CPU/GPU implementation of the Poisson Surface Reconstruction algorithm developed by Minmin Gong in Xin Tong's group at MRS-Beijing and detailed in "Poisson Surface Reconstruction" by Kazhdan, Bolitho, and Hoppe and in "Highly Parallel Surface Reconstruction" by Zhou, Gong, Huang, and Guo. This may be the most computationally intensive part of digitization in both memory and time, taking, in some embodiments, 10 to 20 seconds for typical point cloud data of approximately 20,000 points. The amount of hole filling may be limited during reconstruction to keep memory usage under control, but such a limit can result in a mesh that is not watertight if there are large holes in the point cloud.

[0067] FIG. 16 is a smoothed and processed image of the object, according to one embodiment. A vertex adjacency list is built, and face normals and vertex normals are computed. One embodiment then uses a Laplacian algorithm for smoothing, subject to some constraints. As a result, the sides of the object are rounded off, noise is removed, and areas where the point sheets do not line up perfectly are cleaned up.
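For orientation, a plain uniform Laplacian smoothing pass over a vertex adjacency list might look like the sketch below. The constraints mentioned in the text are not specified, so they are not modeled here; the function name, the `alpha` blending factor, and the convention that a vertex with an empty neighbor list stays fixed are assumptions for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def laplacian_smooth(vertices: List[Point], neighbors: List[List[int]],
                     alpha: float = 0.5, iterations: int = 1) -> List[Point]:
    """Move each vertex a fraction alpha toward the average of its neighbors."""
    pts = list(vertices)
    for _ in range(iterations):
        new_pts = []
        for i, p in enumerate(pts):
            nbrs = neighbors[i]
            if not nbrs:
                new_pts.append(p)  # no neighbors listed: leave the vertex in place
                continue
            avg = tuple(sum(pts[j][k] for j in nbrs) / len(nbrs) for k in range(3))
            new_pts.append(tuple(p[k] + alpha * (avg[k] - p[k]) for k in range(3)))
        pts = new_pts
    return pts
```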

[0068] Depending on the quality of the point cloud, surface reconstruction can create small "islands" of geometry instead of a single large mesh. One embodiment uses connected component labeling to find the islands, computes their volumes, and removes islands that are significantly smaller than the largest island.
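Island removal can be sketched with a union-find over triangle vertices plus a signed-volume sum per island. This is an illustration only: the function names and the `min_ratio` cutoff of 0.1 are assumptions, and the volume formula presumes each island is a closed, consistently oriented mesh.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Face = Tuple[int, int, int]


def mesh_islands(faces: List[Face]) -> List[List[Face]]:
    """Group faces into connected components (faces linked by shared vertices)."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, c in faces:
        root = find(a)
        parent[find(b)] = root
        parent[find(c)] = root
    groups: Dict[int, List[Face]] = {}
    for f in faces:
        groups.setdefault(find(f[0]), []).append(f)
    return list(groups.values())


def signed_volume(vertices: List[Point], faces: List[Face]) -> float:
    """Sum of signed tetrahedron volumes between each face and the origin."""
    total = 0.0
    for a, b, c in faces:
        (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = vertices[a], vertices[b], vertices[c]
        total += (x1 * (y2 * z3 - z2 * y3)
                  - y1 * (x2 * z3 - z2 * x3)
                  + z1 * (x2 * y3 - y2 * x3)) / 6.0
    return total


def drop_small_islands(vertices: List[Point], faces: List[Face],
                       min_ratio: float = 0.1) -> List[Face]:
    """Keep only islands whose volume is at least min_ratio of the largest."""
    islands = mesh_islands(faces)
    volumes = [abs(signed_volume(vertices, isl)) for isl in islands]
    biggest = max(volumes)
    return [f for isl, v in zip(islands, volumes) if v >= min_ratio * biggest for f in isl]
```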

[0069] For each vertex, one embodiment considers the dot product between the normal of that vertex and the front and back capture view directions. The front view direction may be along the negative Z axis of model space, while the back view direction may depend on the results of the sheet alignment process and may not lie along the positive Z axis. As a result, some vertices may be visible to both the front and back capture views, and some vertices may be visible to neither view. A vertex may be classified as "front" if its normal faces the front more than the back, and vice versa. This also allows "seam" vertices (i.e., vertices that straddle the front and back views of the object) to be located.
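The front/back classification by dot product can be sketched as follows. The sign convention (a surface faces a camera when its normal points against that camera's view direction), the default back view direction along positive Z, and the approximation of the seam as triangles with mixed labels are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
Face = Tuple[int, int, int]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def classify_vertex(normal: Vec,
                    front_view: Vec = (0.0, 0.0, -1.0),
                    back_view: Vec = (0.0, 0.0, 1.0)) -> str:
    """Label a vertex 'front' or 'back' by which capture its normal faces more."""
    # A normal opposing the view direction means the surface faces that camera.
    front_facing = -dot(normal, front_view)
    back_facing = -dot(normal, back_view)
    return "front" if front_facing >= back_facing else "back"


def seam_faces(faces: List[Face], labels: Dict[int, str]) -> List[Face]:
    """Triangles whose vertices carry mixed labels lie on the front-back seam."""
    return [f for f in faces if len({labels[v] for v in f}) > 1]
```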

[0070] To create the texture map to apply onto the final mesh, one embodiment places the color image from the front capture at the top of the texture and the color image from the back capture directly below the front capture. Texels from the top portion of the texture are then mapped onto the primarily front-facing triangles, and likewise for the primarily back-facing triangles. Vertices may initially be shared between front and back triangles right along the front-back seam; later, these shared vertices may be duplicated so that different parts of the texture can be mapped to front triangles versus back triangles.
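The stacked texture layout implies a simple remapping of per-image coordinates into the combined texture. The sketch below assumes normalized coordinates in [0, 1] with v = 0 at the top and each image occupying exactly half of the texture height; the function name `atlas_uv` and these conventions are assumptions, not details from the text.

```python
from typing import Tuple


def atlas_uv(u: float, v: float, is_front: bool) -> Tuple[float, float]:
    """Map image-space (u, v) into a texture holding the front image above the back image."""
    if is_front:
        return (u, v * 0.5)        # front capture occupies the top half
    return (u, 0.5 + v * 0.5)      # back capture occupies the bottom half
```

Duplicating a seam vertex then simply means giving its two copies the front and back variants of this mapping.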

[0071] According to one embodiment, FIG. 17 shows an image with UV coordinates, and FIG. 18 shows a diagram of front-facing triangle edges drawn onto a portion of the final texture map. To compute UV coordinates, front-facing triangles are mapped to the top portion of the texture, where the front capture color image is placed, and likewise for the bottom. Vertex positions are in the space of the depth camera, whereas the color images are in the space of the color camera, so after projecting vertices onto the front/back depth images, one embodiment uses the depth-to-color offset table to transform the coordinates into color camera space.

[0072] In one embodiment, the mesh is re-centered, mirrored about the up axis, and scaled to enforce a maximum width/height aspect ratio. The captured color and depth images are mirrored compared with the actual physical object, so another mirroring is used to reverse this. The skeleton may be optimized, and animations may be added, for taller rather than wider objects, so the width/height aspect ratio restriction puts a bound on artifacts caused by animating wide objects that do not match a certain skeleton.

[0073] In one embodiment, a single skeleton is used for all animations of the skeleton. The skeleton may have bones that give a good range of motions (walking, jumping, crawling, dancing, looking left and right, etc.) without requiring the object of interest to have much more shape.

[0074] To apply skin to the digitized image, the mesh is rescaled and positioned such that the skeleton fits inside of it, with the top bone positioned a certain percentage (e.g., approximately 90%) from the top of the object (placing it roughly inside the "head" of the object) and the bottom bone at the bottom extent of the object. Bone indices can then be computed and weights added to the skeleton by finding the closest bones along the up axis to each vertex and weighting them using a falloff curve. FIGS. 19A-19E are diagrams showing the weighting applied to different bones of a generated skeleton structure, according to one embodiment.
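Weighting vertices to bones along the up axis with a falloff curve might be sketched as below. The linear falloff, the `radius` parameter, and the fallback to the single nearest bone are assumptions chosen for the example; the text does not say which falloff curve is used.

```python
from typing import List


def bone_weights(vertex_height: float, bone_heights: List[float],
                 radius: float) -> List[float]:
    """Normalized skinning weights from distance along the up axis (linear falloff)."""
    raw = [max(0.0, 1.0 - abs(vertex_height - h) / radius) for h in bone_heights]
    total = sum(raw)
    if total == 0.0:
        # Vertex is outside every bone's falloff: bind fully to the nearest bone.
        nearest = min(range(len(bone_heights)),
                      key=lambda i: abs(vertex_height - bone_heights[i]))
        return [1.0 if i == nearest else 0.0 for i in range(len(bone_heights))]
    return [w / total for w in raw]
```

The index of the largest weight would correspond to the closest bone for that vertex.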

[0075] The color and/or depth images are processed to reduce noise and improve quality. In one embodiment, processing is performed on the front and back images independently, and the results are merged into a final texture map, which may require additional processing. After some experimentation and feedback from artists, the following steps were found to be optimal: convert the sRGB colors to linear space, apply "gray world" auto-white balance, repair edge artifacts, compute luminance and chroma values, apply bilateral filtering, histogram equalization, and sharpening to the luminance, apply median filtering to the chroma, convert back to sRGB, and finally extend the edges of the colors outward into the unsegmented regions of the image. Other steps may be added, and some of the above may be omitted, in different embodiments.
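The first two steps in the list, conversion to linear space and gray-world white balance, can be sketched as below. The transfer functions follow the standard sRGB definition; the function names, the per-pixel tuple layout with channel values in [0, 1], and the clipping at 1.0 are assumptions for the example.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def srgb_to_linear(c: float) -> float:
    """Standard sRGB decoding for a channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Standard sRGB encoding for a linear channel value in [0, 1]."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def gray_world(pixels: List[RGB]) -> List[RGB]:
    """Scale each channel so that the three channel means become equal."""
    n = len(pixels)
    means = [sum(p[k] for p in pixels) / n for k in range(3)]
    gray = sum(means) / 3
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(1.0, p[k] * gains[k]) for k in range(3)) for p in pixels]
```

In the ordering given above, `gray_world` would run on linear values, with `linear_to_srgb` applied only after the remaining filtering steps.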

[0076] FIGS. 20A and 20B show images before and after luminance/chroma processing, according to one embodiment. Processing luminance and chroma independently allows the chroma to be filtered much more strongly while preserving detail in the luminance image, which is effective for denoising the image. Histogram equalization may be applied very lightly to compensate for poorly exposed images.

[0077] FIGS. 21A and 21B show a source image and the output image after the edges have been filtered, according to one embodiment. In one embodiment, an "edge repair filter" attempts to replace colors at the edges of the object of interest that actually come from the background rather than from the object itself. Bad colors may creep in because of the relatively low resolution and high noise of the depth image and imperfect depth-to-color alignment. The edge repair filter operates on a "disputed region" of pixels immediately around the object edge. Using the assumption that pixels interior to the disputed region are definitely part of the object of interest and that pixels farther exterior are part of the background, a "background probability" value is computed for each pixel of the disputed region and used to blend high-probability background pixels toward the interior colors.

[0078] FIGS. 22A and 22B show images in which the edge repair filter finds background colors and target colors, according to one embodiment. The target colors are extrapolated into the disputed region from the outside.

[0079] FIGS. 23A and 23B are images showing the distance from the edge to the disputed region and the computed background probability values, according to one embodiment. Furthermore, FIG. 24 shows the final composite texture map of the image, textured over the image that has not undergone final processing, according to one embodiment.

[0080] The seams that result from placing the front and back images together may need to be repaired. The last bit of mesh processing is used to improve the appearance of the object near the front-back seam and in regions that were invisible to the color camera during capture. First, a mask value per vertex is computed that represents how "bad" the texture color will be at that vertex. This value is the product of the distance to the seam (where the front and back images touch but generally do not line up well) and how back-facing the vertex is to either of the captured images (where texture colors break down because the surface faces away from the camera views and because of poor texel density). These values may be stored in the vertex color alpha channel. Next, a blurred version of the surface color is computed and stored in the vertex color RGB channels. These colors are fairly good in quality (although low in detail). The negative artifacts needing repair are relatively localized and of high frequency, whereas blurring gives more global, low-frequency colors.

[0081] FIGS. 25A and 25B show the mask values and the heavily blurred vertex colors, according to one embodiment. At run time, the mask value is used, in one embodiment, to blend between the source texture and the blurred vertex color. FIGS. 26A and 26B show a mesh with texture only (FIG. 26A) and a mesh with texture and vertex color blended by the mask value (FIG. 26B), according to one embodiment.
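The run-time blend between the source texture and the blurred vertex color is, in its simplest reading, a linear interpolation controlled by the mask value. The sketch below assumes the mask is in [0, 1], with 1 meaning the texture color is least trustworthy; the function name and the clamping are assumptions for the example.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def blend_by_mask(texture_rgb: RGB, vertex_rgb: RGB, mask: float) -> RGB:
    """Linearly blend texture color toward blurred vertex color by the mask value."""
    m = min(1.0, max(0.0, mask))  # clamp to [0, 1]
    return tuple(t + m * (v - t) for t, v in zip(texture_rgb, vertex_rgb))
```

A mask of 0 leaves the detailed texture untouched, while a mask of 1 substitutes the low-frequency vertex color, which matches the comparison shown in FIGS. 26A and 26B.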

[0082] FIG. 27 shows a final rendering of the digitized object, according to one embodiment. In one embodiment, once the final mesh and texture are complete, an Unreal Engine 3 mesh is created and rendered with environment and rim lighting, self-occlusion, and animation. The GOE app also allows the object to be turned into an avatar by mapping the Nui skeleton onto the skeleton.

[0083] The above steps balance usability, CPU/GPU/memory constraints, output quality, artistic considerations, sensor accuracy, and development time. Trade-offs were made, and they may not be specific to every scenario. As such, various steps may be added, or some of the above steps removed, to improve the speed or quality of the final digitization.

[0084] FIG. 28 shows a workflow 2800 for digitizing an object, according to one embodiment. As shown at 2802, color data and depth data for an image are received. The object of interest is found by analyzing the depth data to identify the point in the image closest to the camera, on the assumption that the user is most likely presenting the object to the camera for capture. Alternative ways of determining the object of interest may be used instead or in addition. Various image recognition or algorithmic matching techniques may be used to locate the object in the image, because embodiments are not limited to any particular type of means for locating objects in images. Embodiments may also use the color data of the image, in addition to or instead of the depth data, to locate the object. For example, a Coca-Cola can may include a trademark red color, which makes color data particularly relevant when trying to locate the can in a picture. Thus, the object of interest can be found in many different ways.
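A minimal sketch of the closest-point search follows. It assumes, for illustration only, that the depth image is a 2D list of distances from the camera and that a value of 0 marks pixels with no depth reading; the function name and the `invalid` parameter are hypothetical.

```python
def find_closest_point(depth, invalid=0):
    """Return (row, col, distance) of the valid pixel nearest the camera.

    Assumptions: depth is a 2D list of distances, and pixels equal to
    `invalid` carry no reading. Returns None if no pixel is valid.
    """
    best = None
    for r, row in enumerate(depth):
        for c, d in enumerate(row):
            if d == invalid:
                continue  # skip pixels without a depth reading
            if best is None or d < best[2]:
                best = (r, c, d)
    return best
```

For example, `find_closest_point([[0, 900], [600, 1200]])` selects the pixel at row 1, column 0, because 600 is the smallest valid distance.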

[0085] Once the object of interest is located, the edges of the object are identified, as shown at 2806. This determination may be made by analyzing differences in color, depth, or contrast in the image around the object. Once the edges are located, a point cloud configuration of the object may be built from the color data and depth data of the image, as shown at 2808. To digitize the object in 3D, multiple point cloud configurations for different sides of the object may be built from the color data and depth data of multiple images (e.g., back, front, top, bottom, etc.). Once created, the multiple point cloud configurations can be aggregated to create the final digitization of the object, as shown at 2810.
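The edge-finding and point cloud steps can be sketched as follows, under stated assumptions. Edges are taken to be places where neighboring depth values differ by more than a hypothetical threshold `max_jump`, and the object is grown outward from the closest point until those edges are reached. The point cloud is built with a simple pinhole camera model whose intrinsics (`fx`, `fy`, `cx`, `cy`) are assumed known; none of these names or the camera model come from the text.

```python
from collections import deque


def segment_from_seed(depth, seed, max_jump=30, invalid=0):
    """Grow a region from `seed`, stopping at large depth differences.

    Returns a 2D boolean mask of pixels judged to belong to the object.
    """
    rows, cols = len(depth), len(depth[0])
    mask = [[False] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]:
                d = depth[nr][nc]
                # A jump larger than max_jump is treated as an object edge.
                if d != invalid and abs(d - depth[r][c]) <= max_jump:
                    mask[nr][nc] = True
                    queue.append((nr, nc))
    return mask


def point_cloud(depth, color, mask, fx, fy, cx, cy):
    """Back-project masked pixels into 3D (pinhole model, assumed intrinsics).

    Returns a list of (x, y, z, rgb) tuples in row-major pixel order.
    """
    points = []
    for r, row in enumerate(depth):
        for c, z in enumerate(row):
            if mask[r][c]:
                x = (c - cx) * z / fx
                y = (r - cy) * z / fy
                points.append((x, y, z, color[r][c]))
    return points
```

Growing the region from the closest point, rather than thresholding the whole image, keeps background surfaces at a similar distance out of the result unless they are connected to the object.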

[0086] FIG. 29 shows a workflow 2900 for digitizing an object, according to one embodiment. Once images of the object are received, as shown at 2902, a plurality of closest points in the images are identified, as shown at 2904. Sides of the object (e.g., left, right, north, south, top, bottom, etc.) are identified, as shown at 2906. Point cloud configurations of the images are created, as shown at 2908, and merged into a single representation, as shown at 2910. The resulting representation can then be saved, as shown at 2912, and presented on a display device.
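The merge at 2910 can be illustrated with a deliberately simplified sketch. It assumes, hypothetically, that there are only two point clouds (front and back), that the back view was captured after the object was turned 180 degrees about a vertical axis, and that the position of that axis (`pivot_x`, `pivot_z`) is already known. Any finer alignment of the two clouds is outside this sketch, and all names are illustrative.

```python
def rotate_half_turn_about_y(points, pivot_x, pivot_z):
    """Rotate (x, y, z) points 180 degrees about a vertical axis.

    The axis passes through (pivot_x, pivot_z); y is unchanged.
    """
    return [(2 * pivot_x - x, y, 2 * pivot_z - z) for x, y, z in points]


def merge_point_clouds(front, back, pivot_x, pivot_z):
    """Bring the back cloud into the front cloud's frame and concatenate."""
    return list(front) + rotate_half_turn_about_y(back, pivot_x, pivot_z)
```

With a pivot 650 units from the camera, a back-surface point observed at a distance of 600 lands at 700 in the front frame, that is, behind the front surface, as expected.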

[0087] Many different arrangements of the various components shown, as well as of components not shown, are possible without departing from the scope of the claims below. Embodiments of our technology have been described with the intent of being illustrative rather than restrictive. Alternative embodiments will become apparent to readers of this disclosure after, and because of, reading it. Alternative means of implementing the foregoing can be completed without departing from the scope of the claims below. Certain features and subcombinations are useful, may be employed without reference to other features and subcombinations, and are contemplated as being within the scope of the claims.

Claims

One or more computer storage media embodying computer-executable instructions that, when executed, digitize an object captured by a camera, the method comprising:
Receiving color data and depth data associated with an image;
Identifying the closest point in the image to the camera;
Identifying an edge of the object in the image from the closest point;
Generating a point cloud configuration of the object using the depth data; and
Generating one or more digitizations of the object using the point cloud configuration.

The one or more computer storage media of claim 1, wherein identifying the closest point in the image to the camera further comprises:
Calculating distances between an aperture of the camera and a plurality of points in the image; and
Selecting a shortest distance among the distances.

The one or more computer storage media of claim 2, further comprising:
Identifying a feature connected to the object at the shortest distance; and
Storing an indication of the feature, which represents the portion of the object closest to the camera.

The one or more computer storage media of claim 1, further comprising:
Receiving color data and depth data for a second image of the object;
Determining a second closest point in the second image;
Identifying the object at the second closest point; and
Generating a second point cloud configuration of the object as oriented in the second image.

The one or more computer storage media of claim 4, further comprising:
Identifying a seam between the point cloud configuration and the second point cloud configuration;
Determining a fill color to fill a portion of the seam;
Filling the portion of the seam with the fill color to create a 3D rendition having a seamless edge between the point cloud configuration and the second point cloud configuration; and
Storing the 3D rendition.

The one or more computer storage media of claim 1, wherein identifying the object further comprises:
Performing an image analysis within a spatial region surrounding the closest point in the image;
Determining a color difference between two regions in the spatial region based on the image analysis;
Designating one of the regions as associated with the object; and
Removing the other region, which has a different color from the designated region.

The one or more computer storage media of claim 1, further comprising removing one or more points of the image that lie around a region associated with the object and have fewer than a threshold number of adjacent points, resulting in a mask of depth pixels for the object.

The one or more computer storage media of claim 7, further comprising storing the mask of depth pixels in a ring buffer of depth frames in a manner that overwrites at least one depth frame and averages multiple frames together to produce a final depth image.

A method for displaying a digital representation of an object, comprising:
Receiving a plurality of images that capture the object from different views;
Using depth data of each image to identify, in two separate images, a plurality of closest points to one or more cameras;
Identifying at least two different sides of the object from the plurality of closest points;
Creating constructions of the at least two different sides of the object;
Merging the constructions together into a representation of the object; and
Storing the representation of the object.

The method of claim 9, further comprising:
Determining a plurality of points of one of the constructions to connect to another of the constructions; and
Aligning the constructions at the plurality of points.

The method of claim 10, wherein at least one of the images is received from a server.

The method of claim 9, further comprising:
For each image, identifying a boundary of the object; and
Filling at least one color gap between two boundaries to alleviate a portion of a seam between the constructions when merging the constructions.

The method of claim 9, wherein each of the images includes color data and depth data.

A computing device comprising:
A camera capable of capturing or receiving images each containing color data and depth data;
One or more computer storage media storing the color data and the depth data of at least one image;
One or more processors configured to:
(1) identify an object in the at least one image;
(2) create a digital representation of the object in the at least one image; and
(3) create a 3D representation of the object by combining the digital representation with a second digital representation of the object created from a second image; and
A display device configured to display the 3D representation.

The computing device of claim 14, wherein the one or more processors move the object on the display device according to a set of rules for moving the object's limbs.