JP7138361B2

JP7138361B2 - User Pose Estimation Method and Apparatus Using 3D Virtual Space Model

Info

Publication number: JP7138361B2
Application number: JP2020548924A
Authority: JP
Inventors: ナジュト; ガヒョンイム; チャンフンヒョン; ドンウキム; ブンチョルチャン; ヒョンエーチェ
Original assignee: Korea University Research and Business Foundation
Current assignee: Korea University Research and Business Foundation
Priority date: 2019-06-14
Filing date: 2020-04-07
Publication date: 2022-09-16
Anticipated expiration: 2040-04-07
Also published as: JP2021531524A; KR102387797B1; KR20200143228A

Description

The technical field relates to the generation and use of spatial maps and, more particularly, to a method and apparatus for estimating a user pose in real space using a three-dimensional (3D) virtual space model.

Methods of estimating a user pose using a spatial map include a method that uses geometry values, a method that uses image values, and a method that uses a mixture of geometry values and image values.
In this case, a spatial map representing the real space can be constructed by acquiring point cloud information with LiDAR or a depth measuring device with a similar operating principle, by acquiring image information with a camera or an image measuring device with a similar operating principle, by acquiring color point cloud information with Kinect or a depth-image measuring device with a similar operating principle, or by a combination of these.

Image information, depth information, and depth-image association information for the real space are referred to as "spatial information."
The user pose is estimated by comparing user information acquired by the user device in the real space with the spatial map.

Here, "user information" is information that includes an image acquired by the user device in the real space. A "pose" is a concept that includes both position and orientation. Accordingly, a "user pose" is information that includes the position at which the image information was acquired in the real space and the direction in which it was acquired.
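As an illustrative sketch only, a pose in this sense can be held as a position plus an orientation. The class name `Pose`, the roll/pitch/yaw parameterisation, and the helper methods are assumptions made for illustration; the description does not prescribe a particular representation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical container: a pose is a position plus an orientation."""
    # Position in the real-space frame (units assumed to be metres).
    x: float
    y: float
    z: float
    # Orientation as roll, pitch, yaw in radians (one possible parameterisation).
    roll: float
    pitch: float
    yaw: float

    def position_distance(self, other: "Pose") -> float:
        """Euclidean distance between the two positions."""
        return math.dist((self.x, self.y, self.z), (other.x, other.y, other.z))

    def yaw_difference(self, other: "Pose") -> float:
        """Absolute yaw difference, wrapped into [0, pi]."""
        d = (self.yaw - other.yaw + math.pi) % (2 * math.pi) - math.pi
        return abs(d)


a = Pose(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
b = Pose(3.0, 4.0, 0.0, 0.0, 0.0, math.pi / 2)
```

Two poses can share a position and still differ in orientation, which is why both components are needed to describe where and in which direction an image was acquired.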
However, user pose estimation using a spatial map in the prior art has the following problems.

First, a spatial map can be sensitive to the poses from which the spatial information was acquired. When it is, the precision of user pose estimation decreases. For example, if a spatial map were constructed from spatial information acquired at every theoretically possible pose, a highly precise user pose could be estimated.

However, acquiring spatial information for every pose in the real space is practically impossible. If spatial information is acquired from a large number of evenly distributed poses to construct the spatial map, user pose estimation becomes less sensitive to the distribution of the acquisition poses. In that case, however, system load problems may arise, such as the acquisition time, data volume, and processing time of the spatial information.

On the other hand, when spatial information is acquired at only a small number of poses in view of the system load, the spatial map cannot sufficiently represent the real space. Furthermore, if the path along which the spatial information is acquired changes, the reliability of the spatial map decreases and the real space is no longer robustly represented. A spatial map that does not robustly represent the real space leads to reduced precision in user pose estimation.
Second, a discontinuous spatial map can reduce the precision of user pose estimation. FIG. 1 shows an example of a discontinuous spatial map composed of point cloud information.

As shown in FIG. 1, when a spatial map is constructed from point cloud information, the point cloud may not be acquired densely, depending on the acquisition range and path of the spatial information. When it is not, the resulting spatial map lacks continuity, which reduces the precision of user pose estimation.
Third, the precision of user pose estimation may decrease because of the difference between the time at which the spatial information for the spatial map was acquired and the time at which the user information is acquired.

FIGS. 2 and 3 are illustrative diagrams showing changes in a space over time.
FIG. 2 shows an example in which light or illumination changes over time.
More specifically, (a), (b), and (c) of FIG. 2 show an example in which the illumination and the amount of light entering from outside change over time in the same space.
(a) and (b) of FIG. 3 show an example in which an object changes over time in the same space.

In (a) of FIG. 3, nothing is placed on the table 210, whereas in (b) of FIG. 3 an object is placed on the table 220.
For example, for the space shown in FIG. 2, the spatial information for constructing the spatial map may be acquired in (a) while the user information is acquired in (c). Likewise, for the space shown in FIG. 3, the spatial information may be acquired in (a) while the user information is acquired in (b).

Thus, even in the same space, the image information may not match because of the difference between the time at which the spatial information was acquired and the time at which the user information is acquired, and the precision of user pose estimation decreases accordingly.
In the real space, light or illumination, dynamic elements such as people, and objects or interiors change over time. Using a spatial map that has not been updated with these changes lowers its similarity to the user information, which reduces the precision of user pose estimation.
Therefore, a method is needed that solves these conventional problems in estimating a user pose based on a spatial map.

To solve the above problems, the present invention provides a method and apparatus for estimating a user pose using a 3D virtual space model constructed from spatial information acquired in the real space and user information acquired by the user.

A method of estimating a user pose in a 3D space according to an embodiment includes: acquiring spatial information, including depth information and image information for the 3D space, using a depth measuring device and an image acquisition device; constructing depth-image association information based on the spatial information, and building a 3D virtual space model corresponding to the 3D space based on the depth-image association information; receiving user information including an image acquired by a user device in the 3D space; generating correspondence information corresponding to the user information within the 3D virtual space model; calculating a similarity between the correspondence information and the user information; and estimating a user pose based on the similarity.
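The last three steps (generate correspondence information, calculate a similarity, estimate the pose) can be sketched as a search over candidate poses. This is a toy illustration under stated assumptions, not the claimed implementation: `toy_render` stands in for generating correspondence information from the 3D virtual space model, feature vectors stand in for images, and cosine similarity is one arbitrary choice of similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def estimate_pose(user_info, candidate_poses, render):
    """Return the candidate pose whose rendered view best matches user_info."""
    best_pose, best_score = None, -math.inf
    for pose in candidate_poses:
        correspondence = render(pose)  # correspondence information for this pose
        score = cosine_similarity(user_info, correspondence)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score


def toy_render(pose):
    """Hypothetical stand-in for the model: maps (x, yaw) to a feature vector."""
    x, yaw = pose
    return [math.cos(yaw) + x, math.sin(yaw), 1.0]


candidates = [(x, yaw) for x in (0.0, 1.0, 2.0) for yaw in (0.0, math.pi / 2, math.pi)]
observed = toy_render((1.0, math.pi / 2))  # pretend this came from the user device
pose, score = estimate_pose(observed, candidates, toy_render)
```

In this toy setting the search recovers the pose that produced `observed`; a real system would compare images or depth-image data rather than three-element vectors.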

Building the 3D virtual space model may include distinguishing, in the image information for the 3D space, a background region associated with the structure of the 3D space from a non-background region corresponding to objects placed in the 3D space, and building the 3D virtual space model using the background region.

Generating the correspondence information may include: distinguishing, in the image included in the user information, a background region associated with the structure of the 3D space from a non-background region corresponding to objects placed in the 3D space; processing the user information using the background region of the image included in the user information; and generating, within the 3D virtual space model, correspondence information corresponding to the processed user information.

Calculating the similarity may include regenerating the correspondence information in a direction that increases the similarity, and recalculating the similarity based on the regenerated correspondence information.
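One way to picture "regenerating in a direction that increases the similarity" is a simple coordinate-wise local search. The sketch below is an assumption for illustration: `refine_pose`, `toy_similarity`, and the step-halving schedule are hypothetical, and the description does not specify which optimisation method is used.

```python
def refine_pose(initial_pose, similarity_of, step=1.0, min_step=1e-3, max_iters=1000):
    """Coordinate-wise hill climbing: keep any move that raises the similarity."""
    pose = list(initial_pose)
    best = similarity_of(pose)
    iters = 0
    while step >= min_step and iters < max_iters:
        improved = False
        for i in range(len(pose)):
            for delta in (step, -step):
                trial = list(pose)
                trial[i] += delta
                score = similarity_of(trial)  # recalculated similarity
                if score > best:
                    pose, best, improved = trial, score, True
        if not improved:
            step /= 2  # no better neighbour at this scale: search more finely
        iters += 1
    return tuple(pose), best


def toy_similarity(pose):
    """Hypothetical similarity that peaks at the true pose (2, -1)."""
    x, y = pose
    return 1.0 / (1.0 + (x - 2.0) ** 2 + (y + 1.0) ** 2)


refined, score = refine_pose((0.0, 0.0), toy_similarity)
```

Each accepted move corresponds to regenerating the correspondence information at a slightly different pose and recalculating the similarity.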

Calculating the similarity may include: extracting comparison target regions for comparing the user information with the correspondence information; determining a common region between the comparison target region extracted from the user information and the comparison target region extracted from the correspondence information; and regenerating the user information and the correspondence information, respectively, based on the common region.
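The common-region idea can be sketched with boolean masks: only pixels that are comparison targets in both inputs contribute to the similarity. The function names, the tiny grayscale images, and the mean-absolute-difference measure are all assumptions chosen for illustration.

```python
def common_region(mask_a, mask_b):
    """Pixel-wise AND of two boolean masks of equal size."""
    return [[a and b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]


def masked_similarity(img_a, img_b, mask):
    """1 minus the mean absolute difference (8-bit scale) over masked pixels."""
    diffs = [
        abs(pa - pb)
        for ra, rb, rm in zip(img_a, img_b, mask)
        for pa, pb, m in zip(ra, rb, rm)
        if m
    ]
    if not diffs:
        return 0.0  # nothing in common to compare
    return 1.0 - sum(diffs) / (255.0 * len(diffs))


# Hypothetical 2x3 grayscale images and comparison-target masks.
user_img = [[10, 200, 30], [40, 50, 60]]
model_img = [[10, 90, 30], [40, 50, 255]]
user_mask = [[True, False, True], [True, True, True]]
model_mask = [[True, True, True], [True, True, False]]

common = common_region(user_mask, model_mask)
sim_common = masked_similarity(user_img, model_img, common)
all_pixels = [[True] * 3 for _ in range(2)]
sim_all = masked_similarity(user_img, model_img, all_pixels)
```

Restricting the comparison to the common region keeps pixels that are valid in only one of the two inputs from lowering the similarity.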

Calculating the similarity may include acquiring additional user information about the surroundings of the user device, and calculating the similarity based on the user information and the additional user information.

Estimating the user pose may include, when user supplementary information, which is additional information used for estimating the user pose, is acquired by the user device, estimating the user pose using the user supplementary information together with the user information or the additional user information.

Acquiring the additional user information may include transmitting, to the user device, guidance information for acquiring the additional user information based on the 3D virtual space model.

The guidance information may include a user information acquisition pose for a preset feature point in the 3D virtual space model, and acquiring the additional user information may be performed repeatedly in a direction that increases the similarity.
A method of estimating a user pose, including position and orientation information of a user in a 3D space, according to another embodiment includes: receiving user information including an image acquired in the 3D space; identifying a 3D virtual space model built based on spatial information including depth information and image information for the 3D space; generating correspondence information corresponding to the user information within the 3D virtual space model; calculating a similarity between the correspondence information and the user information; and estimating a user pose based on the similarity.

A user pose estimation apparatus for a 3D space according to an embodiment includes: a spatial information acquisition unit that acquires spatial information including depth information and image information for the 3D space; a virtual space model generation unit that constructs depth-image association information based on the spatial information and generates a 3D virtual space model corresponding to the 3D space based on the depth-image association information; a user information receiving unit that receives user information including an image acquired by a user device in the 3D space; and a control unit including at least one processor configured to generate correspondence information corresponding to the user information within the 3D virtual space model, calculate a similarity between the correspondence information and the user information, and estimate the user pose based on the similarity.

The virtual space model generation unit may distinguish, in the image information for the 3D space, a background region associated with the structure of the 3D space from a non-background region corresponding to objects placed in the 3D space, and may build the 3D virtual space model using the background region.

The control unit may distinguish, in the image included in the user information, a background region associated with the structure of the 3D space from a non-background region corresponding to objects placed in the 3D space, process the user information using the background region of the image included in the user information, and generate, within the 3D virtual space model, correspondence information corresponding to the processed user information.

An apparatus for estimating a user pose, including position and orientation information of a user in a 3D space, according to another embodiment includes: a virtual space model providing unit that provides a 3D virtual space model built based on spatial information including depth information and image information for the 3D space; a user information receiving unit that receives user information including an image acquired by a user device in the 3D space; and a control unit including at least one processor configured to generate correspondence information corresponding to the user information within the 3D virtual space model, calculate a similarity between the correspondence information and the user information, and estimate the user pose based on the similarity.

A user pose estimation apparatus for a 3D space according to yet another embodiment includes: a user information generation unit that generates user information including an image of the 3D space; a communication unit that transmits the user information to a user pose estimation server and receives, from the server, information on the user pose estimated in a 3D virtual space model; and a control unit including at least one processor configured to control the operations of the user information generation unit and the communication unit and to deliver the information on the user pose to a currently running application or drive system.

By using a 3D virtual space model as the spatial map, embodiments of the present invention can build a 3D virtual space model that is robust to the path along which spatial information is acquired, and can reduce the sensitivity of user pose estimation precision to the spatial information acquisition poses.

In addition, the 3D virtual space model according to embodiments of the present invention can be constructed to resemble the real space while reducing the acquisition time of spatial information, the volume of spatial information, the data processing time, and the like.
A user pose estimation method that is robust to changes in the real space over time can also be provided.
Embodiments of the present invention can also be used to estimate a user pose in mixed reality.
Furthermore, precise user pose estimation can reduce the sense of mismatch between the real space and the virtual space and increase the user's immersion in mixed reality. Embodiments of the present invention can therefore contribute to the commercialization and development of mixed reality technologies.

FIG. 1 is a diagram showing an example of a discontinuous spatial map constructed using point cloud information.
FIG. 2 is an illustrative diagram showing changes in a space over time.
FIG. 3 is another illustrative diagram showing changes in a space over time.
FIG. 4 is a diagram showing an example of a 3D virtual space model according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining an example of generating a 3D virtual space model according to an embodiment.
FIG. 6 is a diagram for explaining a user pose estimation system using a 3D virtual space model according to an embodiment.
FIG. 7 is a diagram for explaining the configuration of a user pose estimation apparatus for a 3D space according to an embodiment.
FIG. 8 is a diagram for explaining the configuration of a user device according to an embodiment.
FIG. 9 is an illustrative diagram for explaining the concept of a pose according to an embodiment.
FIG. 10 is a flowchart for explaining a user pose estimation method for a 3D space according to an embodiment.
FIG. 11 is a flowchart for explaining a user pose estimation method for a 3D space according to another embodiment.
FIG. 12 is a diagram for explaining an example of a method of additionally acquiring a user pose according to an embodiment.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings and their contents, but the present invention is not restricted or limited by the embodiments.

The terminology used herein is for describing the embodiments only and is not intended to limit the present invention. Singular forms used herein include plural forms unless the text clearly indicates otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more other components, steps, operations, and/or elements besides those stated.

As used herein, terms such as "embodiment," "example," "aspect," and "illustration" should not be construed to mean that any aspect or design described is preferred or advantageous over other aspects or designs.
The term "or" means an inclusive "or" rather than an exclusive "or." That is, unless stated otherwise or clear from context, the expression "x uses a or b" means any of the natural inclusive permutations.

Terms such as "first" and "second" used in this specification and the claims are used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

Unless otherwise defined, all terms used herein (including technical and scientific terms) are used with the meaning commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should not be interpreted in an idealized or excessive manner unless expressly so defined.

In describing the present invention, detailed descriptions of related known functions or configurations are omitted where they could unnecessarily obscure the gist of the present invention. The terminology used herein is chosen to describe the embodiments of the present invention appropriately, and it may vary depending on the intent of users or operators or on the conventions of the field to which the present invention belongs. The terms used herein should therefore be defined based on the content of this specification as a whole.

FIG. 4 is a diagram showing an example of a 3D virtual space model according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining an example of generating a 3D virtual space model according to an embodiment.
Referring to FIG. 4, a 3D virtual space model generally refers to a model of the real space constructed using depth-image association information, in formats such as obj and x3d. For example, the 3D virtual space model may include a model in obj, x3d, or a similar format generated according to Korean Patent No. 10-1835434 (title: Method and apparatus for generating a projection image, and method for mapping between image pixels and depth values), or a "TeeVR model."

Here, as shown in FIG. 5, the 3D virtual space model according to an embodiment may be built from the background region only, after the background region and the non-background region are distinguished.
In FIG. 5, (a) shows image information included in the spatial information, (b) shows the image with the non-background region removed, and (c) shows an example in which image information is generated by expanding the background region.
For example, the background region may be the structure of the building itself that forms the 3D space, or a structure attached to the building, such as a door or window. Accordingly, in the image information, the background region may be defined as a region associated with the structure of the 3D space.
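As a rough sketch of separating background from non-background, suppose a per-pixel label map is already available from some segmentation step. That label map, the label names, and both helper functions are assumptions for illustration; the description does not state how the regions are distinguished.

```python
# Hypothetical set of labels treated as building structure (background).
BACKGROUND_LABELS = {"wall", "floor", "ceiling", "door", "window"}


def background_mask(label_map):
    """True where a pixel's label belongs to the background (structural) set."""
    return [[label in BACKGROUND_LABELS for label in row] for row in label_map]


def keep_background(image, mask, fill=None):
    """Copy of the image with non-background pixels replaced by `fill`."""
    return [
        [px if m else fill for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 2x3 example: desk and chair pixels are non-background.
labels = [["wall", "desk", "wall"], ["floor", "chair", "door"]]
image = [[1, 2, 3], [4, 5, 6]]

mask = background_mask(labels)
background_only = keep_background(image, mask)
```

The blanked pixels play the role of the removed regions; filling them in from surrounding background would be a separate step and is not shown here.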

In (a) of FIG. 5, regions corresponding to the various objects located in the indoor space (desks, bookshelves, chairs, and so on) are non-background regions. (b) of FIG. 5 shows the background region that remains after those objects are removed; the removed regions are displayed in white.
The 3D virtual space model is a concept that covers both indoor and outdoor spaces; it may be an independent indoor space, an independent outdoor space, or a space in which indoor and outdoor areas are connected. Models of people, things, and the like in formats such as obj and x3d may be added to the 3D virtual space model, and the term "3D virtual space model" also covers a model to which such models have been added. The spatial map may also use a two-dimensional virtual space model, reducing the dimension, instead of a three-dimensional one.

The 3D space model may use a previously built model in a format such as obj or x3d, may be built from newly acquired spatial information, or may be an updated version of a previously built model. The 3D space model may be regarded as resembling the real space.

FIG. 6 is a diagram for explaining a user pose estimation system using a 3D virtual space model according to an embodiment.
Referring to FIG. 6, the user pose estimation system using a 3D virtual space model includes a user device 610 and a user pose estimation apparatus 620. The user pose estimation apparatus 620 according to an embodiment may be provided in a server (not shown) or in the user device 610.

The user device 610 may acquire user information 611 in the real space 601 and transmit the user information 611 to the user pose estimation apparatus 620.
The user pose estimation apparatus 620 may estimate the user pose using the user information 611 and a 3D virtual space model 630 stored in a storage system 602 inside or outside the apparatus.

The user pose estimation apparatus 620 can estimate an accurate user pose by comparing the user information 611 with correspondence information 621 that has a high probability of corresponding to the user pose in the 3D virtual space model 630.

FIG. 7 is a diagram for explaining the configuration of a user pose estimation apparatus for a 3D space according to an embodiment.
Referring to FIG. 7, the user pose estimation apparatus 620 for a 3D space according to an embodiment includes a virtual space model providing unit 730, a control unit 740, and a user information receiving unit 750. The user pose estimation apparatus 620 may further include a spatial information acquisition unit 710 and a virtual space model generation unit 720, and may further include a user information request unit 760.
The spatial information acquisition unit 710 acquires spatial information including depth information and image information for the 3D space. For example, the spatial information may be acquired using a depth measuring device and an image measuring device.

If spatial information is acquired along a path on which the field of view (FoV) of the measuring device, composed of a depth measuring device, an image measuring device, or the like, can cover the real space, the 3D virtual space model is constructed to resemble the real space, and the acquisition time of spatial information, the volume of spatial information, the data processing time, and the like can be reduced, which is efficient.

The image information is a two-dimensional image of the 3D space and may take a form that can be expressed with two-degree-of-freedom basis vectors. It may be a form in which three dimensions are expressed in two dimensions, as with a camera, or a form in which three-dimensional thermal information is expressed in two dimensions by fitting an infrared filter to a camera.

The depth information takes the form of points that can be expressed with basis vectors having three degrees of freedom. It may be acquired using a depth measurement device, or it may be estimated from two or more images captured at different locations. Examples of the former include depth information acquired using a LiDAR, SONAR, infrared (IR), or time-of-flight (TOF) range finder. Examples of the latter include depth information acquired using a stereo camera, a multi-camera, an omnidirectional stereo camera, or the like. Meanwhile, devices such as Kinect, JUMP, PrimeSense, and Project Beyond can acquire depth information and image information at the same time.

For example, in an embodiment of the present invention, depth information newly estimated by interpolation may be used in addition to the depth information acquired using a depth measurement device. More specifically, three or more pieces of depth information are selected from the acquired pieces of depth information to form a polygonal (including triangular) mesh, and new depth information is then estimated by interpolation inside the polygonal mesh and added.
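
The interpolation inside a triangular mesh element described above can be sketched as follows. This is only an illustrative sketch: the source does not specify the interpolation scheme, so barycentric (linear) interpolation is assumed here, and the function name interpolate_depth and its arguments are hypothetical.

```python
def interpolate_depth(tri_xy, tri_depth, query_xy):
    """Linearly interpolate depth at query_xy inside one triangle.

    tri_xy:    three (x, y) vertex positions of the triangle
    tri_depth: the measured depth at each vertex
    Returns None when the query point lies outside the triangle.
    """
    (x1, y1), (x2, y2), (x3, y3) = tri_xy
    qx, qy = query_xy
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights of the query point (assumed scheme, not from the source).
    w1 = ((y2 - y3) * (qx - x3) + (x3 - x2) * (qy - y3)) / det
    w2 = ((y3 - y1) * (qx - x3) + (x1 - x3) * (qy - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # outside the triangle: no new depth is estimated
    d1, d2, d3 = tri_depth
    return w1 * d1 + w2 * d2 + w3 * d3
```

Each interpolated value could then be added to the set of depth points alongside the measured ones.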

Meanwhile, the depth information and the image information according to an embodiment of the present invention may be acquired simultaneously using an integrated sensor system. When a plurality of measurement devices is used, a calibration process for determining the coordinate relationship between the sensors may be required.

An inertial measurement unit (IMU) or the like may additionally be used in the process of acquiring the spatial information, and distance information (odometry) may be used when measurement is performed with sensors mounted on a wheeled mobile robot. When the real space is wider than the field of view of the measurement device, the spatial information may be acquired by rotating the sensor, moving the sensor, or a combination thereof. In this case, the three-dimensional poses at which the individual pieces of spatial information are acquired may differ from one another, and techniques such as simultaneous localization and mapping (SLAM), visual-inertial odometry (VIO), and visual odometry (VO) may be used to estimate the pose at which each piece of spatial information was acquired.

Meanwhile, the composition of the spatial information may vary depending on the type of measurement device. As an example, when the measurement device consists only of cameras, the pre-measured information consists of camera image information. Using this image information, the relative distance between pixels can be estimated in the case of a single camera, and the absolute distance between pixels can be estimated in the case of a plurality of cameras. In particular, without extracting feature points, the depth of a pixel can be estimated from accumulated image information in the case of a single camera, and from the images of the plurality of cameras or their accumulated image information in the case of a plurality of cameras.

Furthermore, when additional information such as depth information and inertial information is used together, the spatial information can be processed in a manner suited to the characteristics of each measurement device. As an example, when inertial information can be acquired by an inertial measurement unit, it may be used to improve the performance of SLAM, or it may be used as prediction information for the image acquisition pose during image information processing so that correction of the image acquisition pose can be performed more easily. In addition, the actual travel distance can be estimated from the acceleration values or angular velocity values of the inertial information, and this can be used to correct the scale of the depth information extracted from a single camera or a plurality of cameras.
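
The scale correction mentioned above can be illustrated with a minimal sketch. It assumes that a metric travel distance has already been estimated from the inertial information and that the camera positions recovered from the images are known only up to scale. The names scale_factor and rescale_depths are hypothetical, and the simple ratio used here is one possible approach rather than the method of the embodiment.

```python
def scale_factor(visual_positions, metric_distance):
    """Ratio of the IMU-derived travel distance to the unscaled visual path length.

    visual_positions: list of (x, y, z) camera positions in arbitrary units
    metric_distance:  travel distance in metres estimated from inertial data
    """
    path = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(visual_positions, visual_positions[1:]):
        path += ((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5
    if path == 0.0:
        raise ValueError("the camera did not move, so the scale is unobservable")
    return metric_distance / path


def rescale_depths(depths, scale):
    """Apply the scale factor to depth values expressed in the visual units."""
    return [d * scale for d in depths]
```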

The virtual space model generation unit 720 constructs depth-image linkage information based on the spatial information, and generates a 3D virtual space model corresponding to the three-dimensional space based on the depth-image linkage information.

For example, when generating a 3D virtual space model of an indoor space, the spatial information acquisition unit 710 may acquire indoor space images, which are spatial information about the indoor space. In this case, the indoor space images may be images captured at various positions inside the indoor space.

In this case, the virtual space model generation unit 720 may distinguish a background region, which is a region corresponding to the structure of the indoor space, from a non-background region, which is a region corresponding to an object located in the indoor space or a moving person.
The virtual space model generation unit 720 may distinguish the background region from the non-background region based on the pixel values of the images constituting the indoor space image.

The background region may correspond to a portion whose data is incomplete because it is partially hidden by other elements, but which is inferred to be similar to the unhidden portion and can therefore be reconstructed from the unhidden portion by a hole filling or inpainting technique. Furthermore, the background region may be a portion that hides other objects, such as a large signboard or an information desk inside a building, but for which the agreement between the image and the terrain at the edges of the object is consistent across all data, or can be made consistent through a separate registration process.

The virtual space model generation unit 720 may generate at least one extended indoor space image by extending the background region into the non-background region in the indoor space image. For example, in (b) of FIG. 5, the portion shown in white, from which the non-background region has been removed, may be filled in by extending the background region.

When an edge included in the background region is cut off at the boundary with the non-background region, the virtual space model generation unit 720 may generate the extended image based on the inference that the extension of the edge continues across the boundary between the background region and the non-background region into the non-background region.

In this case, one or more indoor space images other than a specific indoor space image may be designated as background supplementary images, and the region corresponding to the non-background region of the specific indoor space image may be reduced using the information of the background supplementary images.

The virtual space model generation unit 720 may generate the depth-image linkage information based on the at least one extended indoor space image and terrain information that includes depth value information about the indoor space. The depth-image linkage information may be information in which depth values of the indoor space are matched to the corresponding pixels of the at least one extended indoor space image.

In addition to the at least one extended indoor space image and the terrain information, the virtual space model generation unit 720 may further use an image acquisition pose and a depth acquisition pose, which include information on the acquisition position and acquisition angle of the at least one extended indoor space image and of the terrain information, respectively, to generate the depth-image linkage information.
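
One way to picture the depth-image linkage information is as a mapping from image pixels to depth values. The sketch below assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, and cy, and assumes that the depth points have already been transformed into the camera frame using the image acquisition pose and the depth acquisition pose. The function name link_depth_to_pixels is illustrative and does not appear in the source.

```python
def link_depth_to_pixels(points_cam, fx, fy, cx, cy, width, height):
    """Match depth values to image pixels with a pinhole projection.

    points_cam: (x, y, z) points in the camera frame, with z pointing forward
    Returns a dict {(u, v): depth} that keeps the nearest depth per pixel.
    """
    linkage = {}
    for x, y, z in points_cam:
        if z <= 0:
            continue  # the point is behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # When several points fall on one pixel, keep the closest surface.
            if (u, v) not in linkage or z < linkage[(u, v)]:
                linkage[(u, v)] = z
    return linkage
```

Keeping only the nearest depth per pixel is a simple occlusion rule chosen for this sketch; other rules are possible.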

The virtual space model generation unit 720 generates a 3D virtual space model of the real three-dimensional space using the at least one extended indoor space image, the terrain information, and the depth-image linkage information.
When user pose estimation is required, the virtual space model providing unit 730 provides the 3D virtual space model constructed based on spatial information including depth information and image information regarding the three-dimensional space.

In this case, the user pose estimation may be performed after an application installed on the user device 610 or the user pose estimation apparatus 620 is executed. The virtual space model providing unit 730 may provide the 3D virtual space model to the application running on the user device 610 or the user pose estimation apparatus 620, or to the driving system of the corresponding device.

The control unit 740 may include at least one processor. In this case, the control unit 740 may be coupled to one or more computer-readable storage media on which instructions or a program are recorded.

Accordingly, the control unit 740 includes at least one processor configured to generate correspondence information corresponding to the user information within the 3D virtual space model, calculate the similarity between the correspondence information and the user information, and estimate the user pose based on the similarity.

User pose estimation according to an embodiment may be performed by training on the 3D virtual space model using deep learning or a neural network.

Depending on the form of the learning problem, the learning may be classified as reinforcement learning, supervised learning, or unsupervised learning. The training stage may require a large amount of training data. The training data may consist of data containing image information and data containing the poses at which that data was acquired, and noise may be added to these two kinds of data to transform them and thereby increase the amount of training data. A convolutional neural network (CNN), or all or part of various other neural networks, may be used. One or more GPUs may be used, and parallel computation may be performed, to improve the performance or speed of deep learning. The result of deep learning may be expressed as a scalar, a vector, a probability, or the like, and this result may be used to estimate the user pose, that is, the pose at which the user information is expected to have been acquired. The image information of the user information may be used as the input, and additional user information may be used together with it. When additional user information is used together, a layer may be added to the neural network, a function may be changed, the number of parameters may be adjusted, or parameter values may be changed. A computer language such as Python, C, or MATLAB, or a combination thereof, may be used to construct the neural network.
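
The idea of enlarging the training data by adding noise to both the image data and the pose data can be sketched as follows. The source does not specify the noise model, so zero-mean Gaussian noise is assumed here. The function augment_with_noise, the flat pixel list, and the tuple pose representation are hypothetical simplifications.

```python
import random


def augment_with_noise(samples, copies, pixel_sigma, pose_sigma, seed=0):
    """Return the original samples plus noisy copies of each one.

    samples: list of (pixels, pose) pairs, where pixels is a list of
             intensities in [0, 255] and pose is a tuple of floats
    copies:  number of noisy variants to create per sample
    """
    rng = random.Random(seed)  # seeded so that the augmentation is repeatable
    augmented = []
    for pixels, pose in samples:
        for _ in range(copies):
            noisy_pixels = [
                min(255.0, max(0.0, p + rng.gauss(0.0, pixel_sigma)))
                for p in pixels
            ]
            noisy_pose = tuple(c + rng.gauss(0.0, pose_sigma) for c in pose)
            augmented.append((noisy_pixels, noisy_pose))
    return list(samples) + augmented
```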

When user information is acquired sequentially, the user pose may be estimated on the basis of the 3D virtual space model using techniques such as a particle filter, EKF, EIF, or UKF. When inertial information or distance information is acquired as additional user information, the estimated user pose may be corrected. The particle filter may converge to a specific pose as user information is sequentially acquired, and the point of convergence may be estimated as the user pose. Weights may be applied when estimating the user pose, and the user pose may be determined from among multiple converged poses.
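
A particle filter of the kind mentioned above can be sketched in a heavily simplified form. The sketch assumes a one-dimensional pose and a hypothetical function render(pose) that stands in for generating a scalar measurement from the 3D virtual space model at that pose. The Gaussian weighting, the jitter value, and all names are assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, observation, render, sigma, rng, jitter=0.02):
    """Weight particles by agreement with the observation, then resample."""
    weights = [
        math.exp(-((render(p) - observation) ** 2) / (2.0 * sigma ** 2))
        for p in particles
    ]
    if sum(weights) == 0.0:
        weights = [1.0] * len(particles)  # fall back to uniform weights
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    # A small jitter keeps the particle set from collapsing onto one value.
    return [p + rng.gauss(0.0, jitter) for p in resampled]


def localize(observations, render, n_particles=500, sigma=0.1, seed=0):
    """Process observations in order and return the mean of the converged particles."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 1.0) for _ in range(n_particles)]
    for obs in observations:
        particles = particle_filter_step(particles, obs, render, sigma, rng)
    return sum(particles) / len(particles)
```

For example, with render defined as lambda x: 2.0 * x and ten observations of 1.4, the particles gather near the pose 0.7.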

The user pose may be estimated by fusing a pose estimated by deep learning with a pose estimated by a particle filter or the like. For example, the user pose may be estimated by running the particle filter around the pose estimated by deep learning or, conversely, by applying deep learning around the pose to which the particle filter has converged. Weights may be applied when estimating the user pose, and the user pose may be determined from among multiple converged poses.

The similarity refers to the degree of resemblance between the correspondence information generated in the 3D virtual space model and the user information. The higher the similarity, the more closely the correspondence information may be regarded as resembling the user information, and the pose in the 3D virtual space model at which correspondence information with a high similarity was generated may be estimated as the user pose at which the user information was acquired. The similarity may be expressed as a scalar, a vector, a covariance matrix, or the like, and may be calculated using the Euclidean distance, the Manhattan distance, the Mahalanobis distance, structural similarity (SSIM), normalized information distance (NID), minimum mean square error (MMSE), entropy, or the like.
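
Two of the distance measures listed above, and the selection of the pose with the highest similarity, can be sketched as follows. Mapping a distance d to a similarity through 1 / (1 + d) is an assumption made for this sketch, since the source does not define the conversion. The dictionary of candidate poses is likewise hypothetical.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity(a, b, distance=euclidean_distance):
    """Map a distance to a scalar similarity in (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + distance(a, b))


def best_pose(user_info, candidates, distance=euclidean_distance):
    """Return the pose whose correspondence vector is most similar to user_info.

    candidates: dict mapping a pose to the correspondence vector generated there
    """
    return max(
        candidates,
        key=lambda pose: similarity(user_info, candidates[pose], distance),
    )
```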

The similarity calculation and the user pose estimation will be described in more detail with reference to FIG. 10.

In this case, the 3D virtual space model may be one constructed from the background region after distinguishing, in the image information of the three-dimensional space, a background region related to the structure of the three-dimensional space from a non-background region corresponding to objects placed in the three-dimensional space.

The user information receiving unit 750 receives user information including an image acquired by the user device in the three-dimensional space.
The user information is information that includes image information, and may be acquired using one or more image measurement devices, optionally together with a depth measurement device, an additional device, or the like. When the field of view of the measurement device is too narrow to acquire sufficient user information, the user information may be acquired by rotating the measurement device, moving it, or a combination thereof. The user information may be acquired by a single image sensor (camera) or a plurality of image sensors, and may be acquired with a pinhole model, with a fisheye lens, or in a panoramic format. A single piece of image information, a plurality of pieces of image information, or a sequence of image information may be acquired. Image information, depth information, depth-image linkage information, or the like may be constructed from the acquired user information.

For example, when a single image measurement device is used, image information can be acquired, and depth information can be calculated from sequentially acquired image information, from which depth-image linkage information can be constructed.
For example, when a plurality of image measurement devices is used, depth information can be calculated using the image information acquired by each image measurement device and the relationship between the image measurement devices, from which depth-image linkage information can be constructed. The relationship between the image measurement devices may be calibration information between the image measurement devices or transformation information (a homography matrix) between the pieces of image information acquired by the respective image measurement devices.
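
As a simple illustration of computing depth from two image measurement devices and their calibration, the sketch below uses the standard relation for a rectified stereo pair, depth = focal length × baseline / disparity. The source does not state this formula. It is supplied here as a well-known special case, and the function name and parameters are hypothetical.

```python
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres of a point seen by a rectified stereo pair.

    disparity_px: horizontal pixel offset of the point between the two images
    focal_px:     focal length in pixels (from calibration)
    baseline_m:   distance between the two cameras in metres (from calibration)
    Returns None when the disparity is not positive (no valid match).
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px
```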

For example, when at least one image measurement device and at least one depth measurement device are used, the depth-image linkage information may be constructed using calibration information between the two devices. Depth information may also be extracted from image information using deep learning. A neural network may be constructed, and a convolutional neural network may be used. A large amount of data may be required for training and testing. The neural network may be composed of linear functions, nonlinear functions, multiple layers, and the like, and the result of deep learning may be expressed as a probability, a scalar, a vector, or the like. Iterative training may be performed, and parameter tuning may be required. The depth-image linkage information may be constructed using the depth information extracted by deep learning. Processed image information may also be used. For example, the brightness or saturation of an image may be changed, or a panoramic image may be converted into a rectified image.

The user information requesting unit 760 may transmit guidance information to the user device 610 when additional user information needs to be acquired. The guidance information will be described in detail with reference to FIG. 12.

FIG. 8 is a diagram for explaining the configuration of a user device according to an embodiment.
Referring to FIG. 8, the user device 610 includes a user information generation unit 810, a communication unit 820, and a control unit 830. The user device 610 may further include a user interface unit 840 that includes a display, input means, and output means for interfacing with the user.

The user information generation unit 810 generates user information including an image of the three-dimensional space. Accordingly, the user information generation unit 810 may include at least one of an image measurement device and a depth measurement device.

The communication unit 820 transmits the user information to a user pose estimation server and receives, from the server, information about the user pose estimated in the 3D virtual space model.
In this case, the user pose estimation server may be the user pose estimation apparatus 620 shown in FIG. 7, or may be a separate server that provides a user pose estimation service.

The control unit 830 includes at least one processor configured to control the operations of the user information generation unit 810 and the communication unit 820 and to deliver the information about the user pose to a currently running application or driving system.
FIG. 9 is an exemplary diagram for explaining the concept of a pose according to an embodiment.
The spatial information used to construct the 3D virtual space model may be regarded as discrete information acquired at some poses in the real space. Here, a pose is a concept that includes both position and orientation. As an example, in two dimensions, a pose may be expressed by x and y, the position of the measurement device, and by the angle a of the measurement device.

The example shown in FIG. 9 is a square plane 1 m wide and 1 m long, in which the measurement device moves at 10 cm intervals over the range of 0 to 1 m along the x-axis and the y-axis, and rotates in 10-degree increments over the range of 0 to 360 degrees.
In this case, the total number of possible poses is 11 × 11 × 37, that is, 4,477. Similarly, in three dimensions, a pose may be expressed by x, y, and z, the position of the sensor, and by the roll, pitch, and yaw angles of the measurement device.

Assuming that, in a cubic space 1 m in width, length, and height, the sensor moves at 10 cm intervals over the range of 0 to 1 m along the x-axis, y-axis, and z-axis, and rotates in 10-degree increments over the range of 0 to 360 degrees for each rotation angle, the total number of possible poses is 11 × 11 × 11 × 37 × 37 × 19, that is, approximately 34 million.
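
The pose counts given above can be checked with a short calculation. The grid sizes follow the text: 11 positions per axis and 37 angles per full rotation. The factor 19 is taken from the text as given, and treating it as an angle limited to 0 to 180 degrees is only an assumption made here to reproduce it. The function names are hypothetical.

```python
def grid_count(start, stop, step):
    """Number of samples from start to stop inclusive, spaced by step."""
    return (stop - start) // step + 1


def pose_count_2d():
    n_pos = grid_count(0, 100, 10)    # 0 to 100 cm in 10 cm steps: 11
    n_ang = grid_count(0, 360, 10)    # 0 to 360 degrees in 10 degree steps: 37
    return n_pos * n_pos * n_ang


def pose_count_3d():
    n_pos = grid_count(0, 100, 10)    # 11 per axis
    n_full = grid_count(0, 360, 10)   # 37, used for two of the angles
    n_half = grid_count(0, 180, 10)   # 19, assumed range for the third angle
    return n_pos ** 3 * n_full * n_full * n_half
```

Under these assumptions, pose_count_2d() gives 4,477 and pose_count_3d() gives 34,620,641, which matches the figure of approximately 34 million.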

Reducing the movement interval and the rotation-angle interval of the measurement device can make the discrete information look like continuous information, but the number of possible poses then grows geometrically. Moreover, because the volume of a real space is far larger than 1 m³, it is practically impossible to acquire spatial information at every possible pose.

For this reason, in the spatial information acquisition stage, data is acquired at some poses that can sufficiently cover the real space, and the depth-image linkage information is constructed based on the acquired spatial information to build the 3D virtual space model, whereby the spatial information acquired at some poses can be extended.

The 3D virtual space model may be constructed based on spatial information acquired at only some poses. However, because it uses the depth-image linkage information constructed from the spatial information, correspondence information can be generated at any pose within the 3D virtual space model that is similar to the spatial information acquired, or the depth-image linkage information constructed, at the same pose in the real space.

That is, even for a pose at which no spatial information was acquired, the 3D virtual space model can be used to generate correspondence information similar to the depth-image linkage information or spatial information at that pose, and the generated correspondence information may be regarded as similar to the depth-image linkage information constructed from spatial information acquired at the same pose in the real space. The 3D virtual space model can thus convert spatial information, which is discrete information, into correspondence information, which is continuous information.

FIG. 10 is a flowchart for explaining a user pose estimation method for a three-dimensional space according to an embodiment.
The method shown in FIG. 10 may be performed by the user pose estimation apparatus 620 shown in FIG. 7.
In step S1010, the apparatus acquires spatial information including depth information and image information about the three-dimensional space using a depth measurement device and an image acquisition device.
In step S1020, the apparatus constructs depth-image linkage information based on the spatial information, and builds a 3D virtual space model corresponding to the three-dimensional space based on the depth-image linkage information.
In step S1030, the apparatus receives user information including an image acquired by the user device in the three-dimensional space. In this case, the user information may further include depth information of the space corresponding to the acquired image.
In step S1040, the apparatus generates correspondence information corresponding to the user information within the 3D virtual space model.

By using the 3D virtual space model, even for a pose at which no spatial information was acquired, correspondence information can be generated that is similar to the spatial information or depth-image linkage information that would be acquired at that pose.

The correspondence information may be expressed as depth information, image information, or depth-image linkage information. The correspondence information may be generated at a pose expressed with basis vectors having three degrees of freedom within the 3D virtual space model.

For example, if the height of the user information acquisition pose does not change, the correspondence information may be generated at a pose expressed with basis vectors having two degrees of freedom within the 3D virtual space model. The correspondence information may be generated through processes involving the field of view, image information conversion, depth information conversion, and the like.

In this case, step S1040 of generating the correspondence information may include: distinguishing, in the image included in the user information, a background region related to the structure of the three-dimensional space from a non-background region corresponding to objects placed in the three-dimensional space; processing the user information using the background region of the image included in the user information; and generating, within the 3D virtual space model, correspondence information corresponding to the processed user information.

When the user acquires user information in the real space on which the 3D virtual space model is based, the state of the real space may differ from its state at the time the spatial information was acquired to construct the 3D virtual space model. The appearance of the space, including people, objects, and interior furnishings, may have changed.

Accordingly, the background portion and the non-background portion of the user information may be distinguished so that the non-background portion is removed from the user information, and the user information may be transformed using the background portion. The user information may be processed before use to remove effects caused by illumination, light, and the like. In the process of comparing the user information with the correspondence information generated in the 3D virtual space model, the form of the user information or of the correspondence information may be converted for the comparison.

In step S1050, the apparatus calculates the similarity between the correspondence information and the user information.
In this case, calculating the similarity may include regenerating the correspondence information in a direction that increases the similarity and recalculating the similarity based on the regenerated correspondence information. Here, increasing the similarity may involve re-acquiring the user information, regenerating the correspondence information corresponding to the user information, or using additional information besides the user information.
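
Regenerating the correspondence information in a direction that increases the similarity, and then recalculating the similarity, can be pictured as a local search over candidate poses. The sketch below assumes a one-dimensional pose, a hypothetical render(pose) function that generates correspondence information from the model, and a caller-supplied similarity function. The step-halving search is one simple strategy and is not necessarily the one used in the embodiment.

```python
def refine_pose(user_info, render, similarity, start_pose, step, iterations=30):
    """Move the pose toward higher similarity, halving the step when stuck.

    render(pose):     returns correspondence information for that pose
    similarity(a, b): returns a score where larger means more similar
    Returns the best pose found and its similarity score.
    """
    best_pose = start_pose
    best_score = similarity(user_info, render(best_pose))
    for _ in range(iterations):
        improved = False
        # Regenerate correspondence information on both sides of the current pose.
        for candidate in (best_pose - step, best_pose + step):
            score = similarity(user_info, render(candidate))
            if score > best_score:
                best_pose, best_score, improved = candidate, score, True
        if not improved:
            step /= 2.0  # search more finely around the current best pose
    return best_pose, best_score
```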

To increase the similarity, step S1050 of calculating the similarity may include: extracting comparison target regions for comparing the user information and the correspondence information; determining a common region between the comparison target region extracted from the user information and the comparison target region extracted from the correspondence information; and regenerating the user information and the correspondence information, respectively, based on the common region.

For example, the correspondence information used in the comparison may be regenerated by removing from it regions that meet a predetermined criterion, such as regions distorted by structural simplification, together with the regions corresponding to the non-background portion of the user information. Likewise, the user information used in the comparison may be regenerated by removing from it the non-background portion together with the regions corresponding to the distorted regions of the correspondence information.
The similarity between the correspondence information generated from the 3D virtual space model and the user information acquired by the user may be calculated by comparing the image information of the two, by comparing the depth information of the two, or by comparing their depth-image linkage information.
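As an illustration only, the common-region step described above can be sketched in Python. The sketch assumes that each side already has a boolean validity mask (False marking a non-background pixel in the user image, or a pixel distorted by model simplification in the rendered view); the function names and the toy data are hypothetical and are not taken from the disclosure.

```python
def common_region(user_mask, corr_mask):
    """Pixel-wise AND of two boolean validity masks given as lists of rows."""
    return [[u and c for u, c in zip(u_row, c_row)]
            for u_row, c_row in zip(user_mask, corr_mask)]


def apply_mask(image, mask, fill=None):
    """Keep pixels inside the mask; replace everything else with `fill`."""
    return [[p if m else fill for p, m in zip(i_row, m_row)]
            for i_row, m_row in zip(image, mask)]


user_img = [[10, 20, 30], [40, 50, 60]]
corr_img = [[11, 19, 33], [41, 52, 58]]
user_valid = [[True, True, False], [True, True, True]]  # False: non-background object
corr_valid = [[True, False, True], [True, True, True]]  # False: distorted model region

common = common_region(user_valid, corr_valid)
user_regen = apply_mask(user_img, common)   # regenerated user information
corr_regen = apply_mask(corr_img, common)   # regenerated correspondence information
```

Both regenerated images then cover exactly the same pixels, so a subsequent similarity measure is not penalized for regions that only one side can represent.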

Here, because the scale of the correspondence information and the user information may differ, normalization or a relative comparison may be required.
Comparing image information may also require converting the image information so that both are in a similar format. For example, there may be conversion between a panorama image and a rectified image, the size of the image information may be normalized, or the viewing angle may be converted.

Conversely, a rectified image may be transformed into panoramic format and used. Feature points of the image information may be found in the two pieces of image information using techniques such as RANSAC, SIFT, FAST, or SURF, or a combination of these, and pairs of similar feature points may be linked. Feature points may be edges, straight lines, line segments, corners, circles, ellipses, or combinations of these, and may differ in scale, rotation, and the like. The similarity of the image information may be calculated by techniques such as feature matching, SSIM (Structural Similarity), NID (Normalized Information Distance), or a homography matrix.

A homography matrix may be calculated from the many pixel coordinates paired by feature matching, and it may be used to calculate the difference (error) between the two pieces of image information. SSIM is a method of calculating the similarity between two images, and NID is a probabilistic measure.
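One possible reading of the homography-based error is the mean reprojection error of the matched points, sketched below in Python. The sketch assumes the 3x3 homography H has already been estimated (in practice typically with a library routine such as OpenCV's findHomography); the function names and sample matches are hypothetical.

```python
import math


def apply_homography(H, x, y):
    """Map pixel (x, y) through a 3x3 homography given as nested lists."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v


def mean_reprojection_error(H, matches):
    """matches: list of ((x, y), (x2, y2)) pixel pairs from feature matching."""
    total = 0.0
    for (x, y), (x2, y2) in matches:
        u, v = apply_homography(H, x, y)
        total += math.hypot(u - x2, v - y2)
    return total / len(matches)


# Toy homography: a pure translation by (+5, -2) pixels.
H = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
matches = [((0, 0), (5, -2)), ((10, 10), (15, 8)), ((3, 4), (8, 5))]
err = mean_reprojection_error(H, matches)  # third pair is off by 3 pixels
```

A smaller error suggests the two images are related by the estimated transform; how the error is converted into a similarity score is left open here.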

When depth information can be extracted from the user information, it may be compared for similarity with the depth information of the correspondence information. Depth information may be represented as 3D point cloud data (PCD), a depth map, a mesh, or the like, and a step of unifying the formats of the two pieces of depth information may be required. Depth information may be compared pixel by pixel (point by point) or with the surrounding region taken into account. Depth values may be newly estimated by interpolation and then compared, and weights may be applied in the calculation.
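A minimal Python sketch of a weighted, pixel-by-pixel depth comparison follows. It assumes both inputs have already been unified into depth maps of equal size, with None marking missing depth; mapping the mean absolute difference to a score with 1 / (1 + error) is an assumption made for illustration, not a formula from the disclosure.

```python
def depth_similarity(depth_a, depth_b, weights=None):
    """Weighted mean absolute depth difference mapped to (0, 1].

    None marks a pixel with no depth; such pixels are skipped.
    """
    num = 0.0
    den = 0.0
    for r, (row_a, row_b) in enumerate(zip(depth_a, depth_b)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if a is None or b is None:
                continue
            w = 1.0 if weights is None else weights[r][c]
            num += w * abs(a - b)
            den += w
    if den == 0.0:
        return 0.0  # nothing comparable
    return 1.0 / (1.0 + num / den)


user_depth = [[2.0, 2.5], [None, 4.0]]   # metres; one pixel has no depth
corr_depth = [[2.0, 3.0], [3.5, 4.5]]
sim = depth_similarity(user_depth, corr_depth)
```

Passing a `weights` grid lets nearby or more reliable pixels count for more, which is one way to realize the weighting mentioned above.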

When depth-image linkage information can be constructed from the user information, it may be compared with the depth-image linkage information of the correspondence information. The depth information and the image information may each be compared to obtain separate similarities, from which an overall similarity is calculated, optionally with a weight applied to each. The depth-image linkage information may also be compared as a whole, and the methods for calculating depth similarity and image similarity may be combined.
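The weighted combination of the two similarities can be sketched as a normalized weighted average. The weights and the function name below are hypothetical; the disclosure does not specify how the weights are chosen.

```python
def overall_similarity(image_sim, depth_sim, w_image=0.5, w_depth=0.5):
    """Weighted average of image and depth similarity (weights need not sum to 1)."""
    total = w_image + w_depth
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_image * image_sim + w_depth * depth_sim) / total


# Example: trust the image comparison three times as much as the depth comparison.
s = overall_similarity(0.8, 0.6, w_image=3.0, w_depth=1.0)
```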

Because the time at which the spatial information for constructing the 3D virtual space model was acquired may differ from the time at which the user information was acquired, the correspondence information and the user information may differ even for the same pose. Therefore, robust features may be compared between the correspondence information and the user information. For example, the background and non-background portions may be distinguished in the correspondence information and the user information, and the similarity may be calculated using the background portions; alternatively, correspondence information may be generated from a 3D virtual space model constructed from the background portion and compared with the background portion of the user information. The similarity may be calculated after removing illumination or light-source effects from the correspondence information and the user information, or by comparing features that are robust to such effects.
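As one example of a comparison that uses only the background and is insensitive to a uniform brightness change, the Python sketch below computes zero-mean normalized cross-correlation (ZNCC) over background pixels. ZNCC is a technique chosen here for illustration; the disclosure does not name it, and the mask and image values are made up.

```python
import math


def masked_zncc(img_a, img_b, background_mask):
    """Zero-mean normalized cross-correlation over pixels where the mask is True."""
    a = [p for row, m_row in zip(img_a, background_mask)
         for p, m in zip(row, m_row) if m]
    b = [p for row, m_row in zip(img_b, background_mask)
         for p, m in zip(row, m_row) if m]
    n = len(a)
    if n == 0:
        return 0.0
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a constant patch carries no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


model_view = [[10, 20, 30], [40, 50, 60]]
# Same scene under brighter light (gain 2, offset 5), with a person covering pixel (0, 2).
user_view = [[25, 45, 255], [85, 105, 125]]
background = [[True, True, False], [True, True, True]]
all_pixels = [[True, True, True], [True, True, True]]

score_background = masked_zncc(model_view, user_view, background)
score_unmasked = masked_zncc(model_view, user_view, all_pixels)
```

In this toy case the background-only score is essentially 1 despite the lighting change, while including the occluded pixel lowers the score considerably.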

Here, step S1050 of calculating the similarity may include acquiring additional user information about the surroundings of the user device and calculating the similarity based on the user information and the additional user information. Guidance information may be used to acquire the additional user information, as shown in FIG. 12.

In step S1060, the apparatus identifies candidate correspondence information whose similarity is equal to or greater than a preset value and estimates the pose matched to the candidate correspondence information as the user pose.

The higher the similarity, the more closely the pose in the 3D virtual space model at which the correspondence information was generated may be regarded as matching the pose at which the user information was acquired. Alternatively, if the similarity exceeds a reference value (threshold), the two poses may be regarded as substantially the same, and the reference value may vary with the real-space environment. Alternatively, among the correspondence information generated at multiple candidate poses, the pose with the highest similarity, or a pose selected by some other decision method, may be taken as the user pose.
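The selection rules described above (a threshold, then the highest similarity among the remaining candidates) can be sketched as follows. The pose tuples, scores, and threshold values are invented for illustration.

```python
def select_pose(candidates, threshold):
    """candidates: list of (pose, similarity) pairs.

    Returns the pose with the highest similarity among those at or above
    the threshold, or None if no candidate qualifies.
    """
    accepted = [c for c in candidates if c[1] >= threshold]
    if not accepted:
        return None
    return max(accepted, key=lambda c: c[1])[0]


# Hypothetical candidates: (x, y, heading in degrees) with a similarity score.
candidates = [
    ((0.0, 0.0, 0.0), 0.42),
    ((1.0, 2.0, 90.0), 0.91),
    ((1.2, 2.1, 85.0), 0.88),
]
best = select_pose(candidates, threshold=0.85)
none_found = select_pose(candidates, threshold=0.95)
```

Returning None when nothing passes the threshold leaves room for the caller to acquire additional user information or regenerate the correspondence information instead of accepting a weak match.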

The correspondence-information generation and similarity calculation steps may be performed once to estimate the user pose, or they may be performed repeatedly. In repeated execution, the pose may be re-estimated finely around the selected pose, re-estimated at random over the whole region, or re-estimated at new poses selected by applying weights. These steps may be repeated a fixed number of times, or until the similarity reaches a reference value or the repeatedly estimated pose converges. An optimization technique may be used to increase the similarity.
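A minimal sketch of fine re-estimation around a selected pose is given below, using coordinate-wise hill climbing with a shrinking step. This particular optimizer is an assumption chosen for brevity; the disclosure only says that an optimization technique may be used. The `toy_similarity` function stands in for rendering correspondence information at a pose and comparing it with the user information.

```python
def refine_pose(similarity, start, step=1.0, min_step=1e-3, max_iters=100):
    """Coordinate-wise hill climbing on a pose tuple.

    Tries +/- step on each coordinate, keeps any change that raises the
    similarity, and halves the step when no neighbour improves.
    """
    pose = tuple(start)
    best = similarity(pose)
    for _ in range(max_iters):
        if step < min_step:
            break  # treated as converged
        improved = False
        for i in range(len(pose)):
            for delta in (step, -step):
                cand = pose[:i] + (pose[i] + delta,) + pose[i + 1:]
                s = similarity(cand)
                if s > best:
                    pose, best, improved = cand, s, True
        if not improved:
            step /= 2.0
    return pose, best


def toy_similarity(pose):
    """Stand-in score that peaks at the (hypothetical) true pose (3, -1)."""
    x, y = pose
    return 1.0 / (1.0 + (x - 3.0) ** 2 + (y + 1.0) ** 2)


pose, score = refine_pose(toy_similarity, start=(0.0, 0.0))
```

The loop stops either after `max_iters` rounds or once the step falls below `min_step`, mirroring the two stopping rules mentioned above (a fixed repetition count or convergence).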

The correspondence information may be regenerated so as to increase the similarity; the regenerated correspondence information may be generated at a pose predicted to be the user pose from the relationship between the poses at which the existing correspondence information was generated and their similarities. After the correspondence information is regenerated, the similarity is calculated, and the regeneration and similarity calculation may be repeated as necessary.

Because using user additional information, such as inertial information and distance information, can increase the similarity, the correspondence information may be generated and regenerated at the pose expected from that information. The similarity between the correspondence information and the user information is then calculated, and, if necessary, the correspondence information may be regenerated using the user additional information and the similarity calculation repeated.

Here, the user additional information is information other than the image information acquired by the user that supports estimation of the user pose, and may consist of inertial information (IMU), distance information (odometry), and the like. For example, when inertial information can be acquired by an inertial measurement unit, it may be used as prediction information for the image acquisition pose when processing the image information, which makes correction of the image acquisition pose easier.

Therefore, step S1050 of calculating the similarity or step S1060 of estimating the user pose may include, when user additional information, that is, additional information used for estimating the user pose, is acquired by the user device, estimating the user pose using the user additional information together with the user information or the additional user information.

Here, the acceleration or angular velocity values of the inertial information may be used to estimate the actual distance traveled, and this estimate may be used to correct the scale of depth information extracted from one or more image measurement devices.
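The scale correction can be illustrated with a deliberately simplified Python sketch: acceleration along one axis is integrated twice to obtain a metric distance, and the ratio of that distance to the unscaled visual distance rescales the depth values. Real inertial integration also needs gravity compensation and drifts quickly; those aspects, and all names and numbers below, are outside the disclosure and are assumptions for illustration.

```python
def integrate_distance(accels, dt, v0=0.0):
    """Distance along one axis from acceleration samples (simple Euler integration)."""
    v = v0
    d = 0.0
    for a in accels:
        v += a * dt   # velocity update
        d += v * dt   # position update
    return d


def rescale_depths(depths, visual_distance, metric_distance):
    """Scale unitless visual depths so that the travelled distance matches the IMU."""
    scale = metric_distance / visual_distance
    return [z * scale for z in depths]


# 1 m/s^2 for one second, sampled at 10 Hz.
metric = integrate_distance([1.0] * 10, dt=0.1)
# The image-based estimate reported 0.275 (arbitrary units) for the same motion.
scaled = rescale_depths([2.0, 4.0], visual_distance=0.275, metric_distance=metric)
```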

The distance information may be distance information predicted using VO (Visual Odometry) or VIO (Visual Inertial Odometry) built on the image information acquired by the user; when the measurement device is mounted on a wheeled mobile robot to acquire the user information, the distance information may be that of the mobile robot. Inertial information, where available, may then be used to correct the distance information obtained by these methods.

When sensors are mounted on a wheeled mobile robot to acquire user information in place of the user, the user may steer the mobile robot, the mobile robot may drive autonomously, or the two may be combined. The mobile robot pose may be treated as the user pose; if the coordinate transformation between the mobile robot and the user's viewpoint is known, or can be computed, the mobile robot pose can be converted into the user pose.
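Converting a mobile robot pose into a user pose amounts to composing two rigid transforms. The Python sketch below does this for planar poses (x, y, heading); the 0.5 m offset is a hypothetical calibration value, and a full implementation would normally work with 3D poses.

```python
import math


def compose(pose_ab, pose_bc):
    """Compose planar poses (x, y, theta): returns frame c expressed in frame a."""
    x, y, th = pose_ab
    dx, dy, dth = pose_bc
    return (
        x + math.cos(th) * dx - math.sin(th) * dy,
        y + math.sin(th) * dx + math.cos(th) * dy,
        th + dth,
    )


robot_in_map = (2.0, 1.0, math.pi / 2)   # robot at (2, 1), facing +y
user_in_robot = (0.5, 0.0, 0.0)          # assumed: user viewpoint 0.5 m ahead of the robot
user_in_map = compose(robot_in_map, user_in_robot)
```

The same composition applies when converting from an individual measurement device to the sensor system as a whole, provided the relative pose between them is known from calibration.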

The mobile robot may acquire not only user information including images but also its own distance information (odometry) as user additional information. The distance information may be used to correct the user pose. The relative pose of the mobile robot may be predicted from the sequentially acquired distance information; information such as a covariance matrix may be calculated using techniques such as EKF, EIF, or UKF, or similar methods; and this information may be updated to correct the user pose.
When a mobile robot is used, the algorithms for its operation, driving, steering, movement, data acquisition, recording, and processing may run on the Robot Operating System (ROS).
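For the odometry-based prediction mentioned above, the sketch below shows only the prediction half of an extended Kalman filter (EKF) for a planar pose: the state is advanced by one odometry increment and the covariance matrix is propagated through the motion Jacobian. The motion model, noise values, and function names are assumptions made for illustration; the correction half, which would fuse the similarity-based pose estimate, is omitted.

```python
import math


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def ekf_predict(state, P, odom, Q):
    """One EKF prediction step for state (x, y, theta).

    odom = (d, dtheta): distance travelled and heading change.
    P is the 3x3 state covariance, Q the 3x3 process noise.
    """
    x, y, th = state
    d, dth = odom
    new_state = (x + d * math.cos(th), y + d * math.sin(th), th + dth)
    # Jacobian of the motion model with respect to the state.
    F = [[1.0, 0.0, -d * math.sin(th)],
         [0.0, 1.0, d * math.cos(th)],
         [0.0, 0.0, 1.0]]
    FPFt = matmul(matmul(F, P), transpose(F))
    new_P = [[FPFt[i][j] + Q[i][j] for j in range(3)] for i in range(3)]
    return new_state, new_P


P0 = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
Q = [[0.001, 0.0, 0.0], [0.0, 0.001, 0.0], [0.0, 0.0, 0.001]]
state, P = ekf_predict((0.0, 0.0, 0.0), P0, odom=(1.0, 0.1), Q=Q)
```

After a 1 m forward move the lateral variance grows and becomes correlated with the heading, which is the kind of covariance information the filter would later use when correcting the pose.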

The spatial information, depth-image linkage information, 3D virtual space model, user information, user additional information, and the like may be recorded and processed on an external server.
The depth-image linkage information may be constructed and the 3D virtual space model built at the same time as the spatial information is acquired. The user pose may be estimated in real time as the user information is acquired, estimated with some latency, or processed after acquisition is complete.

Once the 3D virtual space model has been constructed, additional spatial information need not be acquired, or it may be acquired for only part of the space. If no additional spatial information is acquired, the constructed 3D virtual space model may be used as is; if additional spatial information is acquired, part or all of the constructed model may be updated and rebuilt before use.

The user information may be acquired first, followed by acquisition of the spatial information and construction of the 3D virtual space model to estimate the user pose; alternatively, the spatial information for constructing the 3D virtual space model may be acquired first, followed by acquisition of the user information to estimate the user pose.
The present invention may be implemented in a system in which the sensor system and computer are integrated, or with a separate sensor system and computer.

When the user information is acquired, the pose of each measurement device may differ from the pose of the user sensor system as a whole, but they can be converted using the coordinate transformations between each measurement device and the sensor system. For example, the center of the user sensor system, or another suitable position, may be taken as the user pose, or the user pose may be defined relative to the user sensor system. In this case, the required calibration information or the relative pose from the user sensor system to the user pose must either be known or be assumed to have some value.

FIG. 11 is a flowchart for explaining a user pose estimation method for a 3D space according to another embodiment.
The method shown in FIG. 11 may be performed by the user pose estimation apparatus 620 shown in FIG. 7.
In step S1110, the apparatus receives user information including an image acquired in the 3D space.
In step S1120, the apparatus identifies a 3D virtual space model constructed based on spatial information including depth information and image information for the 3D space. The 3D virtual space model may be provided by the virtual space model providing unit 730 of FIG. 7.

In step S1130, the apparatus generates correspondence information corresponding to the user information within the 3D virtual space model.
In step S1140, the apparatus calculates the similarity between the correspondence information and the user information.
In step S1150, the apparatus estimates the user pose based on the similarity. The user pose may be, for example, the pose of the correspondence information that has the highest similarity to the user information.
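The flow of FIG. 11 can be summarized as a search over candidate poses. In the Python sketch below, the `render` and `similarity` callables, the one-dimensional toy model, and the negative sum of absolute differences used as a score are all hypothetical stand-ins; an actual system would render views from the 3D virtual space model and use one of the similarity measures described earlier.

```python
def estimate_user_pose(user_info, candidate_poses, render, similarity):
    """Return (pose, score) for the candidate whose rendered view best matches user_info."""
    best_pose, best_score = None, float("-inf")
    for pose in candidate_poses:
        corr = render(pose)                   # generate correspondence information
        score = similarity(corr, user_info)   # calculate the similarity
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score              # pose with the highest similarity


# Toy 1D "model": the view from position p is the three values starting at p.
model = [5, 1, 4, 9, 2, 7, 3]


def render(p):
    return model[p:p + 3]


def similarity(a, b):
    # Negative sum of absolute differences: 0 is a perfect match.
    return -sum(abs(x - y) for x, y in zip(a, b))


pose, score = estimate_user_pose([9, 2, 7], range(len(model) - 2), render, similarity)
```

An exhaustive scan is used only to keep the example short; candidate poses could equally come from user additional information or from iterative refinement.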

FIG. 12 is a diagram for explaining an example of an additional user pose acquisition method according to one embodiment.
Additional user information may be acquired to improve the similarity. The 3D virtual space model may be used to guide the user to poses for acquiring the additional user information, and the user may acquire the additional user information at the guided poses.

Accordingly, in the description of FIG. 10, acquiring the additional user information may include transmitting guidance information for acquiring the additional user information to the user device 610 based on the 3D virtual space model.
The guidance information may include user information acquisition poses for preset feature points in the 3D virtual space model, and acquiring the additional user information may be repeated so as to increase the similarity.

For example, as shown in FIG. 12, in a long corridor with many similar-looking surroundings, poses for acquiring additional user information may be guided by taking feature points in the 3D virtual space model into account.
In FIG. 12, the additional user information acquisition pose may be a pose for acquiring images of feature points 1, 2, and 3 in turn, or a pose for any one of feature points 1, 2, and 3.

The apparatus described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an ALU (arithmetic logic unit), a digital signal processor, a microcomputer, an FPGA (field programmable gate array), a PLU (programmable logic unit), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will understand that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual device, computer storage medium or device, or transmitted signal wave in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.
Accordingly, other embodiments that are equivalent to the claims also fall within the scope of the appended claims.

Claims

A user pose estimation method for a three-dimensional space, comprising:
acquiring spatial information including depth information and image information for a 3D space using a depth measuring device and an image acquiring device;
constructing depth-image linkage information based on the spatial information, and constructing a 3D virtual space model corresponding to the 3D space based on the depth-image linkage information;
receiving user information including an image acquired by a user device in the 3D space;
generating correspondence information corresponding to the user information in the 3D virtual space model;
calculating a similarity between the correspondence information and the user information; and
estimating, based on the similarity, a user pose including position information and direction information of the user device that acquired the image in the 3D space.

The user pose estimation method for a three-dimensional space according to claim 1, wherein constructing the 3D virtual space model includes:
distinguishing, in the image information for the 3D space, a background region related to the structure of the 3D space from a non-background region corresponding to objects placed in the 3D space, and constructing the 3D virtual space model using the background region.

The user pose estimation method for a three-dimensional space according to claim 1, wherein generating the correspondence information includes:
distinguishing, in the image included in the user information, a background region related to the structure of the 3D space from a non-background region corresponding to objects placed in the 3D space;
processing the user information using the background region of the image included in the user information; and
generating, in the 3D virtual space model, correspondence information corresponding to the processed user information.

The user pose estimation method for a three-dimensional space according to claim 1, wherein calculating the similarity includes:
regenerating the correspondence information so as to increase the similarity; and
recalculating the similarity based on the regenerated correspondence information.

The user pose estimation method for a three-dimensional space according to claim 1, wherein calculating the similarity includes:
extracting comparison target regions for comparing the user information and the correspondence information;
determining a common region between the comparison target region extracted from the user information and the comparison target region extracted from the correspondence information; and
regenerating the user information and the correspondence information, respectively, based on the common region.

The user pose estimation method for a three-dimensional space according to claim 1, wherein calculating the similarity includes:
acquiring additional user information about the surroundings of the user device; and
calculating the similarity based on the user information and the additional user information.

The user pose estimation method for a three-dimensional space according to claim 6, wherein estimating the user pose includes:
when user additional information, which is additional information used for estimating the user pose, is acquired by the user device, estimating the user pose using the user additional information together with the user information or the additional user information.

The user pose estimation method for a three-dimensional space according to claim 6, wherein acquiring the additional user information includes:
transmitting guidance information for acquiring the additional user information to the user device based on the 3D virtual space model.

The user pose estimation method for a three-dimensional space according to claim 8, wherein
the guidance information includes user information acquisition poses for preset feature points in the 3D virtual space model, and
acquiring the additional user information is repeated so as to increase the similarity.

A method of estimating a user pose including position and orientation information of a user device with respect to a three-dimensional space, the method comprising:
receiving user information including an image acquired in the 3D space;
identifying a 3D virtual space model constructed based on spatial information including depth information and image information for the 3D space;
generating correspondence information corresponding to the user information in the 3D virtual space model;
calculating a similarity between the correspondence information and the user information; and
estimating the user pose based on the similarity.

The user pose estimation method for a three-dimensional space according to claim 10, wherein the 3D virtual space model is constructed by distinguishing, in the image information for the 3D space, a background region related to the structure of the 3D space from a non-background region corresponding to objects placed in the 3D space, and using the background region.

The step of generating the correspondence information includes:
separating a background area related to the structure of the 3D space and a non-background area corresponding to an object placed in the 3D space in the image included in the user information;
processing the user information using the background area of the image included in the user information; and
generating correspondence information corresponding to the processed user information in the three-dimensional virtual space model,
A user pose estimation method for a three-dimensional space according to claim 10.

The step of calculating the degree of similarity includes:
regenerating the correspondence information in a direction of increasing the similarity; and
recalculating the similarity based on the regenerated correspondence information,
A user pose estimation method for a three-dimensional space according to claim 10.
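
As a hedged illustration of regenerating the correspondence information in a direction of increasing similarity, the sketch below uses a simple hill-climbing loop. The one-dimensional model, the integer pose, the `render` helper, and the squared-difference similarity are all hypothetical simplifications chosen for this example; the claim itself does not specify a particular search strategy.

```python
from typing import List, Tuple


def render(model: List[int], pose: int, n: int) -> List[int]:
    """Toy correspondence information: the n model samples visible from `pose`."""
    return model[pose:pose + n]


def similarity(a: List[int], b: List[int]) -> int:
    """Negative sum of squared differences; a higher value means more similar."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def refine_pose(model: List[int], observation: List[int], start: int,
                max_iters: int = 100) -> Tuple[int, int]:
    """Repeatedly regenerate the correspondence information at neighbouring
    poses and keep any move that increases the similarity."""
    n = len(observation)
    pose = start
    best = similarity(render(model, pose, n), observation)
    for _ in range(max_iters):
        improved = False
        for candidate in (pose - 1, pose + 1):
            if candidate < 0 or candidate + n > len(model):
                continue  # candidate view would fall outside the model
            score = similarity(render(model, candidate, n), observation)
            if score > best:  # recalculated similarity is higher: accept
                pose, best, improved = candidate, score, True
                break
        if not improved:
            break  # no neighbour increases the similarity any further
    return pose, best


if __name__ == "__main__":
    print(refine_pose(list(range(10)), [5, 6, 7], start=1))
```

Hill climbing of this kind can stop at a local maximum, so a real implementation would typically need a reasonable initial pose or several starting points.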

The step of calculating the degree of similarity includes:
extracting a comparison target region for comparing the user information and the correspondence information;
determining a common area from the comparison target region extracted from the user information and the comparison target region extracted from the correspondence information; and
regenerating the user information and the correspondence information based on the common area,
A user pose estimation method for a three-dimensional space according to claim 10.
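
One possible reading of the common-area steps is sketched below, assuming (purely for illustration) that each comparison target region is represented as a boolean mask of the same size as the image. The function names `common_area` and `regenerate` are hypothetical and do not come from the specification.

```python
from typing import List

Mask = List[List[bool]]
Image = List[List[int]]


def common_area(region_user: Mask, region_corr: Mask) -> Mask:
    """Common area: pixels that belong to both comparison target regions."""
    return [[u and c for u, c in zip(row_u, row_c)]
            for row_u, row_c in zip(region_user, region_corr)]


def regenerate(image: Image, common: Mask) -> List[int]:
    """Regenerate information from an image by keeping only the pixels
    that lie inside the common area (row-major order)."""
    return [pixel
            for row, mask_row in zip(image, common)
            for pixel, keep in zip(row, mask_row) if keep]


if __name__ == "__main__":
    user_image = [[1, 2], [3, 4]]
    corr_image = [[1, 9], [3, 4]]
    region_user = [[True, True], [True, False]]
    region_corr = [[True, False], [True, True]]
    common = common_area(region_user, region_corr)
    print(regenerate(user_image, common), regenerate(corr_image, common))
```

Restricting both inputs to the common area means the subsequent similarity is computed only over pixels that are valid in both, so regions visible in just one of the two do not distort the comparison.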

The step of calculating the degree of similarity includes:
obtaining additional user information about the surroundings of the user device; and
calculating the similarity based on the user information and the additional user information,
A user pose estimation method for a three-dimensional space according to claim 10.

Estimating the user pose includes:
when user additional information, which is additional information used for estimating the user pose, is acquired by the user device, estimating the user pose by using the user additional information together with the user information or the additional user information,
A user pose estimation method for a three-dimensional space according to claim 15.

Obtaining the additional user information includes:
transmitting guidance information for obtaining additional user information to the user device based on the three-dimensional virtual space model;
A user pose estimation method for a three-dimensional space according to claim 15.

The guidance information includes user information acquisition poses for preset feature points in the three-dimensional virtual space model,
The step of acquiring the additional user information is performed repeatedly so as to increase the similarity,
A user pose estimation method for a three-dimensional space according to claim 17.

A user pose estimation apparatus for a three-dimensional space, comprising:
a spatial information acquisition unit that acquires spatial information including depth information and video information for the three-dimensional space;
a virtual space model generation unit that configures depth-video link information based on the spatial information and generates a 3D virtual space model corresponding to the three-dimensional space based on the depth-video link information;
a user information receiving unit that receives user information including an image acquired by a user device in the three-dimensional space; and
a control unit including at least one processor configured to generate correspondence information corresponding to the user information in the 3D virtual space model, calculate a similarity between the correspondence information and the user information, and estimate a user pose based on the similarity.

The virtual space model generation unit
separates, in the image information for the 3D space, a background area related to the structure of the 3D space from a non-background area corresponding to an object placed in the 3D space, and constructs the 3D virtual space model using the background area,
A user pose estimation apparatus for a three-dimensional space according to claim 19.

The control unit
separates, in the image included in the user information, a background area related to the structure of the 3D space from a non-background area corresponding to an object placed in the 3D space, processes the user information using the background area, and generates correspondence information corresponding to the processed user information in the three-dimensional virtual space model,
A user pose estimation apparatus for a three-dimensional space according to claim 19.

An apparatus for estimating a user pose including position and orientation information of a user device with respect to a three-dimensional space, comprising:
a virtual space model providing unit that provides a 3D virtual space model constructed based on space information including depth information and image information for the 3D space;
a user information receiving unit that receives user information including an image acquired by the user device in the three-dimensional space; and
a control unit including at least one processor configured to generate correspondence information corresponding to the user information in the three-dimensional virtual space model, calculate a similarity between the correspondence information and the user information, and estimate the user pose based on the similarity.

The 3D virtual space model is constructed by separating, in the image information for the 3D space, a background area related to the structure of the 3D space from a non-background area corresponding to an object placed in the 3D space, and by using the background area,
A user pose estimation apparatus for a three-dimensional space according to claim 22.

A user pose estimation apparatus for a three-dimensional space, comprising:
a user information generation unit that generates user information including an image of the three-dimensional space;
a communication unit that transmits the user information to a user pose estimation server and receives, from the server, information about the user pose estimated using a three-dimensional virtual space model; and
a control unit including at least one processor configured to control operations of the user information generation unit and the communication unit and to deliver the information about the user pose to a currently executing application or driving system.

The 3D virtual space model is generated based on spatial information including depth information and image information for the 3D space, and is constructed by separating, in the image information for the 3D space, a background area related to the structure of the 3D space from a non-background area corresponding to an object placed in the 3D space, and by using the background area,
A user pose estimation apparatus for a three-dimensional space according to claim 24.