JP7089238B2

JP7089238B2 - Center of gravity position estimation device, center of gravity position estimation method, program

Info

Publication number: JP7089238B2
Application number: JP2018210222A
Authority: JP
Inventors: 康輔高橋; 弾三上; 麻理子五十川; 英明木全; 朋也鶏内; 英雄斎藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2018-11-08
Filing date: 2018-11-08
Publication date: 2022-06-22
Anticipated expiration: 2038-11-08
Also published as: JP2020076647A

Description

The present invention relates to a technique for estimating the position of the center of gravity of a human body from multi-view video.

The three-dimensional center-of-gravity position of the human body (COM: Center of Mass) is important for analyzing human motion in many fields, including rehabilitation and sports. In general, estimating the three-dimensional center-of-gravity position often relies on a two-dimensional position obtained from a force plate, namely the vertical projection of the three-dimensional center-of-gravity position (COP: Center of Projection). Using a force plate requires the person being analyzed to be standing on the plate. The two-dimensional position can therefore be estimated without difficulty in a laboratory or other controllable environment, but it cannot be estimated where installing a plate is difficult, such as during an actual match.

To address this problem, Non-Patent Document 1 proposes a method that uses video captured from multiple viewpoints (hereinafter, multi-view video) to estimate the three-dimensional center-of-gravity position of the human body non-invasively, without using a force plate or attaching any sensor to the person being analyzed. The method uses the visual hull method (volume intersection) to reconstruct the three-dimensional region that the human body occupies in space as a set of unit cubes (voxels), assigns each voxel in the region a weight appropriate to the body part it belongs to, and estimates the three-dimensional center-of-gravity position as the weighted average computed from the voxel position coordinates and weights.

Non-Patent Document 1: Tomoya Kaichi, Shohei Mori, Hideo Saito, Kosuke Takahashi, Dan Mikami, Mariko Isogawa, Hideaki Kimata, "Study of a Method for Estimating the Center of Gravity of the Human Body Using Multi-View Images", IEICE Technical Report, Vol.117, No.252, MVE2017-28, pp.19-23, October 2017.

As noted above, the method of Non-Patent Document 1 uses the visual hull method. When no location is a blind spot for the cameras, as in the bird's-eye view of FIG. 1(a), the three-dimensional region of the human body can be reconstructed appropriately. When some location is a blind spot, as in the bird's-eye view of FIG. 1(b), the region cannot be reconstructed correctly. Consequently, when there is a region that the cameras cannot observe, the estimation accuracy of the three-dimensional center-of-gravity position degrades because the three-dimensional region of the human body is not reconstructed correctly.

An object of the present invention is therefore to provide a technique for accurately estimating the three-dimensional center-of-gravity position of the human body from multi-view video even when there is a region that the cameras cannot observe.

One aspect of the present invention includes: a three-dimensional model generation unit that generates, from multi-view video captured by a plurality of cameras installed so as to surround a person who is the subject, as a three-dimensional model at time t, a set of voxel data including at least voxel position coordinates, which are the three-dimensional coordinates of the positions of the voxels constituting the three-dimensional region that represents the region the person's body occupies in space at time t; a three-dimensional joint position estimation unit that estimates, from the multi-view video, three-dimensional coordinates representing the positions of the person's joints at time t as joint position coordinates at time t; a weighted three-dimensional model generation unit that, from the three-dimensional model at time t and the joint position coordinates at time t, determines for each voxel constituting the three-dimensional region a part identifier identifying the body part that contains the voxel and a weight for the voxel, thereby generating, as a weighted three-dimensional model at time t, a set of weighted voxel data including at least the voxel position coordinates, the part identifier, and the weight; and a three-dimensional center-of-gravity position estimation unit that, where a part three-dimensional center-of-gravity position is the weighted average of a part of the person's body and the three-dimensional center-of-gravity position is the weighted average of the person's body, estimates the three-dimensional center-of-gravity position at time t from the weighted three-dimensional model at time t, the part three-dimensional center-of-gravity positions of part identifiers i (i = 1, …, N_p, where N_p is the number of body parts) at times before time t, and a weighted three-dimensional model at a predetermined time (hereinafter, the reference weighted three-dimensional model).

According to the present invention, the three-dimensional center-of-gravity position of the human body can be estimated accurately from multi-view video even when there is a region that the cameras cannot observe.

FIG. 1: Diagram showing the relationship between multiple cameras and the reconstructed region of the human body.
FIG. 2: Block diagram showing the configuration of the center-of-gravity position estimation device 100.
FIG. 3: Flowchart showing the operation of the center-of-gravity position estimation device 100.
FIG. 4: Diagram showing a three-dimensional region and the voxels that constitute it.
FIG. 5: Diagram showing how the voxel size is determined.
FIG. 6: Diagram showing an example of human body parts in a three-dimensional region.
FIG. 7: Block diagram showing the configuration of the three-dimensional center-of-gravity position estimation unit 140.
FIG. 8: Flowchart showing the operation of the three-dimensional center-of-gravity position estimation unit 140.

Embodiments of the present invention are described in detail below. Components having the same function are given the same reference number, and duplicate description is omitted.

<First Embodiment>
To solve the above problem, the embodiment of the present invention estimates the three-dimensional center-of-gravity position by incorporating two findings: "the three-dimensional center-of-gravity position of each part of the human body changes smoothly" (hereinafter, Finding 1) and "the total weight of each part of the human body does not change over time" (hereinafter, Finding 2).

The center-of-gravity position estimation device 100 is described below with reference to FIGS. 2 and 3. FIG. 2 is a block diagram showing the configuration of the center-of-gravity position estimation device 100. FIG. 3 is a flowchart showing the operation of the center-of-gravity position estimation device 100.

Multi-view video captured by a plurality of (that is, two or more) cameras is input to the center-of-gravity position estimation device 100. The cameras are set up to capture the same subject. Specifically, as shown in FIG. 1, the cameras may be installed so as to surround the person who is the subject. The video captured by the cameras may be input to the center-of-gravity position estimation device 100 directly, or may first be recorded on a separate external recording medium and then input. The cameras are preferably synchronized and geometrically calibrated, in which case the captured video is synchronized, geometrically calibrated multi-view video. Below, the number of cameras is N_c (N_c is an integer of 2 or more). The frame number t (t = 1, 2, …; that is, t is an integer of 1 or more), which identifies a frame (one of the images making up the video), is referred to as time t.

As shown in FIG. 2, the center-of-gravity position estimation device 100 includes a three-dimensional model generation unit 110, a three-dimensional joint position estimation unit 120, a weighted three-dimensional model generation unit 130, a three-dimensional center-of-gravity position estimation unit 140, and a recording unit 190. The recording unit 190 is a component that records information needed for processing as appropriate. For example, the recording unit 190 records the multi-view video, which consists of N_c videos.

The operation of the center-of-gravity position estimation device 100 is described below with reference to FIG. 3. The three-dimensional model generation unit 110 takes as input the multi-view video recorded in the recording unit 190, generates as the three-dimensional model at time t a set of voxel data including at least voxel position coordinates, and outputs it (S110). The voxel position coordinates are the three-dimensional coordinates of the positions of the voxels constituting the three-dimensional region that represents the region the subject's body occupies in space at time t. In addition to the voxel position coordinates, the voxel data may include, for example, a voxel identifier for identifying the voxel. As shown in FIG. 4, the three-dimensional region is composed of voxels, which are cubes whose sides have a predetermined length (hereinafter, the unit length). The voxel size (unit length) depends on the resolution of the cameras used, the region the person occupies in the video, the desired estimation accuracy of the three-dimensional center-of-gravity position, and so on. Let L_k be the side length, in the region where the person is located, of the square obtained by back-projecting a one-pixel square of the video captured by camera C_k (k = 1, …, N_c) into real space; the voxel size is preferably min_k(L_k) or less (see FIG. 5).
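The bound min_k(L_k) can be illustrated with a minimal sketch. It assumes an ideal pinhole camera, for which one pixel back-projected to distance Z_k covers a square of side roughly Z_k / f_k (f_k being the focal length in pixels). The function name, parameter names, and the pinhole approximation are assumptions for illustration and do not come from the patent.

```python
def max_voxel_size(depths_m, focal_lengths_px):
    """Return min_k(L_k), an upper bound on the voxel unit length.

    Assumption (not from the patent): under a pinhole model, one pixel of
    camera k back-projected to the subject's distance depths_m[k] covers a
    square of side L_k ~= depths_m[k] / focal_lengths_px[k] (in metres).
    """
    if not depths_m or len(depths_m) != len(focal_lengths_px):
        raise ValueError("need one depth and one focal length per camera")
    return min(z / f for z, f in zip(depths_m, focal_lengths_px))
```

For example, two cameras with a focal length of 1000 px placed 5 m and 10 m from the subject give L_1 = 0.005 m and L_2 = 0.010 m, so the unit length would be chosen at 0.005 m or below.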

Any method may be used to reconstruct the three-dimensional region. For example, the visual hull method (volume intersection) used in Non-Patent Document 1 may be used. The visual hull method obtains the three-dimensional region as the intersection of the silhouettes of the subject that appear in the multi-view video, and is described in detail in Reference Non-Patent Document 1.
(Reference Non-Patent Document 1: Takashi Matsuyama, Xiaojun Wu, Takeshi Takai, and Shohei Nobuhara, "Real-time 3D shape reconstruction, dynamic 3D mesh deformation, and high fidelity visualization for 3D video", Computer Vision and Image Understanding, Vol.96, Issue 3, pp.393-434, Dec. 2004.)
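As an illustration of the silhouette-intersection idea, the following is a minimal sketch rather than the method of the cited reference. It assumes each camera is described by a 3x4 projection matrix and a binary silhouette mask, and it keeps a candidate voxel only if its center projects inside the silhouette in every view. The function name, argument layout, and the choice to discard voxels that fall outside any image are assumptions.

```python
import numpy as np

def carve_visual_hull(voxel_centers, projections, silhouettes):
    """Return a boolean mask of the voxels that lie inside every silhouette.

    voxel_centers: (N, 3) array of candidate voxel centers in world coordinates.
    projections:   list of (3, 4) camera projection matrices (assumed calibrated).
    silhouettes:   list of (H, W) boolean masks, True where the subject is seen.
    """
    n = voxel_centers.shape[0]
    homogeneous = np.hstack([voxel_centers, np.ones((n, 1))])
    keep = np.ones(n, dtype=bool)
    for P, sil in zip(projections, silhouettes):
        uvw = homogeneous @ P.T                      # project into the image
        in_front = uvw[:, 2] > 0                     # ignore points behind the camera
        w = np.where(in_front, uvw[:, 2], 1.0)       # avoid dividing by <= 0
        u = np.rint(uvw[:, 0] / w).astype(int)       # pixel column
        v = np.rint(uvw[:, 1] / w).astype(int)       # pixel row
        height, width = sil.shape
        inside = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
        hit = np.zeros(n, dtype=bool)
        hit[inside] = sil[v[inside], u[inside]]
        keep &= hit                                  # intersection over all views
    return keep
```

Because a voxel survives only if no camera rules it out, a region hidden from the cameras cannot be carved correctly, which is the over-estimation that the later correction step addresses.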

The three-dimensional joint position estimation unit 120 takes as input the multi-view video input in S110, estimates three-dimensional coordinates representing the positions of the subject's joints at time t as the joint position coordinates at time t, and outputs them (S120). Any method may be used to estimate the joint position coordinates. For example, the same method as in Non-Patent Document 1 may be used. Specifically, the two-dimensional coordinates of each of the person's joints are first estimated in the image at time t of each video making up the multi-view video. The joint position coordinates, which are three-dimensional coordinates, are then obtained by triangulation from the estimated two-dimensional coordinates. The two-dimensional joint coordinates may be estimated using, for example, the method shown in Reference Non-Patent Document 2.
(Reference Non-Patent Document 2: Zhe Cao, Tomas Simon, Shih-En Wei and Yaser Sheikh, "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields", CVPR 2017, 2017.)
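The triangulation step can be sketched with the standard linear (DLT) formulation. The patent does not specify which triangulation variant is used, so this choice, along with the function and argument names, is an assumption. It takes one 3x4 projection matrix and one 2D joint estimate per camera.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one joint from two or more views.

    projections: list of (3, 4) projection matrices of calibrated cameras.
    points_2d:   list of (u, v) image coordinates of the same joint, one per camera.
    Returns the 3D point as a length-3 array.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector associated with
    # the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

With noisy 2D detections, a nonlinear refinement of the reprojection error would usually follow; that refinement is omitted here.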

The weighted three-dimensional model generation unit 130 takes as input the three-dimensional model at time t generated in S110 and the joint position coordinates at time t estimated in S120. For each voxel constituting the three-dimensional region, it determines a part identifier identifying the body part that contains the voxel and a weight for the voxel. It thereby generates, as the weighted three-dimensional model at time t, a set of weighted voxel data including at least the voxel position coordinates, the part identifier, and the weight, and outputs it (S130). In addition to the voxel position coordinates, part identifier, and weight, the weighted voxel data may include, for example, a voxel identifier.

Any method may be used to determine the part identifier and the weight. For example, the same method as in Non-Patent Document 1 may be used. In Non-Patent Document 1, as shown in FIG. 6, the human body is modeled with ten parts (head, torso, left upper arm, left forearm, right upper arm, right forearm, left thigh, left calf, right thigh, right calf). In general, the number of parts is not limited to ten and may be N_p (N_p is an integer of 2 or more representing the number of body parts). Below, the part identifier identifying each part is i (i = 1, …, N_p). Each straight line connecting two joints, which are the nodes between parts, is assigned the corresponding body part in advance. For example, the line corresponding to the head is assigned the head (see FIG. 6). For each voxel, the joint-to-joint line closest to the voxel is found, the voxel is regarded as belonging to the body part assigned to that line, and the part identifier of that part is assigned to the voxel. The weight of each voxel is determined from the specific gravity of typical human body parts, as shown in Reference Non-Patent Document 3.
(Reference Non-Patent Document 3: P. de Leva, "Adjustments to Zatsiorsky-Seluyanov's segment inertia parameters", Journal of Biomechanics, 29(9), pp.1223-1230, 1996.)
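The nearest-line assignment can be sketched as follows. This is a minimal illustration, not the implementation of Non-Patent Document 1: it measures the distance to the finite segment between two joints (clamping at the endpoints) rather than to an infinite line, and the per-part weight values passed in are placeholders that would in practice come from a table such as the one in Reference Non-Patent Document 3. All names are hypothetical.

```python
import numpy as np

def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment joining joints a and b."""
    ab = b - a
    denom = float(ab @ ab)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if denom == 0.0 else float(np.clip((p - a) @ ab / denom, 0.0, 1.0))
    return float(np.linalg.norm(p - (a + t * ab)))

def assign_parts(voxel_centers, segments, part_weights):
    """Attach a part identifier and a weight to every voxel.

    voxel_centers: (N, 3) array of voxel position coordinates.
    segments:      dict part_id -> (joint_a, joint_b), each a length-3 array.
    part_weights:  dict part_id -> weight per voxel (placeholder values).
    Returns a list of (position, part_id, weight) weighted voxel records.
    """
    records = []
    for p in voxel_centers:
        part_id = min(
            segments,
            key=lambda i: point_to_segment_distance(p, *segments[i]),
        )
        records.append((tuple(p), part_id, part_weights[part_id]))
    return records
```

The result corresponds to the weighted voxel data described above: voxel position coordinates, part identifier, and weight.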

The specific gravity of each part is recorded in the recording unit 190 in advance.

The three-dimensional center-of-gravity position estimation unit 140 takes as input the weighted three-dimensional model at time t generated in S130, together with two items recorded in the recording unit 190: the part three-dimensional center-of-gravity positions of part identifiers i (i = 1, …, N_p) at times before time t, and the weighted three-dimensional model at a predetermined time (hereinafter, the reference time), which is referred to as the reference weighted three-dimensional model. It estimates the three-dimensional center-of-gravity position at time t and outputs it (S140). Here, a part three-dimensional center-of-gravity position is the weighted average of a part of the subject's body, and the three-dimensional center-of-gravity position is the weighted average of the subject's body. As detailed later, the reference weighted three-dimensional model serves to represent the total weight of each part, which according to Finding 2 does not change over time.

The three-dimensional center-of-gravity position estimation unit 140 is described below with reference to FIGS. 7 and 8. FIG. 7 is a block diagram showing the configuration of the three-dimensional center-of-gravity position estimation unit 140. FIG. 8 is a flowchart showing its operation. As shown in FIG. 7, the three-dimensional center-of-gravity position estimation unit 140 includes a part three-dimensional center-of-gravity position estimation unit 141, a weighted three-dimensional model correction unit 142, and a three-dimensional center-of-gravity position calculation unit 143. The part three-dimensional center-of-gravity position estimation unit 141 and the weighted three-dimensional model correction unit 142 are components that execute processing based on Finding 1 and Finding 2, respectively.

The operation of the three-dimensional center-of-gravity position estimation unit 140 is described below with reference to FIG. 8. The part three-dimensional center-of-gravity position estimation unit 141 takes as input the part three-dimensional center-of-gravity positions of part identifiers i (i = 1, …, N_p) at times before time t, estimates the part three-dimensional center-of-gravity position of part identifier i at time t, and outputs it (S141). The part three-dimensional center-of-gravity position estimation unit 141 also records the estimated part three-dimensional center-of-gravity position of part identifier i at time t in the recording unit 190.

Any method may be used for the estimation. For example, with N_t an integer of 1 or more, the part three-dimensional center-of-gravity position C3D(i, t) of part identifier i at time t may be estimated by applying a time-series filter, such as a Kalman filter or a particle filter, to the N_t part three-dimensional center-of-gravity positions C3D(i, t-N_t), …, C3D(i, t-1) of part identifier i from time t-N_t to time t-1. An appropriate value of N_t depends on the frame rate of the video, the speed of the subject's movement, the type of time-series filter, and so on, but C3D(i, t) can be estimated as long as at least the part three-dimensional center-of-gravity position one time step earlier, C3D(i, t-1), is available.
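One way to realize such a time-series filter is a constant-velocity Kalman filter, sketched below. The patent names the Kalman filter only as one option and gives no motion model or noise parameters, so the constant-velocity model, the unit time step, the default variances, and the function name are all assumptions.

```python
import numpy as np

def predict_part_com(history, process_var=1e-3, meas_var=1e-2):
    """Predict C3D(i, t) from C3D(i, t-N_t), ..., C3D(i, t-1).

    history: sequence of N_t >= 1 past part center-of-gravity positions, oldest first.
    Uses a constant-velocity model with a time step of one frame (an assumption).
    """
    history = np.asarray(history, dtype=float)
    F = np.eye(6)
    F[:3, 3:] = np.eye(3)                            # position += velocity each frame
    H = np.hstack([np.eye(3), np.zeros((3, 3))])     # only the position is observed
    Q = process_var * np.eye(6)
    R = meas_var * np.eye(3)

    x = np.concatenate([history[0], np.zeros(3)])    # start at rest at the first sample
    P = np.eye(6)
    for z in history[1:]:
        # Predict one frame ahead, then correct with the recorded position.
        x = F @ x
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(6) - K @ H) @ P
    # One more prediction step gives the estimate for time t.
    return (F @ x)[:3]
```

With N_t = 1 the sketch has no velocity information and simply returns C3D(i, t-1), which matches the statement that a single earlier position suffices.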

When t = 1, the part three-dimensional center-of-gravity position of part identifier i at time t = 1 may be estimated by any suitable method. For example, using the weighted three-dimensional model at time t = 1, the part three-dimensional center-of-gravity position estimation unit 141 may compute the weighted average of the voxel position coordinates, using the voxel weights, over the set of voxels in the part with part identifier i, and take that value as the part three-dimensional center-of-gravity position C3D(i, 1).
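The weighted average described here is a short computation; the sketch below shows it for the voxels of a single part. The function name and array layout are assumptions.

```python
import numpy as np

def part_com(positions, weights):
    """Weighted average of voxel position coordinates for one body part.

    positions: (N, 3) voxel position coordinates of the part.
    weights:   (N,) weights of the same voxels.
    """
    positions = np.asarray(positions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * positions).sum(axis=0) / weights.sum()
```

For two voxels at x = 0 and x = 2 with weights 1 and 3, the result lies at x = 1.5, closer to the heavier voxel.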

The weighted three-dimensional model correction unit 142 takes as input the part three-dimensional center-of-gravity positions of part identifiers i (i = 1, …, N_p) at time t estimated in S141, the reference weighted three-dimensional model recorded in the recording unit 190, and the weighted three-dimensional model at time t generated in S130. It generates the corrected weighted three-dimensional model at time t and outputs it (S142). The corrected weighted three-dimensional model at time t is generated by removing weighted voxel data, selected by a predetermined method, from the weighted three-dimensional model at time t, and is therefore a subset of the weighted three-dimensional model at time t. The weighted three-dimensional model correction unit 142 also records the generated corrected weighted three-dimensional model at time t in the recording unit 190 as the weighted three-dimensional model at time t. The weighted three-dimensional model at time t is recorded so that, as described below, the weighted three-dimensional model at the time equal to the reference time can be used as the reference weighted three-dimensional model.

The generation method is described in detail below. For i = 1, …, N_p, let N_{i,t} be the number of voxels in the part with part identifier i at time t, let v(i, j, t) (j = 1, …, N_{i,t}) be the voxels in that part at time t, let p(i, j, t) be the voxel position coordinates of voxel v(i, j, t), and let w_{i,j,t} be its weight. That is, p(i, j, t) is the position coordinate of the j-th voxel in the part with part identifier i at time t, and w_{i,j,t} is the weight assigned to that voxel. In general, reconstructing the three-dimensional region with the visual hull method yields a region larger than the region the human body actually occupies in space. The weighted three-dimensional model is therefore corrected here by deleting voxels v(i, j, t) that are not in the region the human body actually occupies (that is, unnecessary voxels).

Let N_i be the number of voxels in the part with part identifier i in the reference weighted three-dimensional model. The reference weighted three-dimensional model is the weighted three-dimensional model at the reference time t_s (t_s is a predetermined integer of 1 or more). The reference time t_s may be chosen in any way; for example, the user may specify it separately. However, it is preferably a time at which the region that the cameras cannot observe, as illustrated in FIG. 1(b), is as small as possible.

If the reference weighted three-dimensional model has not been recorded in the recording unit 190, the processing below is not executed. That is, when t ≤ t_s, the weighted three-dimensional model correction unit 142 outputs the weighted three-dimensional model generated in S130 unchanged as the corrected weighted three-dimensional model. When t > t_s, the weighted three-dimensional model correction unit 142 outputs, as the corrected weighted three-dimensional model, the weighted three-dimensional model corrected by the method described below.

Suppose that, in the weighted three-dimensional model at time t, the number N_{i,t} of voxels in the part with part identifier i satisfies N_{i,t} > N_i. Let V(i, x, t) denote a subset of the part with part identifier i at time t that contains x voxels. The subset V(i, N_i, t) that minimizes the error e_{i,t} for part identifier i at time t, calculated by the following equation, is then found.
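The equation itself is not reproduced in this text. A reconstruction consistent with the surrounding definitions, offered as an assumption rather than as the published formula, takes the error to be the distance between the part three-dimensional center-of-gravity position C3D(i, t) estimated in S141 and the center-of-gravity position f(·) of the candidate subset:

```latex
e_{i,t} = \left\| C_{3D}(i,t) - f\bigl(V(i, N_i, t)\bigr) \right\|
```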

Here, ||·|| denotes the distance between two position coordinates. Any distance may be used for ||·||. Further, f(V(i, N_i, t)) denotes the part three-dimensional center-of-gravity position of the subset V(i, N_i, t), calculated by the following equation.
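This equation is also not reproduced in this text. Since a part three-dimensional center-of-gravity position is defined as a weighted average of voxel positions, a consistent reconstruction (again an assumption) is the following, where J is a hypothetical symbol for the index set of the N_i voxels selected into V(i, N_i, t):

```latex
f\bigl(V(i, N_i, t)\bigr)
  = \frac{\sum_{j \in J} w_{i,j,t}\, p(i,j,t)}{\sum_{j \in J} w_{i,j,t}},
\qquad J \subseteq \{1, \dots, N_{i,t}\},\quad |J| = N_i
```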

Here, Σ in the above equation denotes the sum taken over the N_i integers chosen from {1, …, N_{i,t}}.

Any method may be used to find the V(i, N_i, t) that minimizes the error e_{i,t}. The error e_{i,t} may be computed for every combination of N_i integers chosen from {1, …, N_{i,t}}, and the N_i voxels giving the minimum value selected. Alternatively, the error e_{i,t} may be computed for a predetermined number of randomly chosen combinations of N_i integers, and the N_i voxels giving the minimum value among them selected. The N_i voxels selected in this way are taken anew as the part with part identifier i.
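The random-sampling variant can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the Euclidean distance for ||·|| (the text allows any distance), it takes the error to be the distance between the predicted part center-of-gravity position and the weighted average of the candidate subset, and the function name, trial count, and seed are hypothetical.

```python
import numpy as np

def select_voxels(positions, weights, predicted_com, n_ref, n_trials=1000, seed=0):
    """Pick n_ref voxels of one part whose weighted average is closest to predicted_com.

    positions:     (N_it, 3) voxel position coordinates of the part at time t.
    weights:       (N_it,) voxel weights.
    predicted_com: length-3 part center-of-gravity position estimated for time t.
    n_ref:         N_i, the voxel count of the part in the reference model.
    Returns the sorted indices of the voxels that are kept.
    """
    positions = np.asarray(positions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    predicted_com = np.asarray(predicted_com, dtype=float)
    n_cur = len(positions)
    if n_cur <= n_ref:
        # Nothing to remove: keep every voxel of the part.
        return np.arange(n_cur)

    rng = np.random.default_rng(seed)
    best_idx, best_err = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n_cur, size=n_ref, replace=False)
        w = weights[idx]
        com = (w[:, None] * positions[idx]).sum(axis=0) / w.sum()
        err = np.linalg.norm(predicted_com - com)    # Euclidean distance (assumed)
        if err < best_err:
            best_idx, best_err = idx, err
    return np.sort(best_idx)
```

Exhaustive enumeration guarantees the minimum but grows combinatorially with the number of voxels, which is presumably why the random-sampling alternative is mentioned.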

When N_{i,t} ≤ N_i, all N_{i,t} voxels are taken as the part with part identifier i.

The set of weighted voxel data corresponding to the voxels in the parts with part identifiers i determined as described above is taken as the corrected weighted three-dimensional model at time t.

The three-dimensional center of gravity position calculation unit 143 takes as input the corrected weighted three-dimensional model at time t generated in S142, and calculates and outputs the three-dimensional center of gravity position at time t (S143). The three-dimensional center of gravity position COM(t) at time t is calculated by the following equation.

Here, N_all is the number of voxels contained in the corrected weighted three-dimensional model generated in S142.
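As a rough illustration of step S143, the sketch below computes COM(t) as the weighted average of the voxel position coordinates over all N_all voxels of the corrected weighted three-dimensional model, that is, COM(t) = Σ w_n p_n / Σ w_n. This form is an assumption based on the claims' definition of the three-dimensional center of gravity position as a weighted average; the function name `center_of_mass` and the tuple layout (position, part identifier, weight) are hypothetical.

```python
def center_of_mass(weighted_voxels):
    """Weighted average position over all voxels of the corrected model.

    weighted_voxels: iterable of ((x, y, z), part_id, weight) tuples,
    one per voxel (N_all items in total).
    """
    total_weight = 0.0
    acc = [0.0, 0.0, 0.0]
    for (x, y, z), _part_id, w in weighted_voxels:
        acc[0] += w * x
        acc[1] += w * y
        acc[2] += w * z
        total_weight += w
    if total_weight == 0.0:
        raise ValueError("model has no voxels or all weights are zero")
    return tuple(a / total_weight for a in acc)


# Usage: two voxels of different parts with different weights.
model = [((0.0, 0.0, 0.0), 1, 1.0), ((2.0, 0.0, 0.0), 2, 3.0)]
com = center_of_mass(model)  # (1.5, 0.0, 0.0)
```

The part identifier is not needed for the sum itself; it is carried along only because each weighted voxel data item includes it.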

According to the invention of this embodiment, the three-dimensional center of gravity position of the human body can be estimated accurately from multi-viewpoint images even when there is a region that cannot be observed by the cameras.

<Supplementary note>
The device of the present invention has, for example, as a single hardware entity: an input unit to which a keyboard or the like can be connected; an output unit to which a liquid crystal display or the like can be connected; a communication unit to which a communication device (for example, a communication cable) capable of communicating with the outside of the hardware entity can be connected; a CPU (Central Processing Unit, which may include cache memory, registers, and the like); RAM and ROM as memory; an external storage device such as a hard disk; and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. If necessary, the hardware entity may also be provided with a device (drive) that can read and write a recording medium such as a CD-ROM. A general-purpose computer is one example of a physical entity equipped with such hardware resources.

The external storage device of the hardware entity stores the program required to realize the functions described above and the data required for the processing of this program (the program need not be stored in the external storage device; for example, it may be stored in a ROM, which is a read-only storage device). Data obtained by the processing of these programs is stored in the RAM, the external storage device, or the like as appropriate.

In the hardware entity, each program stored in the external storage device (or ROM, etc.) and the data required for the processing of each program are read into memory as needed, and are interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes the predetermined functions (the components referred to above as "... unit", "... means", and so on).

The present invention is not limited to the embodiment described above and can be modified as appropriate without departing from the spirit of the present invention. Further, the processes described in the above embodiment need not be executed in chronological order as described; they may also be executed in parallel or individually, depending on the processing capacity of the device that executes them or as needed.

As described above, when the processing functions of the hardware entity (the device of the present invention) described in the above embodiment are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. By executing this program on the computer, the processing functions of the hardware entity are realized on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.

This program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be stored in the storage device of a server computer and distributed by transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing a process, the computer reads the program stored in its own storage device and executes the process according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only through execution instructions and result acquisition. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the hardware entity is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be realized in hardware.

Claims

A center of gravity position estimation device comprising:
a three-dimensional model generation unit that generates, from multi-viewpoint images taken by a plurality of cameras installed so as to surround a person who is the subject, a set of voxel data as a three-dimensional model at time t, the voxel data including at least voxel position coordinates, which are the three-dimensional coordinates of the positions of the voxels constituting a three-dimensional region representing the region occupied by the person's body at time t;
a three-dimensional joint position estimation unit that estimates, from the multi-viewpoint images, three-dimensional coordinates representing the positions of the person's joints at time t as joint position coordinates at time t;
a weighted three-dimensional model generation unit that generates a set of weighted voxel data as a weighted three-dimensional model at time t by determining, from the three-dimensional model at time t and the joint position coordinates at time t, for each voxel constituting the three-dimensional region, a part identifier for identifying the part of the human body that contains the voxel and a weight for the voxel, the weighted voxel data including at least the voxel position coordinates, the part identifier, and the weight; and
a three-dimensional center of gravity position estimation unit that, with the part three-dimensional center of gravity position defined as the weighted average for a part of the person's body and the three-dimensional center of gravity position defined as the weighted average for the person's body, estimates the three-dimensional center of gravity position at time t from the weighted three-dimensional model at time t, the part three-dimensional center of gravity positions of part identifier i (i = 1, …, N_p, where N_p is the number of parts of the human body) at times before time t, and the weighted three-dimensional model at a predetermined time (hereinafter referred to as the reference weighted three-dimensional model).

The center of gravity position estimation device according to claim 1, wherein
the three-dimensional center of gravity position estimation unit includes:
a part three-dimensional center of gravity position estimation unit that estimates the part three-dimensional center of gravity position of part identifier i at time t from the part three-dimensional center of gravity positions of part identifier i (i = 1, …, N_p) at times before time t;
a weighted three-dimensional model correction unit that generates a corrected weighted three-dimensional model at time t from the part three-dimensional center of gravity positions of part identifier i (i = 1, …, N_p) at time t, the reference weighted three-dimensional model, and the weighted three-dimensional model at time t; and
a three-dimensional center of gravity position calculation unit that calculates the three-dimensional center of gravity position at time t from the corrected weighted three-dimensional model at time t.

The center of gravity position estimation device according to claim 2, wherein,
letting C3D(i, t) be the part three-dimensional center of gravity position of part identifier i at time t, N_i be the number of voxels contained in the part with part identifier i in the reference weighted three-dimensional model, and V(i, N_i, t) be a subset of the part with part identifier i containing N_i voxels at time t,
the weighted three-dimensional model correction unit
finds the subset V(i, N_i, t) that minimizes the error e_{i,t} for the part with part identifier i at time t, calculated by the following equation,

(where f(V(i, N_i, t)) is the part three-dimensional center of gravity position of the subset V(i, N_i, t)),
and generates the set of weighted voxel data of the voxels contained in the minimizing subsets V(i, N_i, t) (i = 1, …, N_p) as the corrected weighted three-dimensional model at time t.

The center of gravity position estimation device according to claim 2, wherein
the part three-dimensional center of gravity position estimation unit
estimates the part three-dimensional center of gravity position of part identifier i at time t by applying a time-series filter to the part three-dimensional center of gravity positions of part identifier i (i = 1, …, N_p) at times before time t.

A center of gravity position estimation method comprising:
a three-dimensional model generation step in which a center of gravity position estimation device generates, from multi-viewpoint images taken by a plurality of cameras installed so as to surround a person who is the subject, a set of voxel data as a three-dimensional model at time t, the voxel data including at least voxel position coordinates, which are the three-dimensional coordinates of the positions of the voxels constituting a three-dimensional region representing the region occupied by the person's body at time t;
a three-dimensional joint position estimation step in which the center of gravity position estimation device estimates, from the multi-viewpoint images, three-dimensional coordinates representing the positions of the person's joints at time t as joint position coordinates at time t;
a weighted three-dimensional model generation step in which the center of gravity position estimation device generates a set of weighted voxel data as a weighted three-dimensional model at time t by determining, from the three-dimensional model at time t and the joint position coordinates at time t, for each voxel constituting the three-dimensional region, a part identifier for identifying the part of the human body that contains the voxel and a weight for the voxel, the weighted voxel data including at least the voxel position coordinates, the part identifier, and the weight; and
a three-dimensional center of gravity position estimation step in which the center of gravity position estimation device, with the part three-dimensional center of gravity position defined as the weighted average for a part of the person's body and the three-dimensional center of gravity position defined as the weighted average for the person's body, estimates the three-dimensional center of gravity position at time t from the weighted three-dimensional model at time t, the part three-dimensional center of gravity positions of part identifier i (i = 1, …, N_p, where N_p is the number of parts of the human body) at times before time t, and the weighted three-dimensional model at a predetermined time (hereinafter referred to as the reference weighted three-dimensional model).

A program for causing a computer to function as the center of gravity position estimation device according to any one of claims 1 to 4.