JP2009048305A - Shape analysis program and shape analysis apparatus

Info

Publication number: JP2009048305A
Application number: JP2007211899A
Authority: JP
Inventors: Munenori Ukita; 宗伯浮田; Ryosuke Tsuji; 良介辻; Masatsugu Kidode; 正継木戸出
Original assignee: Nara Institute of Science and Technology NUC
Current assignee: Nara Institute of Science and Technology NUC
Priority date: 2007-08-15
Filing date: 2007-08-15
Publication date: 2009-03-05

Abstract

PROBLEM TO BE SOLVED: To identify, in real time and with high accuracy, the body segment of an observation object represented by each voxel of volume data generated by a volume intersection method.

SOLUTION: Each time frames are acquired by cameras 11 to 1m, an input volume data generation unit 50 uses the volume intersection method to generate input volume data, that is, volume data of an observation object made up of the same plurality of body segments as a reference object. A search unit 60 projects the input volume data generated by the input volume data generation unit 50 onto a learning space and searches for the nearest reference projection point, that is, the reference projection point located closest to the projection point of the input volume data. The body segment to which each voxel of the input volume data belongs is identified based on the labels attached to the reference volume data corresponding to the nearest reference projection point found by the search unit 60.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a shape analysis program and a shape analysis apparatus capable of identifying, in real time, the body segment to which each voxel of the volume data of an observation object composed of a plurality of body segments belongs.

In recent years, acquiring posture and motion information of the human body has been expected to enable a variety of systems: interfaces that take gestures and other motions as input, human-robot interaction, active observation that controls cameras so as to capture images according to the position and posture of the observed person, and automatic recording and learning-support systems for sports and traditional skills. Acquiring human motion information in a real environment requires, as an elemental technology, the identification of body parts including the clothing worn.

As a technique for posture estimation and shape analysis that handles loose clothing, Non-Patent Document 1 discloses representing the shape of each body segment as a parametric spatial distribution rather than as a rigid body, and estimating the optimal parameters of each body segment shape under constraints such as volume preservation of each body segment and spatial continuity of the body segment labels.

Non-Patent Document 2 discloses a technique called Space Carving, which reconstructs the shape of a three-dimensional object by checking whether corresponding pixels in different viewpoints have the same color.

Non-Patent Document 1: Takeshi Fujita, Yasuhiro Mukaigawa, Takeshi Shakunaga, "Acquisition and analysis of dance motion using a multi-view camera system," IPSJ SIG Technical Report, CVIM-2002-132, pp. 95-102, 2002.
Non-Patent Document 2: K. N. Kutulakos, S. M. Seitz, "A Theory of Shape by Space Carving," IJCV, Vol. 38, No. 3, pp. 199-218, 2000.

However, the technique of Non-Patent Document 1 requires an enormous amount of iterative computation to estimate the optimal parameters, which makes real-time processing difficult. Moreover, a parametric model is limited in the shapes it can represent, so body segment shapes cannot be computed with high accuracy. The technique of Non-Patent Document 2 can represent a three-dimensional object realistically, but its computational cost is so large that real-time processing is practically impossible. The volume intersection method, which reconstructs a three-dimensional object from images captured by a plurality of cameras, can reconstruct the object in real time. However, concave regions are filled in with false volume data, so it is difficult to reproduce with high accuracy body parts that include clothing, which contains many concavities.
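The false volume that the volume intersection method produces in concavities can be illustrated with a toy example. The sketch below is not taken from the patent: it assumes a 2D grid in place of a voxel space and two idealized orthographic cameras, and the function name `visual_hull_2d` is hypothetical.

```python
def visual_hull_2d(cells):
    """Toy volume intersection on a 2D grid with two orthographic views.

    cells: set of occupied (x, y) cells of the true object.
    Returns every cell that is consistent with both silhouettes.
    """
    seen_x = {x for x, _ in cells}  # silhouette seen by a camera looking along y
    seen_y = {y for _, y in cells}  # silhouette seen by a camera looking along x
    # A cell survives only if it projects inside every silhouette.
    return {(x, y) for x in seen_x for y in seen_y}


# L-shaped object: its inner corner is a concavity that neither camera can see into.
l_shape = {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)}
hull = visual_hull_2d(l_shape)

# Cells that are reconstructed although the object is empty there.
false_volume = hull - l_shape
print(sorted(false_volume))
```

The reconstruction always contains the true object, but the four cells inside the corner of the L are filled in as well. Real systems use perspective cameras and more views, which shrinks such regions without eliminating them.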

An object of the present invention is to provide a shape analysis apparatus and a shape analysis program capable of identifying, in real time and with high accuracy, which body segment of an observation object each voxel of volume data generated by the volume intersection method represents.

A shape analysis program according to the present invention causes a computer to function as: reference volume data generation means for generating, for each frame of a moving image of a reference object composed of a plurality of body segments, reference volume data, which is volume data of the reference object in which each voxel carries a label indicating the body segment to which it belongs; learning data generation means for generating, as learning data, a sequence of reference projection points obtained by projecting the plurality of reference volume data generated for the respective frames onto a learning space of a predetermined dimension lower than the number of voxels of the reference volume data; learning data storage means for storing in advance the reference volume data generated by the reference volume data generation means and the learning data generated by the learning data generation means; a plurality of imaging means for capturing a moving image of an observation object composed of the same plurality of body segments as the reference object; input volume data generation means for generating, each time frames are acquired by the plurality of imaging means, input volume data, which is volume data of the observation object, using the volume intersection method; search means for projecting the input volume data generated by the input volume data generation means onto the learning space and searching the learning space for the nearest reference projection point, which is the reference projection point located closest to the projection point of the input volume data; and identification means for identifying the body segment to which each voxel of the input volume data belongs, based on the labels attached to the voxels of the reference volume data corresponding to the nearest reference projection point found by the search means.

A shape analysis apparatus according to the present invention comprises: reference volume data generation means for generating, for each frame of a moving image of a reference object composed of a plurality of body segments, reference volume data, which is volume data of the reference object in which each voxel carries a label indicating the body segment to which it belongs; learning data generation means for generating, as learning data, a sequence of reference projection points obtained by projecting the plurality of reference volume data generated for the respective frames onto a learning space of a predetermined dimension lower than the number of voxels of the reference volume data; learning data storage means for storing in advance the reference volume data generated by the reference volume data generation means and the learning data generated by the learning data generation means; a plurality of imaging means for capturing a moving image of an observation object composed of the same plurality of body segments as the reference object; input volume data generation means for generating, each time frames are acquired by the plurality of imaging means, input volume data, which is volume data of the observation object, using the volume intersection method; search means for projecting the input volume data generated by the input volume data generation means onto the learning space and searching the learning space for the nearest reference projection point, which is the reference projection point located closest to the projection point of the input volume data; and identification means for identifying the body segment to which each voxel of the input volume data belongs, based on the labels attached to the voxels of the reference volume data corresponding to the nearest reference projection point found by the search means.

With these configurations, the reference volume data generated for each frame from the moving image of the reference object is projected onto the learning space, and learning data consisting of the resulting sequence of reference projection points is generated in advance. Each time frames capturing the observation object are acquired by the imaging means, input volume data with the same number of voxels as the reference volume data is generated in a virtual three-dimensional space using the volume intersection method. This input volume data is projected onto the learning space, and the nearest reference projection point, that is, the reference projection point located closest to the projection point of the input volume data, is searched for. The body segment to which each voxel of the input volume data belongs is then identified based on the body segment labels attached to the voxels of the reference volume data corresponding to the nearest reference projection point.

Because the learning space has fewer dimensions than the input volume data has voxels, projecting the input volume data onto the learning space compresses the voxel-count-dimensional data, so the search for the nearest reference projection point can be performed at high speed. In addition, because the input volume data is generated by the volume intersection method, it can be generated at high speed from the moving images captured by the imaging means. The processing for identifying the body segment to which each voxel of the input volume data belongs can therefore be performed in real time.

Furthermore, the reference projection points serving as learning data are projections of the reference volume data of a reference object composed of the same plurality of body segments as the observation object. The reference volume data corresponding to the nearest reference projection point is therefore similar in posture to the input volume data, and the body segment represented by each voxel of the input volume data can be identified with high accuracy.

Moreover, because the body segments in the input volume data are identified using the labels attached to reference volume data of similar posture, the body segments can be identified with high accuracy even when the reference object and the observation object are persons wearing loose-fitting clothing.
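The lookup described above (project, find the nearest reference projection point, reuse its labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the projection into the learning space is taken as already done, volumes are flat occupancy lists, the search is brute force, and the names `nearest_reference` and `transfer_labels` as well as all sample values are hypothetical.

```python
import math


def nearest_reference(point, reference_points):
    """Index of the reference projection point closest to `point` (brute force)."""
    return min(range(len(reference_points)),
               key=lambda i: math.dist(point, reference_points[i]))


def transfer_labels(input_volume, reference_labels):
    """Copy the reference label onto every occupied input voxel.

    input_volume: 0/1 occupancy per voxel.
    reference_labels: label per voxel of the matched reference volume (0 = empty).
    Occupied input voxels whose reference voxel is empty stay 0 (unlabeled) here.
    """
    return [label if occupied else 0
            for occupied, label in zip(input_volume, reference_labels)]


# Hypothetical learning data: two reference poses, already projected to 2-D.
reference_points = [(0.0, 0.0), (1.0, 1.0)]
reference_labels = [[1, 1, 0, 2], [0, 1, 2, 2]]  # 1 and 2 are made-up segment ids

input_point = (0.9, 0.8)     # projection of the input volume (assumed given)
input_volume = [1, 1, 1, 0]  # occupancy from the volume intersection method

best = nearest_reference(input_point, reference_points)
labels = transfer_labels(input_volume, reference_labels[best])
```

How occupied input voxels with no labeled counterpart in the reference volume should be handled is not stated in this passage, so the sketch simply leaves them unlabeled.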

Preferably, the learning data storage means stores in advance a reliability used to remove false volume data contained in the input volume data generated by the input volume data generation means, and the search means corrects the input volume data using the reliability and projects the corrected input volume data onto the learning space.

With this configuration, the input volume data is corrected using the reliability for removing false volume, so the false volume that inevitably arises in the input volume data when the volume intersection method is used can be reduced.

Preferably, the reference volume data generation means generates the reference volume data by correcting volume data generated by the volume intersection method, and the program further causes the computer to function as: simplified volume data generation means for placing the reference volume data generated by the reference volume data generation means in a virtual three-dimensional space and, while varying the positional relationship between a plurality of virtual cameras and the reference volume data, generating simplified volume data for each positional relationship by applying the volume intersection method to the two-dimensional images obtained by imaging the reference volume data with the plurality of virtual cameras; and reliability generation means for comparing each simplified volume data with the reference volume data to locate where false volume data occurs in the simplified volume data, and generating the reliability.

With this configuration, the reference volume data is generated by correcting the volume data produced by the volume intersection method, so the reference volume data represents the reference object without containing false volume. This reference volume data is placed in a virtual three-dimensional space and imaged by a plurality of virtual cameras, and simplified volume data is generated from the resulting two-dimensional images using the volume intersection method. The positional relationship between the virtual cameras and the reference volume data is varied, and simplified volume data is generated for each positional relationship.

The amount of false volume data produced by the volume intersection method increases or decreases depending on the positional relationship between the reference object and the plurality of cameras. The reference volume data, on the other hand, can be regarded as representing the reference object itself. Comparing the reference volume data with the simplified volume data for each positional relationship therefore makes it possible to distinguish voxels of the simplified volume data in which false volume occurs frequently from voxels in which it rarely occurs. Because the reliability is created from the false volume identified in this way, it can indicate which voxels of the input volume data are false volume data.
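One way to turn this comparison into a per-voxel reliability is sketched below. The passage does not give a formula, so the definition used here (one minus the fraction of camera arrangements in which a voxel is occupied although the reference volume is empty there) and the threshold-based correction are assumptions, as are the names `reliability_map` and `remove_false_volume`.

```python
def reliability_map(reference, simplified_volumes):
    """Per-voxel reliability from one reference volume and several simplified volumes.

    reference: 0/1 occupancy per voxel, treated as the true shape.
    simplified_volumes: one 0/1 occupancy list per virtual camera arrangement.
    Assumed definition: 1 - (rate at which the voxel is falsely occupied).
    """
    if not simplified_volumes:
        raise ValueError("at least one simplified volume is required")
    n = len(simplified_volumes)
    reliability = []
    for i, truth in enumerate(reference):
        false_hits = sum(1 for vol in simplified_volumes if vol[i] and not truth)
        reliability.append(1.0 - false_hits / n)
    return reliability


def remove_false_volume(input_volume, reliability, threshold=0.5):
    """Clear input voxels whose reliability falls below an assumed threshold."""
    return [occ if rel >= threshold else 0
            for occ, rel in zip(input_volume, reliability)]


reference = [1, 0, 0, 0]
simplified = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
rel = reliability_map(reference, simplified)
cleaned = remove_false_volume([1, 1, 1, 1], rel)
```

In this toy data the second voxel is falsely occupied in three of four arrangements, so it is the only one removed from the input volume.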

Preferably, the search means divides the learning space into partial regions of a fixed size, searches for the nearest reference projection point within the partial region of interest to which the projection point of the input volume data belongs, and, when no reference projection point exists in the partial region of interest, enlarges the size of the partial regions so that a reference projection point is detected.

With this configuration, the search for the nearest reference projection point is confined to the partial region of interest to which the projection point of the input volume data belongs, so the search range is narrowed and the nearest reference projection point can be found at high speed. When no reference projection point exists in the partial region of interest, the size of the partial regions is enlarged so that a reference projection point is detected, which avoids the situation in which the partial region of interest contains no reference projection point.

Preferably, the search means detects the reference projection point closest to the projection point of the input volume data within the partial region of interest, identifies one or more adjacent partial regions whose distance from the projection point of the input volume data is shorter than the distance between the detected reference projection point and the projection point of the input volume data, and detects the nearest reference projection point by searching the identified adjacent partial regions and the partial region of interest.

With this configuration, the nearest reference projection point can be found even when it lies not in the partial region of interest but in an adjacent partial region.
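The region-limited search and the check of adjacent regions can be sketched with a uniform grid. This is an illustrative variant rather than the patented procedure: instead of enlarging the cell size, it widens the block of cells scanned around the query, which has a similar effect. The names `build_grid` and `grid_nearest` are hypothetical, and scanning every cell in the block is practical only when the learning space has few dimensions.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, size):
    """Hash each point index into the grid cell (partial region) that contains it."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(math.floor(c / size) for c in p)].append(i)
    return grid


def grid_nearest(query, points, grid, size):
    """Index of the point nearest to `query`, scanning only nearby cells."""
    if not points:
        raise ValueError("no reference projection points")
    home = tuple(math.floor(c / size) for c in query)

    def scan(radius):
        # Best (distance, index) among all cells within `radius` cells of home.
        best = None
        for offset in product(range(-radius, radius + 1), repeat=len(home)):
            cell = tuple(h + o for h, o in zip(home, offset))
            for i in grid.get(cell, ()):
                d = math.dist(query, points[i])
                if best is None or d < best[0]:
                    best = (d, i)
        return best

    radius = 0
    best = scan(radius)
    while best is None:  # empty region of interest: widen the search
        radius += 1
        best = scan(radius)
    # A closer point may lie in a neighboring cell: cover the ball of radius best[0].
    needed = math.ceil(best[0] / size)
    if needed > radius:
        best = scan(needed)
    return best[1]


points = [(0.1, 0.1), (1.05, 0.9)]
grid = build_grid(points, 1.0)
found = grid_nearest((0.9, 0.9), points, grid, 1.0)
```

Here the query shares a cell with the first point, yet the second point in the neighboring cell is closer, so the second scan returns it.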

Preferably, the reference volume data generation means generates, as the reference volume data, whole-body reference volume data, which is volume data of the entire reference object in which voxels are arranged at a predetermined low resolution and each voxel carries a label indicating the body segment to which it belongs, and a plurality of body segment reference volume data, which are volume data of the individual body segments of the reference object in which voxels are arranged at a resolution higher than the low resolution. The learning data generation means generates learning data for the whole body and for each body segment by projecting the whole-body reference volume data and the plurality of body segment reference volume data onto their respective learning spaces, namely a whole-body learning space and a learning space for each body segment. The input volume data generation means generates, as the input volume data, first whole-body input volume data, which is volume data of the entire observation object with voxels arranged at the low resolution, and second whole-body input volume data, which is volume data of the entire observation object with voxels arranged at the high resolution. The search means estimates the orientation of the first whole-body input volume data relative to the whole-body reference volume data by projecting the first whole-body input volume data onto the whole-body learning space, rotates the second whole-body input volume data to the estimated orientation, generates from the rotated second whole-body input volume data a plurality of body segment input volume data, which are volume data of the individual body segments of the observation object, based on the labels attached to the voxels of the whole-body reference volume data, and searches for the nearest reference projection point of each body segment by projecting each body segment input volume data onto the corresponding learning space. The identification means identifies the body segment to which each voxel of the second whole-body input volume data belongs, based on the body segment reference volume data corresponding to the nearest reference projection point of each body segment found by the search means.

With this configuration, whole-body reference volume data, in which the entire reference object is represented by voxels arranged at a low resolution, and a plurality of body segment reference volume data, in which each body segment of the reference object is represented by voxels arranged at a high resolution, are generated. The whole-body reference volume data and the plurality of body segment reference volume data are projected onto their respective learning spaces, namely the whole-body learning space and the learning space of each body segment, to generate learning data for the whole body and for each body segment.

Meanwhile, by imaging the observation object with the plurality of imaging means, first whole-body input volume data with voxels arranged at the low resolution and second whole-body input volume data with voxels arranged at the high resolution are generated, and the orientation of the first whole-body input volume data relative to the whole-body reference volume data is estimated by projecting the first whole-body input volume data onto the whole-body learning space. Because the voxels of the first whole-body input volume data are arranged at the low resolution, both the whole-body reference volume data whose posture is similar to that of the first whole-body input volume data and the orientation of the first whole-body input volume data relative to that whole-body reference volume data can be determined at high speed.

The second whole-body input volume data is then rotated to the estimated orientation, body segment input volume data for each body segment of the observation object is generated from the rotated second whole-body input volume data based on the labels attached to the voxels of the whole-body reference volume data, and the nearest reference projection point of each body segment is searched for by projecting each body segment input volume data onto the corresponding learning space. Because each body segment input volume data is generated, based on the labels attached to the reference volume data, from the high-resolution second whole-body input volume data rotated to the same orientation as the whole-body reference volume data, it represents the position of the corresponding body segment of the observation object with high accuracy.

The body segment to which each voxel of the second whole-body input volume data belongs is then identified based on the labels attached to the voxels of the body segment reference volume data corresponding to the nearest reference projection point found for each body segment. The body segment to which each voxel of the second whole-body input volume data belongs can therefore be identified at high speed and with high accuracy.
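The step that cuts the high-resolution whole-body volume into per-segment volumes using the low-resolution labels can be sketched as follows. The passage does not say how the two resolutions are aligned, so this sketch assumes the high-resolution grid is an integer multiple (`scale`) of the low-resolution grid with a shared origin. The function name and the segment names are hypothetical.

```python
def split_by_segment(high_res_voxels, low_res_labels, scale):
    """Group occupied high-resolution voxels by the label of their low-resolution cell.

    high_res_voxels: set of occupied (x, y, z) integer coordinates.
    low_res_labels: dict mapping low-resolution (x, y, z) to a segment label.
    scale: number of high-resolution cells per low-resolution cell along each axis.
    Voxels that fall in an unlabeled low-resolution cell are dropped in this sketch.
    """
    segments = {}
    for x, y, z in high_res_voxels:
        label = low_res_labels.get((x // scale, y // scale, z // scale))
        if label is not None:
            segments.setdefault(label, set()).add((x, y, z))
    return segments


low_res_labels = {(0, 0, 0): "arm", (1, 0, 0): "torso"}
high_res = {(0, 0, 0), (1, 1, 0), (2, 0, 1), (5, 0, 0)}
segments = split_by_segment(high_res, low_res_labels, scale=2)
```

Each resulting set would then be projected into the learning space of its own body segment for the second, finer nearest-neighbor search.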

Preferably, the search means projects the first whole-body input volume data onto the whole-body learning space each time it rotates that data by a predetermined angle relative to a predetermined reference direction in the virtual three-dimensional space, and estimates the orientation of the input volume data by searching for the projection point of the first whole-body input volume data and the reference projection point that are separated by the shortest distance.

With this configuration, the first whole-body input volume data is projected onto the learning space each time it is rotated by a predetermined angle relative to a predetermined reference direction in the virtual three-dimensional space, and the projection point of the first whole-body input volume data and the reference projection point separated by the shortest distance are searched for. It is therefore possible to identify the whole-body reference volume data most similar to the first whole-body input volume data and to determine the orientation of the input volume data relative to that whole-body reference volume data.
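The rotate-and-project search can be sketched as below. Several simplifications are assumptions made for the example: the rotation step is fixed at 90 degrees about the z axis so that voxel coordinates stay on the grid, and the projection into the learning space is passed in as a function (`centroid_xy` is a stand-in, not the projection used in the patent). All names are hypothetical.

```python
import math


def rotate_z90(voxels, times):
    """Rotate integer voxel coordinates about the z axis by `times` * 90 degrees."""
    out = set(voxels)
    for _ in range(times % 4):
        out = {(-y, x, z) for x, y, z in out}
    return out


def estimate_orientation(input_voxels, reference_points, project):
    """Try each rotation and return (distance, angle_deg, reference_index) of the best match."""
    best = None
    for k in range(4):
        p = project(rotate_z90(input_voxels, k))
        for idx, ref in enumerate(reference_points):
            d = math.dist(p, ref)
            if best is None or d < best[0]:
                best = (d, k * 90, idx)
    return best


def centroid_xy(voxels):
    """Stand-in projection: the centroid of the voxels in the x-y plane."""
    n = len(voxels)
    return (sum(x for x, _, _ in voxels) / n, sum(y for _, y, _ in voxels) / n)


reference_points = [(1.5, 0.0), (0.0, -3.0)]  # hypothetical projected reference poses
observed = {(0, 1, 0), (0, 2, 0)}             # reference pose 0 turned by 90 degrees
result = estimate_orientation(observed, reference_points, centroid_xy)
```

The search returns both the matched reference pose and the rotation that aligns the input with it, which is the pair of results the passage describes.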

Preferably, the learning data generation means and the search means project the reference volume data and the input volume data onto the learning space using principal component analysis.

With this configuration, the reference volume data and the input volume data are projected onto the learning space using principal component analysis, so the voxel-count-dimensional data can be compressed without losing its original features.
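A projection by principal component analysis can be sketched in plain Python as follows. This is an illustrative implementation using power iteration with a fixed number of steps, chosen to keep the example dependency-free. The patent does not specify how the principal components are computed, and the names `pca_basis` and `project` are hypothetical.

```python
import math
import random


def pca_basis(data, k, iters=100, seed=0):
    """Mean and up to k principal directions of `data` (list of equal-length rows)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, mean)] for row in data]
    basis = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # Remove the directions already found, then apply X^T X to v.
            for b in basis:
                dot = sum(x * y for x, y in zip(v, b))
                v = [x - dot * y for x, y in zip(v, b)]
            scores = [sum(x * y for x, y in zip(row, v)) for row in centered]
            w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return mean, basis  # no variance left to explain
            v = [x / norm for x in w]
        basis.append(v)
    return mean, basis


def project(vector, mean, basis):
    """Coordinates of `vector` in the learning space spanned by `basis`."""
    return [sum((x - m) * b for x, m, b in zip(vector, mean, axis)) for axis in basis]


# Four 2-voxel "volumes" that vary along a single direction.
data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, basis = pca_basis(data, k=1)
coords = project([3.0, 3.0], mean, basis)
```

The same `mean` and `basis` obtained from the reference volumes would be applied to each input volume, so that reference and input points live in one learning space. The sign of each component is arbitrary, which does not affect distances.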

Preferably, the reference volume data generation means identifies the body segment to which each voxel of the reference volume data belongs using a moving image of a reference object in which each body segment is given a different color.

With this configuration, the labels attached to the reference volume data are determined using a moving image of a reference object in which each body segment is given a different color, so reference volume data that accurately represents the position of each body segment can be generated.
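Color-based labeling of a voxel could look like the sketch below. The passage only states that each body segment has its own color. Matching each observed pixel color to the nearest palette color and taking a majority vote across cameras are assumptions, and the palette, the function names, and the sample colors are hypothetical.

```python
from collections import Counter


def classify_color(rgb, palette):
    """Name of the palette entry whose color is closest to `rgb` (squared distance)."""
    return min(palette,
               key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, palette[name])))


def label_voxel(observed_colors, palette):
    """Majority label over the colors seen at the voxel's pixel in each camera."""
    if not observed_colors:
        return None  # voxel not visible in any camera
    votes = Counter(classify_color(c, palette) for c in observed_colors)
    return votes.most_common(1)[0][0]


palette = {"head": (255, 0, 0), "torso": (0, 255, 0)}  # made-up segment colors
colors_seen = [(250, 10, 5), (240, 20, 0), (10, 245, 10)]
label = label_voxel(colors_seen, palette)
```

Voting across cameras gives some tolerance to a single camera seeing the wrong segment because of occlusion.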

Preferably, the reference object and the observation object are persons wearing loose-fitting clothing.

With this configuration, the body segments of a person wearing loose-fitting clothing can be identified in real time and with high accuracy.

According to the present invention, the learning space has fewer dimensions than the input volume data has voxels, so projecting the input volume data onto the learning space compresses the voxel-count-dimensional data, and the search for the nearest reference projection point can be performed at high speed. In addition, because the input volume data is generated by the volume intersection method, it can be generated at high speed from the moving images captured by the imaging means. The processing for identifying the body segment to which each voxel of the input volume data belongs can therefore be realized in real time.

Furthermore, the reference projection points serving as learning data are projections of the reference volume data of a reference object composed of the same plurality of body segments as the observation object. The reference volume data corresponding to the nearest reference projection point is therefore similar in posture to the input volume data, and the body segment represented by each voxel of the input volume data can be identified with high accuracy.

A shape analysis apparatus according to an embodiment of the present invention is described below. FIG. 1 is a block diagram showing the hardware configuration of the shape analysis apparatus according to the embodiment. The apparatus is built from an ordinary computer or the like and includes an input device 1, a ROM (read-only memory) 2, a CPU (central processing unit) 3, a RAM (random access memory) 4, an external storage device 5, a display device 6, a recording medium drive device 7, an interface (I/F) unit 9, and m cameras 11 to 1m (m is an integer of 2 or more). The input device 1, ROM 2, CPU 3, RAM 4, external storage device 5, display device 6, recording medium drive device 7, and I/F unit 9 are connected to an internal bus. Various data are input and output through this bus, and various processes are executed under the control of the CPU 3.

The input device 1 consists of a keyboard, a mouse, and the like, and is used by the user to enter various data. The ROM 2 stores system programs such as the BIOS (Basic Input/Output System). The external storage device 5 consists of a hard disk drive or the like and stores a predetermined OS (Operating System), the shape analysis program, and so on. The CPU 3 reads the OS and other software from the external storage device 5 and controls the operation of each block. In this embodiment, an Opteron running at 1.5 GHz, for example, can be used as the CPU 3. The RAM 4 is used as a work area for the CPU 3.

The display device 6 consists of a liquid crystal display or the like and displays various images under the control of the CPU 3. The recording medium drive device 7 consists of a CD-ROM drive, a flexible disk drive, or the like.

The shape analysis program is stored on a computer-readable recording medium 8 such as a CD-ROM and distributed commercially. The user installs the shape analysis program on the computer by having the recording medium drive device 7 read the recording medium 8. Alternatively, the shape analysis program may be stored on a server on the Internet and installed on the computer by downloading it from that server.

The I/F unit 9 consists of, for example, a USB interface and serves as the input/output interface between the cameras 11 to 1m and the shape analysis apparatus. The cameras 11 to 1m capture moving images at a predetermined frame rate and are synchronized so that their frames are captured at the same time.

FIG. 2 is a block diagram showing the functional configuration of the shape analysis apparatus shown in FIG. 1. As shown in FIG. 2, the shape analysis apparatus includes m cameras 11 to 1m (an example of imaging means), a reference volume data generation unit 20 (an example of reference volume data generation means), a simplified volume data generation unit 30 (an example of simplified volume data generation means), a learning data generation unit 40 (an example of learning data generation means), an input volume data generation unit 50 (an example of input volume data generation means), a search unit 60 (an example of search means), a specifying unit 70 (an example of specifying means), a reliability generation unit 80 (an example of reliability generation means), and a learning data storage unit 90 (an example of learning data storage means). The units from the reference volume data generation unit 20 through the learning data storage unit 90 shown in FIG. 2 are realized by the CPU 3 executing the shape analysis program, but this is not a limitation, and each block may instead be implemented as a dedicated hardware device.

The cameras 11 to 1m are placed at predetermined locations in three-dimensional real space and capture the moving images that the reference volume data generation unit 20 and the input volume data generation unit 50 use to generate volume data.

From moving images of a reference object composed of a plurality of body segments, the reference volume data generation unit 20 generates, for each frame, reference volume data: volume data of the reference object in which each voxel carries a label indicating the body segment it belongs to.

Specifically, the reference volume data generation unit 20 generates volume data by the visual volume intersection method from the moving images of the reference object captured by the cameras 11 to 1m, and then refines the generated volume data using a technique such as elastic mesh deformation or space carving to obtain the reference volume data. As the visual volume intersection method, the technique described in G. K. M. Cheung, T. Kanade, J.-Y. Bouguet, M. Holler, "A real time system for robust 3D voxel reconstruction of human motions," CVPR2000, Vol. 2, pp. 714-720, 2000, or in X. Wu, O. Takizawa, and T. Matsuyama, "Parallel Pipeline Volume Intersection for Real-Time 3D Shape Reconstruction on a PC Cluster," in Proc. of The 4th IEEE International Conference on Computer Vision Systems (ICVS), 2006, can be used. As the elastic mesh deformation, the technique described in S. Nobuhara and T. Matsuyama, "Deformable Mesh Model for Complex Multi-Object 3D Motion Estimation from Multi-Viewpoint Video," 3DPVT, 2006, can be used.

As the space carving, the technique described in K. N. Kutulakos, S. M. Seitz, "A Theory of Shape by Space Carving," IJCV, Vol. 38, No. 3, pp. 199-218, 2000, can be used.
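The core test behind the visual volume intersection method can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the procedure of the cited papers: the cameras are modeled as orthographic views along the grid axes, and the names `project_silhouette` and `carve_visual_hull` are hypothetical. A voxel is kept only if its projection falls inside the silhouette in every view.

```python
from itertools import product


def project_silhouette(voxels, axis):
    """Orthographic silhouette: drop the coordinate along the viewing axis."""
    return {tuple(c for i, c in enumerate(v) if i != axis) for v in voxels}


def carve_visual_hull(silhouettes, size):
    """Keep a voxel only if every view's silhouette contains its projection.

    silhouettes: dict mapping a viewing axis (0, 1 or 2) to a set of 2D
    foreground pixels. Axis-aligned orthographic views are an assumption
    made for brevity; real systems use calibrated perspective cameras.
    """
    hull = set()
    for voxel in product(range(size), repeat=3):
        inside_all = True
        for axis, foreground in silhouettes.items():
            pixel = tuple(c for i, c in enumerate(voxel) if i != axis)
            if pixel not in foreground:
                inside_all = False
                break
        if inside_all:
            hull.add(voxel)
    return hull


# Two occupied voxels seen from a single view along the z axis.
true_object = {(0, 0, 0), (1, 1, 1)}
one_view = {2: project_silhouette(true_object, 2)}
hull = carve_visual_hull(one_view, size=2)
```

With a single view the hull contains the object plus extra voxels that the silhouette cannot rule out. Such extra voxels are the kind of false volume that the reliability values described later are meant to handle.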

The reference volume data comprises whole-body reference volume data and a plurality of body segment reference volume data. The whole-body reference volume data is volume data of the entire reference object in which voxels are arranged at a predetermined low resolution and each voxel carries a label indicating the body segment it belongs to. Each body segment reference volume data is volume data of one body segment of the reference object in which voxels are arranged at a resolution higher than that low resolution.

A person is used as the reference object, and ten parts are used as the body segments: the head, torso, right upper arm, right lower arm, left upper arm, left lower arm, right upper leg, right lower leg, left upper leg, and left lower leg. Each voxel of the whole-body reference volume data is therefore labeled with the one of these ten body segments to which it belongs. These body segments are only an example: eleven or more body segments may be used, different body segments may be used, or some of the above body segments may be omitted.

FIG. 3 shows the volume data used in this embodiment: (a) shows the whole-body reference volume data and (b) shows the body segment reference volume data. As shown in FIG. 3(a), the whole-body reference volume data is represented by the set of voxels BS1 that partition, in a grid, the interior of a rectangular bounding box B1 placed in a virtual three-dimensional space so as to enclose the entire reference object K1. A voxel BS1 located on the reference object K1 is assigned a code of, for example, "1", and a voxel BS1 where the reference object K1 is absent is assigned a code of, for example, "0". Each voxel BS1 is also labeled with the body segment to which it belongs: a voxel BS1 located in the head carries the head label, a voxel BS1 located in the right upper arm carries the right upper arm label, and so on.

The body segment reference volume data shown in FIG. 3(b) consists of the voxels BS2 that partition, in a grid, the interiors of ten bounding boxes B2 placed in the virtual three-dimensional space so as to enclose each of the ten body segments of the reference object K1. As with the voxels BS1, a voxel BS2 located on the body segment is assigned the code "1", and a voxel BS2 not located on the body segment is assigned the code "0". Because the whole-body reference volume data is low resolution and the body segment reference volume data is high resolution, a voxel BS1 is larger than a voxel BS2.

Returning to FIG. 2, the simplified volume data generation unit 30 places the whole-body reference volume data generated by the reference volume data generation unit 20 in the virtual three-dimensional space and, while varying the positional relationship between a plurality of virtual cameras and the whole-body reference volume data, images the whole-body reference volume data with the virtual cameras. It then reconstructs the shape from the resulting silhouette images using the visual volume intersection method, producing simplified volume data for each positional relationship.

The reliability generation unit 80 compares the simplified volume data for each positional relationship with the whole-body reference volume data to identify where false volumes occur in the simplified volume data, and generates a reliability for each voxel of the whole-body reference volume data that is used to remove false volumes contained in the input volume data. Here, the reliability is data that expresses probabilistically how likely a false volume is to occur at each voxel of the input volume data generated by the input volume data generation unit 50. For example, if the positional relationship between the virtual cameras and the whole-body reference volume data of a given frame is changed 10 times and a false volume occurs at a given voxel in 2 of those 10 arrangements, the reliability of that voxel is 0.2.
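The worked example above (2 occurrences in 10 arrangements gives 0.2) can be sketched in Python as follows. This is only an illustration: the function name `false_volume_reliability` is hypothetical, and treating a false volume as "occupied in the simplified data but empty in the reference data" is an assumed reading of the comparison.

```python
def false_volume_reliability(reference, simplified_volumes):
    """Per-voxel frequency of false volume over all camera arrangements.

    reference: list of 0/1 occupancy codes of the whole-body reference volume.
    simplified_volumes: one 0/1 list per virtual-camera arrangement, each the
    same length as `reference`.
    Assumption: a false volume is a voxel that is occupied (1) in the
    simplified data although it is empty (0) in the reference data.
    """
    trials = len(simplified_volumes)
    counts = [0] * len(reference)
    for simplified in simplified_volumes:
        for i, (ref, simp) in enumerate(zip(reference, simplified)):
            if simp == 1 and ref == 0:
                counts[i] += 1
    return [count / trials for count in counts]


# Three voxels; voxel 1 is a false volume in 2 of 10 arrangements.
reference = [1, 0, 0]
arrangements = [[1, 1, 0]] * 2 + [[1, 0, 0]] * 8
reliability = false_volume_reliability(reference, arrangements)
```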

Where a false volume occurs depends strongly not only on the shape of the object but also on its positional relationship to the group of cameras. Comparing the simplified volume data for each positional relationship with the whole-body reference volume data therefore makes it possible to identify where false volumes occur.

The learning data generation unit 40 projects the plurality of reference volume data generated for each frame by the reference volume data generation unit 20 onto a learning space of a predetermined dimension lower than the number of voxels of the reference volume data, using principal component analysis for example, and generates the resulting sequences of reference projection points as learning data. Instead of principal component analysis, a technique that uses a nonlinear mapping based on latent variables may be applied, as described in N. D. Lawrence, "Probabilistic non-linear principal component analysis with Gaussian process latent variable models," Journal of Machine Learning Research, Vol. 6, pp. 1783-1816, 2005.

In this embodiment, the reference volume data includes both whole-body reference volume data and body segment reference volume data, so there are two kinds of learning space: a whole-body learning space, whose predetermined dimension is lower than the number of voxels of the whole-body reference volume data, and body segment learning spaces, whose predetermined dimensions are lower than the number of voxels of the respective body segment reference volume data. Because there are ten body segments in this embodiment, there are ten body segment learning spaces.

The learning data generation unit 40 projects the whole-body reference volume data generated for each frame onto the whole-body learning space and generates the resulting sequence of reference projection points as whole-body learning data. It likewise projects the body segment reference volume data generated for each frame onto the corresponding body segment learning space and generates the resulting sequences of reference projection points as body segment learning data.

The learning data storage unit 90 stores in advance the reliability generated by the reliability generation unit 80 and the learning data generated by the learning data generation unit 40. The processing by the reference volume data generation unit 20 through the learning data generation unit 40 and by the reliability generation unit 80 is executed beforehand, as preprocessing for the real-time processing performed by the input volume data generation unit 50 through the specifying unit 70 described below.

Each time frames are acquired by the cameras 11 to 1m, the input volume data generation unit 50 uses the visual volume intersection method to generate input volume data, which is volume data of an observation object having the same plurality of body segments as the reference object. The input volume data includes first whole-body input volume data, which is volume data of the entire observation object with voxels arranged at the same low resolution as the whole-body reference volume data, and second whole-body input volume data, which is volume data of the entire observation object with voxels arranged at the same high resolution as the body segment reference volume data. The first whole-body input volume data is thus represented by voxels BS1 in the same way as the whole-body reference volume data shown in FIG. 3(a), and the second whole-body input volume data corresponds to the data of FIG. 3(a) with the voxels BS1 replaced by voxels BS2.

The search unit 60 projects the input volume data generated by the input volume data generation unit 50 onto the learning space and searches the learning space for the nearest reference projection point, that is, the reference projection point located closest to the projection point of the input volume data.

Specifically, the search unit 60 proceeds as follows. It first projects the first whole-body input volume data onto the whole-body learning space to estimate the orientation of the first whole-body input volume data relative to the whole-body reference volume data, and rotates the second whole-body input volume data to the estimated orientation. Next, based on the labels attached to the voxels of the whole-body reference volume data, it locates the ten body segments in the rotated second whole-body input volume data and sets ten bounding boxes, one around each body segment, thereby generating ten body segment input volume data. Finally, it projects each of the ten body segment input volume data onto the corresponding learning space and searches for the nearest reference projection point for each of the ten body segments.

Based on the labels attached to the body segment reference volume data corresponding to the nearest reference projection point found for each body segment by the search unit 60, the specifying unit 70 identifies which body segment each voxel of the second whole-body input volume data belongs to.

Next, the learning data generation process in the shape analysis apparatus is described. FIG. 4 is a flowchart showing the learning data generation process. First, in step S1, the cameras 11 to 1m capture moving images of the reference object. FIG. 5 shows the process by which the whole-body reference volume data is generated. As shown in FIG. 5(a), the cameras 11 to 1m photograph a person wearing loose-fitting clothing (for example, a yukata) in which each of the ten body segments to be identified is given a different color.

Next, in step S2, the reference volume data generation unit 20 generates three-dimensional volume data by the visual volume intersection method from the m frames captured at the same time by the cameras 11 to 1m, and applies elastic mesh deformation or space carving to the generated volume data to produce the whole-body reference volume data. The shape of the whole-body reference volume data thereby approximates the actual shape of the reference object, as shown in FIG. 5(b), so the reference object is reproduced realistically.

Next, in step S3, the reference volume data generation unit 20 attaches to each voxel of the whole-body reference volume data generated in step S2 a label (body segment label) indicating the body segment to which the voxel belongs. Specifically, the labels are attached as follows. First, color detection is performed on the m frames captured at the same time by the cameras 11 to 1m, producing multi-view label images in which each body segment is labeled, as shown in FIG. 5(c). As the color detection, the technique described in T. Wada, "Color target detection using nearest neighbor classifier," IPSJ Transactions on CVIM, Vol. 44, No. SIG17, pp. 126-135, 2003, can be used.

Next, the body segment labels are back-projected from the multi-view label images onto the whole-body reference volume data, so that the surface voxels of the whole-body reference volume data are labeled with body segment labels. Then, for each interior voxel of the whole-body reference volume data, the nearest surface voxel is found, and the interior voxel is given the body segment label of that surface voxel. This yields whole-body reference volume data with a high-accuracy shape in which every voxel carries a body segment label, as shown in FIG. 5(d).
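The interior labeling step can be sketched as a brute-force nearest-neighbor lookup. This is a minimal illustration: the function name `label_interior_voxels` and the dictionary layout are hypothetical, and squared Euclidean distance between voxel indices is assumed as the distance measure, which the text does not specify.

```python
def label_interior_voxels(surface_labels, interior_voxels):
    """Give each interior voxel the label of its nearest surface voxel.

    surface_labels: dict mapping a surface voxel (x, y, z) to its body
    segment label, as obtained by back-projection.
    interior_voxels: iterable of (x, y, z) voxels without a label yet.
    Ties are broken by dictionary order (an arbitrary choice).
    """
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    labels = {}
    for voxel in interior_voxels:
        nearest = min(surface_labels,
                      key=lambda s: squared_distance(s, voxel))
        labels[voxel] = surface_labels[nearest]
    return labels


surface = {(0, 0, 0): "head", (10, 0, 0): "torso"}
interior = [(1, 0, 0), (8, 0, 0)]
interior_labels = label_interior_voxels(surface, interior)
```

A full-resolution volume would likely need a spatial index or a distance transform instead of this exhaustive search.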

Next, in step S4, the reference volume data generation unit 20 uses the labeled whole-body reference volume data to generate ten body segment reference volume data representing the ten body segments of the reference object. It does so by placing bounding boxes whose voxels are arranged at high resolution so that each body segment contained in the whole-body reference volume data is covered, which yields high-resolution body segment reference volume data.

To apply principal component analysis to the whole-body reference volume data and to each body segment reference volume data, the number of voxels (the number of dimensions) must be the same in every frame. The reference volume data generation unit 20 therefore computes the three-dimensional centroid of the reference object in each frame and places a bounding box of fixed size around the reference object with that centroid as the reference to generate the whole-body reference volume data. Likewise, it computes the three-dimensional centroid of each body segment in each frame and places a bounding box of fixed size around the body segment with that centroid as the reference to generate each body segment reference volume data. The bounding box enclosing the whole body and the bounding boxes enclosing the body segments are preferably large enough to enclose the reference object and the respective body segments in all frames. As shown in FIG. 3(b), the bounding box B2 of one body segment also contains body segment reference volume data of other body segments.
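The fixed-size bounding box can be sketched as follows. This is an illustration under assumptions: the names `centroid` and `crop_fixed_box` are hypothetical, the box is a cube, and the centroid is rounded to the nearest voxel index before the box is centered on it. The point is that every frame yields a vector of the same length regardless of where the object stands.

```python
def centroid(voxels):
    """Three-dimensional centroid of a list of occupied voxels."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))


def crop_fixed_box(voxels, box_size):
    """Flatten a cube of side `box_size` centered on the centroid.

    Returns a 0/1 list of length box_size ** 3, so the number of
    dimensions is identical for every frame.
    """
    cx, cy, cz = (round(c) for c in centroid(voxels))
    half = box_size // 2
    occupied = set(voxels)
    vector = []
    for x in range(cx - half, cx - half + box_size):
        for y in range(cy - half, cy - half + box_size):
            for z in range(cz - half, cz - half + box_size):
                vector.append(1 if (x, y, z) in occupied else 0)
    return vector


# The same shape at two positions gives the same fixed-length vector.
frame_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
frame_b = [(x + 5, y + 3, z + 2) for x, y, z in frame_a]
vector_a = crop_fixed_box(frame_a, 3)
vector_b = crop_fixed_box(frame_b, 3)
```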

Next, in step S5, the learning data generation unit 40 computes a set V of volume data in which the whole-body reference volume data and the body segment reference volume data of all frames are arranged in time series. The d-dimensional volume data (d voxels) at time t is written $v_t = [v_{t,1}, v_{t,2}, \ldots, v_{t,d}]^{\top}$, where $v_{t,i} \in \{0, 1\}$, 0 denotes a non-object voxel, and 1 denotes an object voxel. The set V of volume data over all frames is expressed as follows.

$$V = [\,v_1 - m,\; v_2 - m,\; \ldots,\; v_T - m\,] \qquad (1)$$
Here, T is the number of volume data items used as learning data (that is, the number of captured frames), and m is the mean volume of the T volume data items (not to be confused with the number of cameras, which is also written m).

Next, in step S6, the learning data generation unit 40 computes the covariance matrix $S = VV^{\top}$ from the set V, separately for the whole-body reference volume data and for each body segment reference volume data, and obtains the set of eigenvectors $\{e_i \mid i = 1, \ldots, d\}$. The eigenvectors $e_i$ are ordered by decreasing corresponding eigenvalue $\lambda_i$. The space whose basis is formed by these eigenvectors is the eigenspace. For convenience of explanation, the number of voxels of the whole-body reference volume data and of each body segment reference volume data is written d, but in practice these numbers differ.
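Steps S5 and S6 can be sketched in plain Python as follows. This is a toy illustration, not the method of the embodiment: the text does not say how the eigenvectors are computed, and power iteration is substituted here because it needs no external library. It returns only the leading eigenpair and assumes the start vector is not orthogonal to the leading eigenvector. The names `center`, `covariance`, and `leading_eigenpair` are hypothetical.

```python
import math


def center(volumes):
    """Subtract the mean volume m from every volume (equation (1))."""
    count = len(volumes)
    dims = len(volumes[0])
    mean = [sum(v[i] for v in volumes) / count for i in range(dims)]
    centered = [[x - m for x, m in zip(v, mean)] for v in volumes]
    return centered, mean


def covariance(centered):
    """S = V V^T, where the columns of V are the centered volumes."""
    dims = len(centered[0])
    return [[sum(col[i] * col[j] for col in centered) for j in range(dims)]
            for i in range(dims)]


def leading_eigenpair(S, iterations=200):
    """Largest eigenvalue and its unit eigenvector by power iteration."""
    dims = len(S)
    e = [1.0 / math.sqrt(dims)] * dims
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(S[i][j] * e[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        e = [x / norm for x in w]
        eigenvalue = norm  # converges to the eigenvalue for symmetric PSD S
    return eigenvalue, e


# Four frames of a 3-voxel volume that alternates between two shapes.
volumes = [[1, 1, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
centered, mean = center(volumes)
eigenvalue, e1 = leading_eigenpair(covariance(centered))
```

For realistic voxel counts the d × d matrix S is very large, so a numerical library would normally be used for this step.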

Next, in step S7, for the whole-body reference volume data and for each body segment reference volume data, the learning data generation unit 40 approximates the d-dimensional volume data $v_t$ using $d_l$ eigenvectors, where $d_l$ is much smaller than d, and uses the matrix $E = [e_1, e_2, \ldots, e_{d_l}]$ formed from these $d_l$ eigenvectors to generate a $d_l$-dimensional whole-body learning space and ten body segment learning spaces. For convenience of explanation, the number of dimensions of the whole-body learning space and of each body segment learning space is written $d_l$, but in practice the learning spaces may have different numbers of dimensions or the same number.

Next, in step S8, the learning data generation unit 40 projects the whole-body reference volume data and each body segment reference volume data, as reference projection points, onto the $d_l$-dimensional whole-body learning space and the corresponding body segment learning space using the linear projection below, where $y_t$ denotes a reference projection point.

$$y_t = E^{\top}(v_t - m) \qquad (2)$$
Next, in step S9, the learning data generation unit 40 connects the reference projection points projected onto the whole-body learning space and the body segment learning spaces in time order, and generates these sequences of reference projection points as the learning data for the whole body and for each body segment. Connecting the projection points of temporally consecutive volume data in this way expresses the change of shape over time as a trajectory in the learning space. Through the above processing, the learning data can be represented as a manifold in the learning space. This manifold consists of the set of reference projection points $\{y_1^L, \ldots, y_T^L\}$, where $y_t^L$ denotes the reference projection point at time t. In addition, for each point in the manifold (the whole-body reference volume data of one frame), the body segment labels attached to its voxels are recorded. For the body segment reference volume data, the body segment labels attached to the voxels may or may not be recorded.
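Equation (2) and the time-ordered sequence of step S9 can be sketched as follows. This is a minimal illustration: the names `project` and `reference_trajectory` are hypothetical, the basis is passed as a list of eigenvectors (the columns of E), and pairing each projection point with a per-frame label list is just one possible way to record the body segment labels.

```python
def project(volume, mean, basis):
    """Equation (2): y = E^T (v - m).

    basis: list of d_l eigenvectors, each of length d (the columns of E).
    Returns the d_l coordinates of the projection point.
    """
    centered = [x - m for x, m in zip(volume, mean)]
    return [sum(e_i * c_i for e_i, c_i in zip(e, centered)) for e in basis]


def reference_trajectory(volumes, mean, basis, labels_per_frame):
    """Time-ordered reference projection points with their voxel labels."""
    return [(project(v, mean, basis), labels)
            for v, labels in zip(volumes, labels_per_frame)]


# Toy data: d = 3 voxels, d_l = 2, two frames.
mean = [0.5, 0.5, 0.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
volumes = [[1, 1, 0], [0, 0, 0]]
labels = [["head", "head", "none"], ["none", "none", "none"]]
trajectory = reference_trajectory(volumes, mean, basis, labels)
```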

FIG. 6 shows an example of a learning space; here the whole-body learning space is drawn in three dimensions. As FIG. 6 shows, each whole-body reference volume data VD generated for a frame is represented as a single point (a reference projection point) in the three-dimensional whole-body learning space. Each reference projection point is associated with the body segment labels attached to the voxels of the corresponding whole-body reference volume data VD. The learning data generation unit 40 also generates a learning space like that of FIG. 6 for each body segment. Because there are ten body segments in this embodiment, a total of eleven learning spaces are generated: one whole-body learning space and ten body segment learning spaces.

Next, the search for the nearest-neighbor reference projection point by the search unit 60 is described. The search unit 60 searches for the nearest-neighbor reference projection point for the first whole-body input volume data and for each of the 10 body-segment input volume data; for convenience, the following description refers to all of these simply as input volume data. When the input volume data {v_t, v_{t-1}, ..., v_{t-n}} of the current frame and the past n frames are each projected onto the whole-body learning space, their projection points form a trajectory pattern consisting of the projection point group {y_t^I, y_{t-1}^I, ..., y_{t-n}^I}, where y_t^I denotes the projection point of the input volume data at time t.

The search unit 60 then compares the trajectory pattern obtained from the input volume data at time t, y_t^I = (y_t^I, y_{t-1}^I, ..., y_{t-n}^I), with the trajectory patterns of the learning data, y_s^L = (y_s^L, y_{s-1}^L, ..., y_{s-n}^L) (s ∈ {n+1, ..., T}), and thereby searches for the nearest-neighbor reference projection point, that is, the reference volume data similar to the input volume data at time t.

Accordingly, the larger n (the number of past frames referenced) is, the higher the success rate of finding the same motion, but the higher the search cost. Moreover, if n is too large, only input motions that follow exactly the same movement as the learning data over a long period can be analyzed. It is therefore desirable to choose n according to the task, taking into account the processing time and the applicability to movements composed of combinations of short similar motions; n = 5 is adopted in the present embodiment.

Specifically, the evaluation value of the search, D(t) in Expression (3), is based on the sum of the distances between corresponding points of the trajectory pattern of the input volume data and that of the whole-body learning data. The whole-body learning data y_s^L that satisfies D(t), that is, that minimizes this sum, is determined to be the learning data most similar to y_t^I, namely the nearest-neighbor reference projection point.

Taking n = 5 as an example, Expression (3) is evaluated as follows. First, y_6^L to y_1^L are taken from the T points y_s^L, and the sum of the distances between each of y_6^L to y_1^L and y_t^I is computed. Next, y_7^L to y_2^L are taken, and the sum of the distances between each of y_7^L to y_2^L and y_t^I is computed. This process is repeated for s = 6 (= n + 1) through T, and the y_s^L for which the sum of distances is smallest is identified as the nearest-neighbor reference projection point for y_t^I.
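The sliding-window comparison described above can be sketched as follows. This is a minimal illustration, not the patented implementation: Expression (3) is not reproduced in this text, so the sketch assumes the evaluation value is the plain sum of Euclidean distances between corresponding trajectory points, and the function name `nearest_trajectory` is hypothetical.

```python
import math


def nearest_trajectory(input_traj, learned, n):
    """Return (s, cost) for the learned window closest to the input trajectory.

    input_traj: [y_t, y_{t-1}, ..., y_{t-n}], newest first, each a tuple of floats.
    learned:    [y_1, ..., y_T] in time order (index 0 holds y_1).
    s is a 0-based index into `learned`; the window is learned[s], ..., learned[s - n].
    Assumption: the cost is the sum of Euclidean distances between paired points.
    """
    if len(input_traj) != n + 1:
        raise ValueError("input_traj must hold n + 1 points")
    best_s, best_cost = None, math.inf
    # 0-based s = n corresponds to s = n + 1 in the 1-based notation of the text.
    for s in range(n, len(learned)):
        cost = sum(math.dist(input_traj[i], learned[s - i]) for i in range(n + 1))
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s, best_cost


# Toy data: learned points on a line, input trajectory slightly offset from it.
learned = [(float(k), 0.0) for k in range(10)]
input_traj = [(7.1, 0.0), (6.1, 0.0), (5.1, 0.0)]
best_s, best_cost = nearest_trajectory(input_traj, learned, n=2)
```

The cost of this exhaustive scan grows linearly with both T and n, which is the trade-off the choice of n has to balance.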

From the whole-body learning data y_s^L determined in this way, the high-accuracy whole-body reference volume data, which has undergone processing such as removal of volume in concave regions, can be referenced, along with the body segment labels corresponding to its shape.

However, the input volume data contains false volumes, and these may prevent the correct learning data from being found. The search unit 60 therefore uses the reliability C_t generated by the reliability generation unit 80 to reduce the false volumes contained in the input volume data. FIG. 7 shows false volumes; they occur in the circled regions of FIG. 7. The visual volume intersection method produces such false volumes, but using the reliability C_t makes it possible to reduce them.

In addition, because every learning-data trajectory y_s^L = (y_s^L, y_{s-1}^L, ..., y_{s-n}^L) is evaluated, the processing time increases in proportion to the number of learned volume data. The search unit 60 therefore reduces the processing time by making the search more efficient, as described below. FIGS. 8 and 9 illustrate this. For convenience, FIGS. 8 and 9 show the learning space in two dimensions, and each point drawn as a square represents learning data.

To speed up the search, it suffices to search only the learning data near the projection point P of the input volume data rather than all the learning data in the learning space. The search unit 60 therefore divides the entire learning space in advance into partial regions at regular intervals and records which learning data lie in each partial region. At search time, it first selects the partial region containing the projection point P of the input volume data as the target partial region CD and searches only within CD. Making the partial regions smaller reduces the number of data to be searched, but raises the likelihood that the target partial region CD contains no learning data, as shown in FIG. 8(a). The search unit 60 therefore prepares several sets of partial regions of progressively larger size, as shown in FIG. 8(b), and projects the input volume data onto them in order, starting from the set with the smallest regions, so that the target partial region CD contains learning data.

However, if only the target partial region CD containing the projection point P is searched, the true nearest-neighbor reference projection point Px may lie outside CD, as shown in FIG. 9(a). Therefore, as shown in FIG. 9(b), the search unit 60 takes as a threshold the distance dp between the projection point P and the nearest reference projection point P1 within the target partial region CD, and adds to the search range the adjacent partial regions CD1 to CD3 whose distance di from the projection point P satisfies di < dp. This efficient search, with its hierarchically set search range, achieves both high speed and the stability needed to find a similar solution reliably.
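The two-stage search described above (coarser grids when the home cell is empty, then adjacent cells closer than dp) might look like the following sketch. It is an illustration under assumptions: cells are axis-aligned cubes indexed by integer tuples, and the names `build_grid`, `cell_distance`, and `grid_nearest` are hypothetical. Like the description, it inspects only the cells adjacent to the target partial region.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, cell):
    """Map each cell index tuple to the indices of the points that fall in it."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(math.floor(c / cell) for c in p)].append(idx)
    return grid


def cell_distance(p, key, cell):
    """Distance from point p to the axis-aligned cell identified by key."""
    sq = 0.0
    for c, k in zip(p, key):
        lo, hi = k * cell, (k + 1) * cell
        d = max(lo - c, 0.0, c - hi)
        sq += d * d
    return math.sqrt(sq)


def grid_nearest(points, grids, p):
    """grids: list of (cell_size, grid) ordered from the smallest cell size up."""
    for cell, grid in grids:
        home = tuple(math.floor(c / cell) for c in p)
        if not grid.get(home):
            continue  # empty target region: fall back to the next, coarser grid
        best = min(grid[home], key=lambda i: math.dist(p, points[i]))
        dp = math.dist(p, points[best])
        # Add adjacent cells whose distance d_i from p is below the threshold d_p.
        for offset in product((-1, 0, 1), repeat=len(p)):
            key = tuple(h + o for h, o in zip(home, offset))
            if key == home or key not in grid:
                continue
            if cell_distance(p, key, cell) < dp:
                for i in grid[key]:
                    d = math.dist(p, points[i])
                    if d < dp:
                        best, dp = i, d
        return best, dp
    return None, math.inf


points = [(0.9, 0.9), (1.05, 0.1), (5.0, 5.0)]
grids = [(c, build_grid(points, c)) for c in (1.0, 4.0)]
found, dist = grid_nearest(points, grids, (0.95, 0.1))
```

In the example, the point in the home cell is farther away than a point just across the cell border, so the adjacent-cell check is what returns the true nearest neighbor.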

Next, the shape analysis process performed by the input volume data generation unit 50 through the specifying unit 70 is described. FIG. 10 is a flowchart of the shape analysis process. First, in step S31, the cameras 11 to 1m acquire m frames of the observation object at the same time instant. Next, in step S32, the input volume data generation unit 50 applies the visual volume intersection method to the m frames acquired in step S31 to generate low-resolution first whole-body input volume data and high-resolution second whole-body input volume data.

Next, in step S33, the first whole-body input volume data is projected onto the whole-body learning space using Expression (4).

y_t^I = E^T C_t (v_t - m)    (4)
Here, C_t denotes the reliability calculated in advance by the reliability generation unit 80 and is given by Expression (5).

Here, c_{t,i} denotes the reliability of the i-th voxel of the whole-body reference volume data in the frame at time t, and takes a value from 0 to 1 inclusive. This reduces the false volumes contained in the first whole-body input volume data and enables a highly accurate search.

The reliability C_t cannot be given unless the learning data corresponding to the input volume data at time t is already known, which is a chicken-and-egg problem. In the present embodiment, it is therefore assumed that the difference between first whole-body input volume data that are consecutive in time is very small, and the voxel reliability used at time t is determined from the learning data found at time t-1. That is, the reliability associated with the whole-body reference volume data corresponding to the nearest-neighbor reference projection point at time t-1 is used as the reliability C_t at time t.
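The reliability-weighted projection and the reuse of the previous frame's reliability can be sketched as follows. This is only a sketch under stated assumptions: Expression (5) is not reproduced in this text, so C_t is assumed to be a diagonal matrix holding the per-voxel reliabilities c_{t,i}, m is assumed to be the mean vector used when building the learning space, and `project`, `run_frames`, and the `find_match` callback are hypothetical names.

```python
def project(v, mean, basis, reliability):
    """Compute y = E^T C (v - m), assuming C = diag(reliability).

    v, mean, reliability: length-N sequences (N = number of voxels).
    basis: d rows of length N, i.e. the rows of E^T (one basis vector per row).
    """
    weighted = [c * (vi - mi) for c, vi, mi in zip(reliability, v, mean)]
    return [sum(e * w for e, w in zip(row, weighted)) for row in basis]


def run_frames(frames, mean, basis, find_match, initial_reliability):
    """Project each frame using the reliability attached to the previous match.

    find_match(y) is a hypothetical callback that returns
    (match_id, reliability_of_matched_reference_volume).
    """
    reliability = initial_reliability
    matches = []
    for v in frames:
        y = project(v, mean, basis, reliability)
        match_id, reliability = find_match(y)  # reliability is used at time t + 1
        matches.append(match_id)
    return matches


# Toy example: the fourth voxel is a suspected false volume (reliability 0).
y = project([1, 1, 0, 1], [0, 0, 0, 0], [[1, 1, 1, 1], [1, -1, 0, 0]], [1, 1, 1, 0])
```

With reliability 0 the fourth voxel contributes nothing to the projection, which is how a suspected false volume is kept from distorting the search.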

Next, in step S34, the search unit 60 calculates D^θ(t), the evaluation value given by Expression (3), and searches for the learning data y_s^L that satisfies D^θ(t) as the nearest-neighbor reference projection point.

In step S35, if the search unit 60 has found the angle θ' that minimizes the evaluation value D^θ(t) (YES in step S35), it advances the process to step S37. If θ' has not been found (NO in step S35), it changes θ by a predetermined angle (step S36). In other words, the search unit 60 repeats steps S33 to S36 until θ' is found.
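The loop over θ in steps S33 to S36 can be illustrated with a small sketch. It makes several simplifying assumptions that are not in the source: voxels are integer coordinates, the predetermined angle is 90 degrees about the vertical (z) axis, and the cost passed in is a stand-in for D^θ(t). The example cost compares voxel sets directly, whereas the text evaluates D^θ(t) in the learning space. The names `rotate_z90` and `estimate_orientation` are hypothetical.

```python
def rotate_z90(voxels, quarter_turns):
    """Rotate integer voxel coordinates about the z axis by quarter_turns * 90 degrees."""
    out = set(voxels)
    for _ in range(quarter_turns % 4):
        out = {(-y, x, z) for (x, y, z) in out}
    return out


def estimate_orientation(voxels, cost, steps=4):
    """Return (quarter_turns, cost) minimizing cost(rotated voxels) over all steps."""
    return min(
        ((k, cost(rotate_z90(voxels, k))) for k in range(steps)),
        key=lambda kc: kc[1],
    )


# Toy example: the observed volume is the reference turned by one quarter turn.
reference = {(1, 0, 0), (2, 0, 0), (0, 0, 1)}
observed = rotate_z90(reference, 1)
turns, residual = estimate_orientation(observed, lambda vox: len(vox ^ reference))
```

Here three further quarter turns bring the observed volume back onto the reference, so the sketch reports that rotation with zero residual.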

Next, in step S37, the whole-body reference volume data corresponding to the whole-body learning data that satisfies D^θ'(t) is identified. The whole-body reference volume data identified here is the data most similar to the first whole-body input volume data. The angle θ' that was found represents the orientation of the first whole-body input volume data relative to the whole-body reference volume data identified in step S37.

The whole-body reference volume data and the first whole-body input volume data are voxel sets reconstructed in the world coordinate system. Consequently, even if the reference object and the observation object have the same shape and posture, they are treated as entirely different data if their orientations differ. For this reason, the search unit 60 determines the orientation of the first whole-body input volume data relative to the whole-body reference volume data.

In step S38, the search unit 60 rotates the second whole-body input volume data by θ'. This aligns the orientation of the second whole-body input volume data with that of the whole-body reference volume data identified in step S37. In step S39, the search unit 60 locates each body segment in the second whole-body input volume data according to the body segment labels attached to the voxels of the whole-body reference volume data identified in step S37, and sets a high-resolution bounding box around each located body segment. The second whole-body input volume data is thereby divided into body segment regions, and the body-segment input volume data are generated.

In step S40, the search unit 60 projects each body-segment input volume data onto the corresponding body-segment learning space. In step S41, the search unit 60 searches for a plurality of learning data whose evaluation value D^L(t), given by Expression (3), satisfies D^L(t) < threshold. In step S42, the specifying unit 70 determines the body segment label to attach to each voxel of the second whole-body input volume data from the body segment labels of the voxels of the plurality of body-segment reference volume data corresponding to the learning data found in step S41. The body segment label attached here is data that expresses the candidate body segment labels of each voxel probabilistically. For example, if five learning data satisfy D^L(t) < threshold, and four of the five corresponding body-segment reference volume data mark a given voxel as "1" while the remaining one marks it as "0", the body segment label of that voxel is 4/5 = 0.8.
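The probabilistic label of steps S41 and S42 can be sketched as a per-voxel vote over the retrieved candidates. This is a minimal sketch, assuming each candidate body-segment reference volume is a flat list of 0/1 values over the same voxels; the names `select_candidates` and `label_probabilities` are hypothetical.

```python
def select_candidates(costs, threshold):
    """Indices of the learning data whose evaluation value is below the threshold."""
    return [i for i, d in enumerate(costs) if d < threshold]


def label_probabilities(candidate_volumes):
    """Per-voxel fraction of candidates that mark the voxel as part of the segment.

    candidate_volumes: equal-length lists of 0/1 values, one list per candidate.
    """
    k = len(candidate_volumes)
    return [sum(column) / k for column in zip(*candidate_volumes)]


# Toy example: four candidates pass the threshold; each volume has two voxels.
costs = [0.2, 0.9, 0.1, 0.3, 0.4]
volumes = [[1, 0], [0, 0], [1, 0], [1, 1], [0, 1]]
chosen = select_candidates(costs, threshold=0.5)
probs = label_probabilities([volumes[i] for i in chosen])
```

In this toy example three of the four selected candidates mark the first voxel, so its label value is 0.75.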

Because each body segment exists as a single connected mass, a voxel and its neighboring voxels are likely to share the same label. The specifying unit 70 therefore computes, for each voxel, a weighted average of its body segment label with those of its neighboring voxels (the weight decreases with distance from the voxel in question), which makes the labels consistent across the body segment as a whole. In addition, the bounding boxes of the body segments overlap, so voxels in the overlapping regions receive body segment labels of more than one type. For such a voxel, the specifying unit 70 selects the body segment label with the highest probability as the body segment label of that voxel.
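The smoothing and the final decision described above can be sketched as follows. The source does not give the weighting function or the neighborhood size, so the weight 1 / (1 + distance) and the `radius` parameter are assumptions chosen only for illustration, and the names `smooth` and `decide` are hypothetical.

```python
import math


def smooth(probs, positions, radius):
    """Distance-weighted average of each voxel's probability with its neighbors."""
    out = []
    for p in positions:
        num = den = 0.0
        for q, value in zip(positions, probs):
            d = math.dist(p, q)
            if d <= radius:
                w = 1.0 / (1.0 + d)  # assumed weight: smaller for distant voxels
                num += w * value
                den += w
        out.append(num / den)  # den > 0 because each voxel is its own neighbor
    return out


def decide(prob_by_segment):
    """For each voxel, pick the segment name with the highest probability."""
    names = list(prob_by_segment)
    n = len(next(iter(prob_by_segment.values())))
    return [max(names, key=lambda s: prob_by_segment[s][i]) for i in range(n)]


positions = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
smoothed = smooth([1.0, 0.0, 1.0], positions, radius=1.0)
labels = decide({"arm": [0.9, 0.2], "torso": [0.4, 0.6]})
```

The isolated 0.0 in the middle of the toy row is pulled up to 0.5 by its neighbors, which is the kind of consistency the weighted average is meant to provide.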

In step S43, if the final frame has been reached (YES in step S43), the process ends. If the final frame has not been reached (NO in step S43), the process returns to step S31 and the same processing is performed on the next frame. The object of the shape analysis apparatus is thereby achieved.

Next, an experiment conducted to confirm the effectiveness of the shape analysis apparatus is described. In this experiment, the observation target was the dance motion of a person wearing a kimono, a garment whose shape changes greatly. The experiment used a time-series image set captured at 30 fps by seven synchronized cameras (Point Grey Flea: 1024 × 768 pixels, 8-bit Bayer pattern) installed on the ceiling so as to surround the observation target. All processing was performed on a PC with a 1.8 GHz Opteron.

The dance motion in the learning data consisted of observation data from a single person only. The subject learned the dance motion by watching a video. The learning data consisted of time-series volume data of 1000 frames, and the voxel sizes of the low-resolution whole-body volume data and the high-resolution body-segment volume data were 60 mm^3 and 20 mm^3, respectively, in the world coordinate system.

The number of dimensions d_l of each learning space is determined from the cumulative contribution rate a (Expression (6)) of the eigenvalues obtained by principal component analysis.

In this experiment, the number of dimensions at which the cumulative contribution rate reaches 75% was used as the number of dimensions of each learning space, this being the smallest number of dimensions that maintains a sufficient shape analysis rate. FIG. 11 shows how the cumulative contribution rate changes with the number of dimensions.
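Choosing the dimension from the cumulative contribution rate can be sketched as below. Expression (6) is not reproduced in this text, so the sketch assumes the standard definition from principal component analysis: the sum of the d largest eigenvalues divided by the sum of all eigenvalues. The function name `dims_for_contribution` is hypothetical.

```python
def dims_for_contribution(eigenvalues, target=0.75):
    """Smallest d whose top-d eigenvalues reach the target cumulative contribution."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for d, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= target:
            return d
    return len(ordered)


# Toy spectrum: the first two components carry 80% of the variance.
d_l = dims_for_contribution([5.0, 3.0, 1.0, 1.0], target=0.75)
```

A lower target gives a smaller learning space and a faster search, at the cost of discarding more shape variation.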

Shape analysis was performed using the learning data described above. There were two subjects, one of whom was the person in the learning data. Both wore a kimono similar in shape and material to the one in the learning data, and performed the dance motion they had learned by watching the same video used for the learning data. However, because the subject in the learning data (155 cm) and the other subject (175 cm) differ greatly in physique, the visual volume intersection result for the latter was resized as a whole according to the height ratio before being projected onto the learning eigenspace.

FIG. 12 shows examples of the shape analysis results. In FIG. 12, the first through fourth rows show the observed image, the input low-resolution volume, and the input high-resolution volume, respectively. Across all frames, the low-resolution analysis performs a search that ignores fine shapes such as the hands, while the high-resolution analysis correctly reconstructs the fine shapes ignored by the low-resolution analysis and identifies their body segments. The results also confirm that the volume is divided into body segments stably even when the clothing deforms greatly as the body moves. As FIG. 13 shows, the false volume regions contained in the input volume data have been removed. FIG. 14 shows the experimental results for the subject who differs from the person in the learning data. As FIG. 14 shows, these results are not markedly inferior to those for the person in the learning data. Furthermore, even when a failure of background subtraction introduced a large error into the result of the visual volume intersection method, cases were found in which comparison with the accurate learning data yielded a good reconstruction (the circled region in FIG. 14).

An experiment was also conducted with tight-fitting clothing, which is what ordinary posture estimation methods target. The conditions were the same as in the kimono experiment described above, except that the learning data consisted of 4000 frames. FIG. 15 shows examples of the shape analysis results. As with the kimono, the processing by the shape analysis apparatus was confirmed to be effective.

The execution time of the shape analysis was approximately 0.0042 seconds per frame. This figure does not include the time required for three-dimensional reconstruction. However, as shown in X. Wu, O. Takizawa, and T. Matsuyama, "Parallel Pipeline Volume Intersection for Real-Time 3D Shape Reconstruction on a PC Cluster," in Proc. of the 4th IEEE International Conference on Computer Vision Systems (ICVS), 2006, the visual volume intersection method can be executed at video rate (0.0033 seconds/frame) at a resolution of 5 mm^3 from a set of VGA-size images, so reconstruction at a resolution sufficient for three-dimensional shape analysis is becoming feasible in real time. Therefore, even when the time for this shape reconstruction is added to the time for shape correction and body segment identification by the method of the shape analysis apparatus, an execution speed close to video rate is achieved. It can thus be said that applications requiring online operation, such as HCL, robot interaction, and active camera control, are also feasible.

As described above, the shape analysis apparatus can, on the basis of online shape analysis of an arbitrary human body and clothing, simultaneously remove reconstruction errors and identify each body segment. The time-series changes of the target's high-accuracy shape, obtained by detailed analysis, are learned in advance together with body segment labels, and shape analysis is performed by comparing this ground-truth learning data with the input data. As a result of this analysis, large reconstruction-error regions are removed from the target's three-dimensional volume obtained by the visual volume intersection method, and the regions corresponding to the 10 predefined body segments (excluding the non-target label) are obtained from the corrected three-dimensional volume.

FIG. 1 is a block diagram showing the hardware configuration of a shape analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the functional configuration of the shape analysis apparatus shown in FIG. 1.
FIG. 3 is a diagram showing the volume data used in the present embodiment.
FIG. 4 is a flowchart showing the learning data generation process.
FIG. 5 is a diagram showing the process by which the whole-body reference volume data is generated.
FIG. 6 is a diagram showing an example of a learning space.
FIG. 7 is a diagram showing false volumes.
FIG. 8 is a diagram illustrating how the search process is made more efficient.
FIG. 9 is a diagram illustrating how the search process is made more efficient.
FIG. 10 is a flowchart showing the shape analysis process.
FIG. 11 is a graph showing the change in the cumulative contribution rate with the number of dimensions.
FIG. 12 is a diagram showing experimental results.
FIG. 13 is a diagram showing experimental results.
FIG. 14 is a diagram showing experimental results.
FIG. 15 is a diagram showing experimental results.

Explanation of symbols

11 to 1m: Cameras
20: Reference volume data generation unit
30: Simplified volume data generation unit
40: Learning data generation unit
50: Input volume data generation unit
60: Search unit
70: Specifying unit
80: Reliability generation unit
90: Learning data storage unit

Claims

A shape analysis program causing a computer to function as:
reference volume data generation means for generating, for each frame of a moving image of a reference object composed of a plurality of body segments, reference volume data, which is volume data of the reference object in which each voxel carries a label indicating the body segment to which it belongs;
learning data generation means for generating, as learning data, a reference projection point sequence obtained by projecting the plurality of reference volume data generated for the respective frames by the reference volume data generation means onto a learning space of a predetermined dimension lower than the number of voxels of the reference volume data;
learning data storage means for storing in advance the reference volume data generated by the reference volume data generation means and the learning data generated by the learning data generation means;
a plurality of photographing means for photographing a moving image of an observation object composed of the same plurality of body segments as the reference object;
input volume data generation means for generating, each time a frame is acquired by the plurality of photographing means, input volume data, which is volume data of the observation object, using a visual volume intersection method;
search means for projecting the input volume data generated by the input volume data generation means onto the learning space and searching the learning space for the nearest-neighbor reference projection point, which is the reference projection point located closest to the projection point of the input volume data; and
specifying means for specifying the body segment to which each voxel of the input volume data belongs, based on the label attached to each voxel of the reference volume data corresponding to the nearest-neighbor reference projection point found by the search means.

The shape analysis program according to claim 1, wherein the learning data storage means stores in advance a reliability for removing false volume data contained in the input volume data generated by the input volume data generation means, and
the search means corrects the input volume data using the reliability and projects the corrected input volume data onto the learning space.

The shape analysis program according to claim 2, wherein the reference volume data generation means generates the reference volume data by correcting volume data generated by the visual volume intersection method, and the program further causes the computer to function as:
simplified volume data generation means for arranging the reference volume data generated by the reference volume data generation means in a virtual three-dimensional space and, while changing the positional relationship between a plurality of virtual cameras and the reference volume data, generating simplified volume data for each positional relationship by applying the visual volume intersection method to two-dimensional images obtained by photographing the reference volume data with the plurality of virtual cameras; and
reliability generation means for generating the reliability by comparing each simplified volume data with the reference volume data and thereby identifying where false volume data occurs in the simplified volume data.

The shape analysis program according to claim 1, wherein the search means divides the learning space into partial regions of a fixed size, searches for the nearest-neighbor reference projection point within the target partial region to which the projection point of the input volume data belongs, and, when no reference projection point exists in the target partial region, enlarges the size of the partial regions so that a reference projection point is detected.

The shape analysis program according to claim 4, wherein the search means detects, within the target partial region, the reference projection point located closest to the projection point of the input volume data, identifies one or more adjacent partial regions whose distance to the projection point of the input volume data is shorter than the distance between the detected reference projection point and the projection point of the input volume data, and detects the nearest-neighbor reference projection point by searching within the identified adjacent partial regions and the target partial region.

The shape analysis program according to any one of claims 1 to 5, wherein:
the reference volume data generation means generates, as the reference volume data, whole-body reference volume data, which is volume data of the entire reference object in which voxels are arranged at a predetermined low resolution and each voxel carries a label indicating the body segment to which it belongs, and a plurality of body-segment reference volume data, which are volume data of the respective body segments of the reference object in which voxels are arranged at a high resolution higher than the low resolution;
the learning data generation means generates learning data of the whole body and of each body segment by projecting the whole-body reference volume data and the plurality of body-segment reference volume data generated by the reference volume data generation means onto a whole-body learning space and onto the learning space of each body segment, respectively;
the input volume data generation means generates first whole-body input volume data, which is volume data of the entire observation object in which voxels are arranged at the low resolution, and second whole-body input volume data, which is volume data of the entire observation object in which voxels are arranged at the high resolution;
the search means estimates the orientation of the first whole-body input volume data relative to the whole-body reference volume data by projecting the first whole-body input volume data onto the whole-body learning space, rotates the second whole-body input volume data to the estimated orientation, generates, from the rotated second whole-body input volume data and based on the label attached to each voxel of the whole-body reference volume data, a plurality of body-segment input volume data, which are volume data of the respective body segments of the observation object, and projects each body-segment input volume data onto the corresponding learning space to search for the nearest-neighbor reference projection point for each body segment; and
the specifying means specifies the body segment to which each voxel of the second whole-body input volume data belongs, based on the body-segment reference volume data corresponding to the nearest-neighbor reference projection point found for each body segment by the search means.

The shape analysis program according to claim 6, wherein the search means projects the first whole-body input volume data onto the whole-body learning space each time the first whole-body input volume data is rotated by a predetermined angle, with a predetermined direction in the virtual three-dimensional space as a reference, and estimates the direction of the first whole-body input volume data by searching for the pair of a projection point of the first whole-body input volume data and a reference projection point whose distance is the shortest.
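A minimal sketch of the stepwise rotation search in the claim above follows. It rests on several assumptions that the claims do not state: the input is given as voxel-center coordinates, the rotation is about the vertical (z) axis, the learning space is defined by a mean vector and a basis matrix, and the helper names `rotate_about_z`, `voxelize`, and `estimate_direction` are hypothetical.

```python
import numpy as np


def rotate_about_z(points, angle_rad):
    """Rotate an (n, 3) array of voxel centers about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T


def voxelize(points, size, half_extent):
    """Turn voxel centers into a flattened size^3 occupancy vector.

    Coordinates are assumed to lie in [-half_extent, half_extent);
    points outside that cube are dropped.
    """
    cell = 2.0 * half_extent / size
    idx = np.floor((points + half_extent) / cell).astype(int)
    keep = np.all((idx >= 0) & (idx < size), axis=1)
    grid = np.zeros((size, size, size))
    grid[tuple(idx[keep].T)] = 1.0
    return grid.ravel()


def estimate_direction(points, mean, basis, ref_proj, step_deg, size, half_extent):
    """Return (angle_deg, distance) of the rotation that best matches the references.

    For every multiple of `step_deg`, the input is rotated, voxelized, projected
    onto the learning space, and compared with all reference projection points;
    the angle giving the shortest distance is kept.
    """
    best_angle, best_dist = None, np.inf
    for angle in np.arange(0.0, 360.0, step_deg):
        vec = voxelize(rotate_about_z(points, np.deg2rad(angle)), size, half_extent)
        proj = basis @ (vec - mean)
        dist = np.linalg.norm(ref_proj - proj, axis=1).min()
        if dist < best_dist:
            best_angle, best_dist = float(angle), float(dist)
    return best_angle, best_dist
```

The returned angle is the rotation that must be applied to the input to align it with the reference data; in the scheme of claim 6, that rotation would then be applied to the high-resolution input volume.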

The shape analysis program according to claim 1, wherein the learning data generation means and the search means project the reference volume data and the input volume data onto the learning space using principal component analysis.
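As an illustration of projection by principal component analysis, the sketch below builds a learning space from flattened reference volumes with a singular value decomposition and projects volumes onto it. The function names `fit_learning_space` and `project`, and the representation of each volume as a flattened occupancy vector, are assumptions made for the example.

```python
import numpy as np


def fit_learning_space(volumes, n_components):
    """Compute a PCA basis from reference volumes.

    `volumes` is an (n_frames, n_voxels) array with one flattened volume per row.
    Returns the mean volume and an (n_components, n_voxels) orthonormal basis.
    """
    volumes = np.asarray(volumes, dtype=float)
    mean = volumes.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(volumes - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(volume, mean, basis):
    """Project one flattened volume, or a batch of them, onto the learning space."""
    return (np.asarray(volume, dtype=float) - mean) @ basis.T
```

Projecting every reference volume with `project` yields the reference projection points, and projecting an input volume with the same mean and basis yields the point used for the nearest-neighbor search.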

The shape analysis program according to claim 1, wherein the reference volume data generation means specifies the body segment to which each voxel of the reference volume data belongs using a moving image of the reference object in which each body segment is given a different color.

The shape analysis program according to any one of claims 1 to 9, wherein the reference object and the observation object are persons wearing loose clothes.

A shape analysis apparatus comprising:
reference volume data generation means for generating, for each frame of a moving image of a reference object composed of a plurality of body segments, reference volume data, which is volume data of the reference object to which a label indicating the body segment each voxel belongs to is attached;
learning data generation means for generating, as learning data, a sequence of reference projection points obtained by projecting the plurality of reference volume data generated for each frame by the reference volume data generation means onto a learning space of a predetermined dimension lower than the number of voxels of the reference volume data;
learning data storage means for storing in advance the reference volume data generated by the reference volume data generation means and the learning data generated by the learning data generation means;
a plurality of photographing means for photographing a moving image of an observation object composed of the same plurality of body segments as the reference object;
input volume data generation means for generating, each time a frame is acquired by the plurality of photographing means, input volume data, which is volume data of the observation object, using a visual volume intersection method;
search means for projecting the input volume data generated by the input volume data generation means onto the learning space and searching the learning space for the nearest reference projection point, which is the reference projection point located closest to the projection point of the input volume data; and
identification means for identifying to which body segment each voxel of the input volume data belongs, based on the label attached to each voxel of the reference volume data corresponding to the nearest reference projection point searched by the search means.
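The final identification step can be sketched as a label transfer from the reference volume found by the search to the input volume. The claims do not say how an input voxel is handled when the reference volume is empty at the same position; the fallback to the nearest labeled reference voxel used here is an assumption, as are the function name `transfer_labels` and the convention that label 0 means "empty".

```python
import numpy as np


def transfer_labels(input_occupancy, ref_labels):
    """Assign a body segment label to every occupied input voxel.

    `input_occupancy` is a boolean 3-D array; `ref_labels` is an integer 3-D
    array of the same shape in which 0 marks empty voxels and positive values
    identify body segments. An occupied input voxel takes the reference label
    at the same position, or, if that position is empty in the reference,
    the label of the nearest labeled reference voxel (assumed fallback).
    """
    ref_labels = np.asarray(ref_labels)
    labeled = np.argwhere(ref_labels > 0)
    if labeled.size == 0:
        raise ValueError("reference volume has no labeled voxels")
    out = np.zeros_like(ref_labels)
    for pos in np.argwhere(input_occupancy):
        key = tuple(pos)
        label = ref_labels[key]
        if label == 0:
            sq_dist = ((labeled - pos) ** 2).sum(axis=1)
            label = ref_labels[tuple(labeled[sq_dist.argmin()])]
        out[key] = label
    return out
```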