JP2013120556A

JP2013120556A - Subject posture estimation device and video rendering device

Info

Publication number: JP2013120556A
Application number: JP2011269256A
Authority: JP
Inventors: Masaru Sugano (菅野勝); Hitoshi Naito (内藤整)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2011-12-08
Filing date: 2011-12-08
Publication date: 2013-06-17
Anticipated expiration: 2031-12-08
Also published as: JP5795250B2

Abstract

PROBLEM TO BE SOLVED: To estimate the posture of a subject in a camera video on a three-dimensional model and, even when a video viewed from a virtual viewpoint is displayed in a highly immersive display environment, to render a video that is not geometrically unnatural and has a stereoscopic effect.

SOLUTION: A silhouette extraction unit 11 extracts the silhouette of a subject from a two-dimensional model of the subject appearing in a camera video, and a silhouette weight map generation unit 12 generates a silhouette weight map. A similarity calculation unit 15 calculates the similarity between skeleton models generated from motion capture data and the silhouette weight map, and the subject posture on a three-dimensional model is estimated from the motion capture data with the maximum similarity. A three-dimensional model is then retrieved for that motion capture data, and the subject video as viewed from a virtual viewpoint is displayed on an immersive display by using the three-dimensional model.

Description

The present invention relates to a subject posture estimation device and a video rendering device, and in particular to a subject posture estimation device that estimates the posture of a subject based on a two-dimensional model of the subject generated from the video of one or more monocular cameras, and to a video rendering device that uses the subject posture estimated by that device to render video with a stereoscopic effect.

Non-Patent Document 1 discloses a technique for synthesizing video of subjects viewed from a virtual viewpoint in a space such as a soccer stadium, in which each subject is modeled as a flat rectangle and the subject video is rendered based on the spatial position of the subject as seen from a given viewpoint.

Non-Patent Document 2 discloses a technique that renders subject video for a changed viewpoint by estimating the joint positions and body parts of the subject from subject videos captured by a plurality of cameras.

Non-Patent Document 3 discloses a technique that extracts a shape descriptor from a silhouette image of a subject captured by a monocular camera and estimates the subject posture by nonlinear regression against joint angles.

Non-Patent Document 1: K. Hayashi et al., "Synthesizing Free-viewpoint Images from Multiple View Videos in Soccer Stadium", CGIV 2006.

Non-Patent Document 2: M. Germann et al., "Articulated Billboards for Video-based Rendering", EUROGRAPHICS 2010.

Non-Patent Document 3: A. Agarwal et al., "Recovering 3D Human Pose from Monocular Images", PAMI 28 (2006).

In the technique disclosed in Non-Patent Document 1, the subject is modeled as a flat rectangle, so depending on the viewpoint position the subject looks merely like a tilted plane, and the rendered subject video becomes unnatural.

The technique disclosed in Non-Patent Document 2 estimates the joint positions and body parts of the subject, so it can render a more natural subject video when the viewpoint changes. However, when the display environment is not flat but highly immersive, the subject's lack of depth means that only a flat-looking subject video can be rendered. A highly immersive display environment is one in which a user viewing the video feels as if he or she were inside the real space shown in the video; it is realized, for example, with an immersive projection display such as a CAVE (Cave Automatic Virtual Environment).

Because the technique disclosed in Non-Patent Document 3 relies on nonlinear regression, a large amount of training data is required to improve the accuracy of subject posture estimation.

An object of the present invention is to solve the above problems by providing a subject posture estimation device that can estimate the posture of a subject in a camera video on a three-dimensional model, and further to provide a video rendering device that, based on the estimated subject posture, can render video that is not geometrically unnatural for the viewpoint position and has a stereoscopic effect, even when rendering subject video viewed from a virtual viewpoint in a highly immersive display environment.

To solve the above problems, a first feature of the subject posture estimation device according to the present invention is that it comprises: silhouette extraction means that receives as input a two-dimensional model of a subject generated from a camera video and extracts the silhouette of the subject from the two-dimensional model; silhouette weight map generation means that generates a silhouette weight map by weighting the silhouette extracted by the silhouette extraction means; motion capture data storage means that stores in advance motion capture data for various postures taken by the subject in motion; and similarity calculation means that calculates the similarity between the silhouette weight map generated by the silhouette weight map generation means and the motion capture data stored in the motion capture data storage means; and that the subject posture on a three-dimensional model is estimated from the motion capture data for which the similarity calculated by the similarity calculation means is maximum.

A second feature is that the device further comprises subject identification means that receives as input the two-dimensional models generated from each of a plurality of camera videos and identifies the subject, and subject model integration means that integrates the two-dimensional models of the subject identified by the subject identification means, wherein the silhouette extraction means receives the two-dimensional model integrated by the subject model integration means as input and extracts the silhouette of the subject.

A third feature is that the device further comprises skeleton model generation means that generates skeleton models from the motion capture data stored in the motion capture data storage means, wherein the similarity calculation means calculates the similarity between the silhouette weight map and the motion capture data by using the skeleton models generated by the skeleton model generation means.

A fourth feature is that the silhouette weight map generation means assigns to the silhouette extracted by the silhouette extraction means weights that are largest at the center of the silhouette and decrease toward its edges.

A fifth feature is that the similarity calculation means calculates the similarity after matching the scale and centroid of the silhouette weight map and the motion capture data.

A sixth feature is that, when calculating the similarity between the silhouette weight maps of temporally consecutive frame images of the camera video and the motion capture data, the similarity calculation means takes as candidates only the motion capture data whose similarity calculated for the immediately preceding frame image is equal to or greater than a preset threshold.
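
The candidate pruning of the sixth feature can be sketched as a per-frame loop. This is a minimal illustration under stated assumptions, not the patented implementation: `score_fn(frame, index)` stands in for the silhouette-versus-skeleton similarity, and the fallback to the whole database when no entry passes the threshold is a hypothetical choice the text does not specify.

```python
from typing import Callable, List, Sequence


def track_best_matches(
    score_fn: Callable[[object, int], float],
    n_entries: int,
    frames: Sequence[object],
    threshold: float,
) -> List[int]:
    """Return the best-matching motion capture index for each frame.

    The first frame is scored against every entry; each later frame is scored
    only against entries whose similarity on the previous frame was >= threshold.
    """
    candidates = list(range(n_entries))
    best_per_frame = []
    for frame in frames:
        scores = {i: score_fn(frame, i) for i in candidates}
        best_per_frame.append(max(scores, key=scores.get))
        survivors = [i for i, s in scores.items() if s >= threshold]
        # Assumption: fall back to the full database if nothing survives.
        candidates = survivors or list(range(n_entries))
    return best_per_frame
```

The saving comes from the shrinking candidate set: after the first frame, only entries that were already plausible are rescored.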

A seventh feature is that the processing from silhouette extraction through calculation of the similarity between the silhouette weight map and the motion capture data is executed discretely, at intervals of frames, on the continuously input camera video, and that for the camera video in frame sections where this processing is not executed, the subject posture is estimated by interpolation, treating its temporal transition as an optimization problem.
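
The text does not state which optimization problem is solved. As one hedged example, assume each posture is a vector of joint angles and the objective is the sum of squared frame-to-frame differences, with the postures at processed frames held fixed. Setting the gradient to zero makes every free frame the mean of its neighbours, which the sketch below enforces with Gauss-Seidel sweeps; the minimiser is piecewise-linear interpolation between processed frames. Angle wrap-around is ignored.

```python
from typing import Dict, List


def interpolate_poses(keyframes: Dict[int, List[float]], n_iters: int = 500) -> List[List[float]]:
    """Fill frames between processed keyframes (assumed smoothness objective).

    Minimises sum_t ||x[t+1] - x[t]||^2 with keyframe poses fixed. The
    zero-gradient condition sets each free frame to the mean of its
    neighbours, which is applied repeatedly (Gauss-Seidel relaxation).
    """
    first, last = min(keyframes), max(keyframes)
    poses = {}
    current = keyframes[first]
    for t in range(first, last + 1):  # initialise with the last known keyframe
        current = keyframes.get(t, current)
        poses[t] = list(current)
    for _ in range(n_iters):
        for t in range(first + 1, last):
            if t not in keyframes:
                poses[t] = [(a + b) / 2 for a, b in zip(poses[t - 1], poses[t + 1])]
    return [poses[t] for t in range(first, last + 1)]
```

A richer objective (for example, penalising acceleration or adding a data term) would replace the neighbour-mean update with its own optimality condition.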

An eighth feature is that the subject identification means identifies a subject across a plurality of camera videos by associating feature points between the camera videos.

Further, a video rendering device according to the present invention is characterized by comprising: the above subject posture estimation device; a three-dimensional model storage unit that stores in advance three-dimensional models corresponding to the motion capture data of the subject; a three-dimensional model search unit that searches the three-dimensional model storage unit for the three-dimensional model corresponding to the motion capture data found by the subject posture estimation device to have the maximum similarity; a video rendering unit that transforms the three-dimensional model found by the three-dimensional model search unit according to a specified viewpoint and direction and the position and direction of the two-dimensional model of the camera video, replaces the two-dimensional model of the subject with it, and then maps the texture of the subject in the camera video onto it to generate a video; and an immersive display that displays the video generated by the video rendering unit.

With the subject posture estimation device of the present invention, the similarity between the silhouette of the two-dimensional model of the subject in the camera video and the motion capture data is calculated, and the posture of the motion capture data with the maximum similarity is taken as the subject posture. The subject posture on a three-dimensional model can therefore be estimated even from the video of one or more monocular cameras.

With the video rendering device of the present invention, a three-dimensional model is retrieved based on the subject posture on the three-dimensional model estimated by the subject posture estimation device, and the video is rendered using that three-dimensional model. Therefore, even when rendering video viewed from a virtual viewpoint in a highly immersive display environment, video that is not geometrically unnatural and has a stereoscopic effect can be rendered.

FIG. 1 is a functional block diagram showing an embodiment of the subject posture estimation device according to the present invention.

FIG. 2 is a diagram showing an example of the relationship between a silhouette and its weights.

FIG. 3 is a conceptual diagram showing the overlap between a silhouette weight map and a skeleton model.

FIG. 4 is a diagram showing a concrete example of the similarity between silhouette weight maps and skeleton models for consecutive camera video (frame images).

FIG. 5 is a functional block diagram showing a second embodiment of the subject posture estimation device according to the present invention.

FIG. 6 is a diagram conceptually showing the process of integrating two two-dimensional models.

FIG. 7 is a block diagram showing an embodiment of the video rendering device according to the present invention.

The present invention is described below with reference to the drawings. FIG. 1 is a functional block diagram showing a first embodiment of the subject posture estimation device according to the present invention.

The subject posture estimation device 10 of the first embodiment comprises a silhouette extraction unit 11, a silhouette weight map generation unit 12, a motion capture data storage unit 13, a skeleton model generation unit 14, and a similarity calculation unit 15.

The silhouette extraction unit 11 receives a two-dimensional model of the subject from a two-dimensional model generation unit 16. The two-dimensional model generation unit 16 takes as input the camera video captured by a monocular camera (hereinafter simply "camera") and generates a two-dimensional model of each subject appearing in the camera video. The two-dimensional model can be generated by modeling the subject in the camera video as a single rectangular polygon, for example by the method disclosed in Non-Patent Document 1 or the method disclosed in Japanese Patent Laid-Open No. 2011-170487.
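
As a minimal sketch of the "single rectangular polygon", assume a binary foreground mask of one subject is already available (for example from background subtraction); the methods of Non-Patent Document 1 and JP 2011-170487 are not reproduced here. The rectangle is then simply the smallest axis-aligned box covering the mask. The function name `rectangle_model` is hypothetical.

```python
from typing import List, Optional, Tuple


def rectangle_model(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Smallest axis-aligned rectangle (x0, y0, x1, y1), inclusive, covering
    every non-zero pixel of a binary mask; None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (min(xs), min(ys), max(xs), max(ys))
```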

When rendering video viewed from a virtual viewpoint in a given space, generating and rendering the subject video from a two-dimensional model without depth causes few problems if the display is a flat display. With an immersive projection display, however, subject video generated from a two-dimensional model looks like a thin flat board placed in the scene and is therefore unnatural. In other words, when video is shown on an immersive projection display, estimating the subject posture only on a two-dimensional model does not yield sufficiently immersive video.

The first embodiment therefore includes the silhouette weight map generation unit 12, the motion capture data storage unit 13, the skeleton model generation unit 14, and the similarity calculation unit 15, so that the subject posture is estimated on a three-dimensional model.

The silhouette extraction unit 11 extracts the silhouette of the subject from the two-dimensional model generated by the two-dimensional model generation unit 16.
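
The extraction method is not specified. One simple possibility, shown here purely as an assumption, is background differencing inside the subject's rectangle: a pixel belongs to the silhouette when the grayscale frame differs from a background image by more than a threshold. The threshold value 30 is an illustrative placeholder.

```python
from typing import List, Tuple


def extract_silhouette(
    frame: List[List[int]],
    background: List[List[int]],
    rect: Tuple[int, int, int, int],
    threshold: int = 30,
) -> List[List[int]]:
    """Binary silhouette inside rect = (x0, y0, x1, y1), inclusive: 1 where the
    grayscale frame differs from the background image by more than threshold."""
    x0, y0, x1, y1 = rect
    return [
        [1 if abs(frame[y][x] - background[y][x]) > threshold else 0
         for x in range(x0, x1 + 1)]
        for y in range(y0, y1 + 1)
    ]
```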

The silhouette weight map generation unit 12 weights the silhouette extracted by the silhouette extraction unit 11 to generate a silhouette weight map. For example, the weights are assigned so that they are largest at the center of the silhouette and decrease from the center toward the edges. Alternatively, the silhouette weight map may simply be a thinned version of the silhouette.

FIG. 2 shows an example of the relationship between a silhouette and its weights, here the weights along cross section A-A′. As shown in the figure, the weights decrease from the central portions b, c, and f of the silhouette toward the edge portions a, d, e, and g. In addition, the weights may also be decreased with distance from the center of the silhouette in the vertical direction of the figure. The directions in which the silhouette is weighted can be set arbitrarily.
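
One way to obtain weights that peak at the center and fall off toward the edges in every direction is a distance transform. The sketch below assumes a city-block (4-neighbour) distance from each silhouette pixel to the nearest background pixel, computed by breadth-first search and normalised so the innermost pixel has weight 1. Both the distance metric and the normalisation are illustrative choices, not taken from the text.

```python
from collections import deque
from typing import List


def silhouette_weight_map(sil: List[List[int]]) -> List[List[float]]:
    """City-block distance to the background, normalised to [0, 1]."""
    h, w = len(sil), len(sil[0])
    # Pad with a one-pixel background border so pixels on the image edge get low weight.
    padded = [[0] * (w + 2)] + [[0] + list(r) + [0] for r in sil] + [[0] * (w + 2)]
    H, W = h + 2, w + 2
    dist = [[0 if padded[y][x] == 0 else None for x in range(W)] for y in range(H)]
    # Multi-source BFS seeded with every background pixel (distance 0).
    queue = deque((y, x) for y in range(H) for x in range(W) if dist[y][x] == 0)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    peak = max(max(row) for row in dist) or 1  # avoid division by zero for empty input
    return [[dist[y + 1][x + 1] / peak for x in range(w)] for y in range(h)]
```

For a 5 x 5 solid square the center pixel receives weight 1.0 and each corner pixel 1/3.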

The motion capture data storage unit 13 stores motion capture data in advance as three-dimensional models. When the subject is a person, the motion capture data is a joint model of the human body.

The skeleton model generation unit 14 generates skeleton models, as three-dimensional models, from the motion capture data stored in the motion capture data storage unit 13. When the motion capture data is a human joint model, a skeleton model can be generated by connecting the joints with line segments corresponding to the limbs. The skeleton model may also be given volume (fleshed out) as appropriate.
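
A skeleton model of this kind can be sketched as a list of bones (joint pairs) sampled into 2-D points for comparison with an image. The joint names and bone list below are hypothetical, and the projection simply drops the depth coordinate (an orthographic view along z); a real system would rotate the joints into the camera's frame first. Fleshing out is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical joint names; real motion capture formats define their own hierarchies.
HUMAN_BONES: List[Tuple[str, str]] = [
    ("head", "neck"), ("neck", "pelvis"),
    ("neck", "l_hand"), ("neck", "r_hand"),
    ("pelvis", "l_foot"), ("pelvis", "r_foot"),
]


def skeleton_points(
    joints: Dict[str, Vec3],
    bones: Sequence[Tuple[str, str]] = HUMAN_BONES,
    samples: int = 10,
) -> List[Tuple[float, float]]:
    """Project 3-D joints orthographically (drop z) and sample points evenly
    along each bone, endpoints included."""
    points = []
    for a, b in bones:
        (xa, ya, _), (xb, yb, _) = joints[a], joints[b]
        for i in range(samples + 1):
            t = i / samples
            points.append((xa + t * (xb - xa), ya + t * (yb - ya)))
    return points
```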

The similarity calculation unit 15 calculates the similarity between the silhouette weight map generated by the silhouette weight map generation unit 12 and each skeleton model generated by the skeleton model generation unit 14, and outputs the motion capture data of the skeleton model with the highest similarity to the silhouette weight map. The motion capture data output in this way represents the posture, on a three-dimensional model, of the subject in the input camera video.

The similarity between the silhouette weight map and a skeleton model can be calculated by matching the centroid and scale of the skeleton model (a three-dimensional model) to those of the silhouette weight map and evaluating the overlap between the two. When the silhouette weight map is simply a thinned silhouette, the overlap is evaluated while tolerating deviations within a fixed range.
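
The centroid and scale matching can be sketched as a similarity transform applied to projected skeleton points. This assumes both the points and the weight map use image coordinates (y pointing down), takes the centroid as the plain mean of the points, and uses vertical extent as the scale measure; all three are illustrative choices.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def align_to_silhouette(points: List[Point], weight_map: Sequence[Sequence[float]]) -> List[Point]:
    """Translate and uniformly scale skeleton points so their centroid and
    vertical extent match those of the non-zero region of the weight map."""
    fg = [(x, y) for y, row in enumerate(weight_map) for x, v in enumerate(row) if v > 0]
    sx = sum(x for x, _ in fg) / len(fg)
    sy = sum(y for _, y in fg) / len(fg)
    s_height = max(y for _, y in fg) - min(y for _, y in fg)
    px = sum(x for x, _ in points) / len(points)
    py = sum(y for _, y in points) / len(points)
    p_height = max(y for _, y in points) - min(y for _, y in points)
    k = s_height / p_height if p_height else 1.0  # scale factor
    return [((x - px) * k + sx, (y - py) * k + sy) for x, y in points]
```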

For example, when the subject is a person, the similarity can be computed from how much the joints of the skeleton model overlap the silhouette weight map, using the number of overlapping joints together with their weights (the sum of the weights at the overlapping joints). The similarity evaluation may also use the lengths of the line segments between joints, not only the number of overlapping joints.
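
A minimal version of this joint-overlap score, assuming the joints have already been aligned to the weight map: each joint is rounded to the nearest pixel, its weight is added if non-zero, and overlapping joints are counted. The `tol` window is one assumed way to tolerate small deviations for a thinned map (it takes the largest weight within a square neighbourhood). The bone-length term mentioned above is not included.

```python
from typing import Sequence, Tuple


def joint_overlap_score(
    joints2d: Sequence[Tuple[float, float]],
    weight_map: Sequence[Sequence[float]],
    tol: int = 0,
) -> Tuple[float, int]:
    """Return (sum of weights at overlapping joints, number of overlapping joints)."""
    h, w = len(weight_map), len(weight_map[0])
    total, count = 0.0, 0
    for x, y in joints2d:
        xi, yi = round(x), round(y)
        best = 0.0
        # Search a (2*tol+1)^2 window; tol=0 checks only the joint's own pixel.
        for dy in range(-tol, tol + 1):
            for dx in range(-tol, tol + 1):
                yy, xx = yi + dy, xi + dx
                if 0 <= yy < h and 0 <= xx < w:
                    best = max(best, weight_map[yy][xx])
        if best > 0:
            total += best
            count += 1
    return total, count
```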

FIG. 3 is a conceptual diagram showing the overlap between a silhouette weight map generated by the silhouette weight map generation unit 12 and a skeleton model generated by the skeleton model generation unit 14. It shows a silhouette weight map (the white area in the figure) whose weight is largest at the center of the silhouette and decreases with distance from the center, and a skeleton model generated by connecting the joints of a human joint model (white circles in the figure) with line segments.

FIG. 4 shows a concrete example of the similarity between the silhouette weight map for one camera video frame and a sequence of consecutive skeleton models for a "walking" motion. It shows the similarity of the skeleton models with sequence numbers 0 to 120 (shown at the bottom of the figure) to the silhouette weight map generated from that frame image.

In this way, the similarity calculation unit 15 outputs the motion capture data of the skeleton model, among those generated by the skeleton model generation unit 14, with the highest similarity to the silhouette weight map generated by the silhouette weight map generation unit 12. From the output motion capture data, the posture of the subject in the input camera video is estimated on a three-dimensional model.
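
The final selection is an argmax over the motion capture database. In this sketch `score_fn` is any function that compares a set of (already aligned) 2-D joints with the weight map, and the entry identifiers are hypothetical; an empty database raises `ValueError` from `max`.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Joints2D = List[Tuple[float, float]]


def estimate_posture(
    weight_map: Sequence[Sequence[float]],
    mocap_db: Sequence[Tuple[Hashable, Joints2D]],
    score_fn: Callable[[Joints2D, Sequence[Sequence[float]]], float],
) -> Tuple[Hashable, float]:
    """Return (entry_id, score) of the motion capture entry whose skeleton
    scores highest against the silhouette weight map."""
    return max(
        ((entry_id, score_fn(joints, weight_map)) for entry_id, joints in mocap_db),
        key=lambda pair: pair[1],
    )
```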

FIG. 5 is a functional block diagram showing a second embodiment of the subject posture estimation device according to the present invention. In FIG. 5, parts identical or equivalent to those in FIG. 1 are given the same reference numerals.

In the second embodiment, the subject is photographed by two cameras placed at different positions in a given space, and the subject posture on a three-dimensional model is estimated from these camera videos. That is, the two two-dimensional models of the subject generated from the videos of the two cameras are used as input, and the posture, on a three-dimensional model, of the subject as viewed from a virtual viewpoint in the space that differs from the actual camera positions is estimated.

The subject posture estimation device 10 of the second embodiment comprises a subject identification unit 17, a subject model integration unit 18, a silhouette extraction unit 11, a silhouette weight map generation unit 12, a motion capture data storage unit 13, a skeleton model generation unit 14, and a similarity calculation unit 15.

The subject identification unit 17 and the subject model integration unit 18 receive two-dimensional models of the subject from two-dimensional model generation units 16-1 and 16-2. The two-dimensional model generation unit 16-1 takes as input camera video 1 captured by a camera placed at a first position in the space and generates a two-dimensional model of the subject appearing in camera video 1. The two-dimensional model generation unit 16-2 takes as input camera video 2 captured by a camera placed at a second position in the space, different from the first, and generates a two-dimensional model of the subject appearing in camera video 2.

The subject identification unit 17 identifies the same subject across the frame images of camera videos 1 and 2 by associating feature points extracted from temporally synchronized frame images of the two videos. SIFT (Scale Invariant Feature Transform) features, for example, can be used as the feature points, but any feature that characterizes the subject will do; color can also be used.
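
Since the text allows color as the feature, here is a hedged sketch that uses color instead of SIFT: each subject in a synchronized frame pair is summarised by a normalised color histogram, pairs are scored by histogram intersection, and subjects are matched greedily one-to-one in order of decreasing similarity. The histogram representation and the `min_sim` cut-off are assumptions.

```python
from typing import List, Sequence, Tuple


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for two histograms that each sum to 1."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def match_subjects(
    hists_cam1: Sequence[Sequence[float]],
    hists_cam2: Sequence[Sequence[float]],
    min_sim: float = 0.5,
) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching of subjects between two cameras."""
    pairs = sorted(
        ((histogram_intersection(a, b), i, j)
         for i, a in enumerate(hists_cam1)
         for j, b in enumerate(hists_cam2)),
        reverse=True,
    )
    used1, used2, matches = set(), set(), []
    for sim, i, j in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if i in used1 or j in used2:
            continue
        matches.append((i, j))
        used1.add(i)
        used2.add(j)
    return sorted(matches)
```

Greedy matching is simple but not globally optimal; an assignment solver (for example, the Hungarian method) would be a natural replacement when many subjects overlap in appearance.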

By defining subject identifiers, assigning the same identifier to the same subject appearing in camera videos 1 and 2, and building a table of the relationship between the frames of camera videos 1 and 2 and the subjects, it is possible to determine which subject appears in which frame of which camera video.
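
The identifier table can be sketched as a mapping from subject identifier to the (camera, frame) pairs in which that subject appears. The input format, a flat list of (subject_id, camera, frame) detections, is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def build_subject_table(
    detections: Iterable[Tuple[Hashable, int, int]],
) -> Dict[Hashable, List[Tuple[int, int]]]:
    """Map subject_id -> sorted list of (camera, frame) where it appears."""
    table = defaultdict(list)
    for subject_id, camera, frame in detections:
        table[subject_id].append((camera, frame))
    return {sid: sorted(entries) for sid, entries in table.items()}
```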

The subject model integration unit 18 integrates the two-dimensional models of subjects judged to be identical by the subject identification unit 17 and generates a two-dimensional model as seen from a preset viewpoint. A subject that appears in only one of camera videos 1 and 2 of course needs no identification or integration.

Through this integration, even when the same subject appears in both camera videos 1 and 2, its two-dimensional models can be handled as one. If two-dimensional models of different scales are generated for the same subject from camera videos 1 and 2, they are normalized to the same scale before integration.

The two-dimensional models of a subject can be integrated, for example, as follows: angle parameters are obtained from the positional relationship between each camera and the subject and the positional relationship between the preset viewpoint and the subject; the two-dimensional models generated from camera videos 1 and 2 are geometrically transformed (for example, by an affine transformation) according to those angle parameters; and, for frames at the same time, the transformed models are brought to the same scale and combined, for example by averaging them into a single two-dimensional model.
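
A minimal sketch of the transform, normalise and average steps for rectangular 2-D models. The 2 x 3 affine matrices are taken as given inputs; deriving them from the camera, subject and viewpoint geometry is not shown. Rescaling each transformed model about its own centroid to a common `target_height`, and listing corners in corresponding order, are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Sequence[Sequence[float]]  # [[a, b, tx], [c, d, ty]]


def apply_affine(A: Affine, pts: Sequence[Point]) -> List[Point]:
    return [(A[0][0] * x + A[0][1] * y + A[0][2], A[1][0] * x + A[1][1] * y + A[1][2])
            for x, y in pts]


def integrate_models(
    corners_list: Sequence[Sequence[Point]],
    affines: Sequence[Affine],
    target_height: float,
) -> List[Point]:
    """Warp each model toward the preset viewpoint, rescale it to a common
    height about its centroid, then average corresponding corners."""
    normalised = []
    for pts, A in zip(corners_list, affines):
        q = apply_affine(A, pts)
        ys = [y for _, y in q]
        k = target_height / (max(ys) - min(ys))
        cx = sum(x for x, _ in q) / len(q)
        cy = sum(y for _, y in q) / len(q)
        normalised.append([(cx + (x - cx) * k, cy + (y - cy) * k) for x, y in q])
    n = len(normalised)
    return [(sum(m[i][0] for m in normalised) / n, sum(m[i][1] for m in normalised) / n)
            for i in range(len(normalised[0]))]
```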

FIG. 6 conceptually shows the integration of two two-dimensional models. As shown in the figure, the two-dimensional models of the same subject generated from camera videos 1 and 2 are geometrically transformed according to the positional relationship between each camera and the subject and between the preset viewpoint and the subject, yielding two-dimensional models as seen from the preset viewpoint; the transformed models are then averaged to produce the integrated two-dimensional model.

The two-dimensional model integrated by the subject model integration unit 18 is input to the silhouette extraction unit 11. The processing of the silhouette extraction unit 11 through the similarity calculation unit 15 is the same as in the first embodiment and is not described again; in this case, the similarity calculation unit 15 outputs motion capture data representing the posture, on a three-dimensional model, of the subject as viewed from the preset viewpoint.

As described above, by using the subject posture on a three-dimensional model estimated by the subject posture estimation device, video that is not geometrically unnatural for the viewpoint position and has a stereoscopic effect can be rendered even when video viewed from a virtual viewpoint is displayed in a highly immersive display environment.

FIG. 7 is a block diagram showing an embodiment of the video rendering device according to the present invention. The video rendering device of this embodiment comprises the subject posture estimation device 10, a three-dimensional model storage unit 19, a three-dimensional model search unit 20, a video rendering unit 21, and an immersive display 22.

The subject posture estimation device 10 has the configuration shown in FIG. 1 or FIG. 5 and outputs motion capture data representing the posture, on a three-dimensional model, of the subject in the input camera video.

The 3D model storage unit 19 stores in advance a 3D model of the subject corresponding to each set of motion capture data. The 3D model search unit 20 searches the 3D model storage unit 19 for the 3D model corresponding to the motion capture data output by the subject posture estimation device 10.

A virtual viewpoint and direction are supplied to the video drawing unit 21 from outside. Based on the relationship between the virtual viewpoint/direction and the position/direction of the two-dimensional models in camera videos 1 and 2, the video drawing unit 21 transforms the 3D model found by the 3D model search unit 20 into the 3D model as seen from the virtual viewpoint/direction. The video drawing unit 21 then replaces the subject's two-dimensional model with this 3D model and maps the texture of the camera video onto the 3D model to generate the subject video.
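
The viewpoint transformation can be illustrated with a standard pinhole projection. This is a sketch under assumptions the source does not state: the virtual viewpoint is given as a camera position plus a look-at target, the camera uses the x-right, y-down, z-forward convention, and the 3D model is a set of world-space vertices. Texture mapping is not shown, and `look_at` and `project` are hypothetical names.

```python
import numpy as np

def look_at(eye, target, up=(0.0, 0.0, 1.0)):
    """World-to-camera rotation R and translation t for a camera at `eye` looking at `target`."""
    eye, target, up = (np.asarray(v, dtype=float) for v in (eye, target, up))
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    # Rows are the camera x (right), y (down) and z (forward) axes in world coordinates.
    R = np.stack([right, down, forward])
    t = -R @ eye
    return R, t

def project(vertices, R, t, f, cx, cy):
    """Pinhole projection of Nx3 world-space vertices to Nx2 pixel coordinates."""
    cam = vertices @ R.T + t  # world -> camera coordinates
    u = f * cam[:, 0] / cam[:, 2] + cx
    v = f * cam[:, 1] / cam[:, 2] + cy
    return np.stack([u, v], axis=1)
```

For example, a camera 10 units behind the origin along the x axis maps the origin to the principal point (cx, cy), and a vertex one unit above the origin lands f/10 pixels above it in the image.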

Using this subject video, the subject can be drawn so that it is geometrically natural and has a three-dimensional appearance. When a background video is drawn to accompany the subject video, the background video can be generated by geometrically transforming one of the camera videos into video as seen from the virtual viewpoint. Compositing the subject video rendered from the 3D model onto this background video produces a complete video that includes the subject.
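
Compositing the rendered subject onto the background can be sketched as per-pixel alpha blending, assuming the renderer also outputs an alpha (coverage) mask for the subject. The source does not specify the compositing method, and `composite` is a hypothetical name.

```python
import numpy as np

def composite(background, subject, alpha):
    """Overlay the rendered subject on the background using per-pixel alpha in [0, 1].

    background, subject: HxWxC arrays; alpha: HxW array (1 = subject, 0 = background).
    """
    a = np.asarray(alpha, dtype=float)[..., None]  # broadcast over colour channels
    return a * subject + (1.0 - a) * background
```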

The immersive display 22 displays the video generated by the video drawing unit 21.

The present invention can be applied, for example, when a space such as a soccer stadium is captured by one or more cameras and the players and referees (the subjects) are to be drawn in a highly immersive display environment, not only as seen from the actual camera positions but also as seen from a virtual position within the space, as subject video that is geometrically natural and has a three-dimensional appearance.

In this case, the subject posture estimation device estimates the posture, on the three-dimensional model, of each player or referee from the two-dimensional model of that player or referee appearing in the camera video of the one or more cameras.

Because the subjects are people such as players and referees, their posture on the three-dimensional model is estimated as a human-body joint model. When the camera videos of multiple cameras are handled, the players and referees appearing in temporally synchronized frame images are identified across the cameras, and the integrated two-dimensional model is used.

From the estimated human-body joint model, the video drawing device searches the 3D model storage unit (database) for a human body model, that is, a 3D model having that joint model. Any human body model may be used, but the most suitable human body model may also be selected according to the silhouette of the subject.
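
One way to select "the most suitable human body model according to the silhouette" is to compare the observed silhouette with the silhouette of each candidate model rendered in the estimated pose and keep the model with the highest intersection-over-union (IoU). Both the IoU criterion and the availability of pre-rendered candidate silhouettes are assumptions; `select_body_model` and the model names are hypothetical.

```python
import numpy as np

def select_body_model(observed, model_silhouettes):
    """Return (name, IoU) of the candidate body model whose silhouette best matches `observed`.

    observed: HxW boolean mask of the subject in the camera video.
    model_silhouettes: dict mapping a model name to its rendered HxW mask.
    """
    obs = np.asarray(observed, dtype=bool)
    best_name, best_iou = None, -1.0
    for name, sil in model_silhouettes.items():
        sil = np.asarray(sil, dtype=bool)
        union = np.logical_or(obs, sil).sum()
        iou = np.logical_and(obs, sil).sum() / union if union else 0.0
        if iou > best_iou:
            best_name, best_iou = name, iou
    return best_name, best_iou
```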

The video drawing device then generates, from the retrieved human body model, subject video of the player or referee as seen from the virtual viewpoint. The generated subject video is composited onto the background video obtained by geometrically transforming a camera video, and the result is displayed on the immersive display.

Although embodiments have been described above, the present invention is not limited to them. For example, the number of cameras is not limited to one or two and may be three or more. The captured space need not be a space such as a soccer stadium, and the subject is not limited to a person.

The similarity calculation unit 15 may also use the motion capture data directly when calculating the similarity. In this case, the skeleton model generation unit 14 can be omitted, and the similarity can be calculated as the degree to which the joints contained in the motion capture data overlap the silhouette weight map, based on the relationship between the number of joints and the weights.
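
The text says only that the similarity is derived from the relationship between the number of joints and the weights. A minimal sketch, assuming the similarity is the mean weight sampled at the 2D joint positions after the joints are aligned to the silhouette in scale and center of gravity as in claim 5. Taking the scale from the bounding-box height is also an assumption, and `align_joints` and `joint_overlap_similarity` are hypothetical names.

```python
import numpy as np

def align_joints(joints_xy, weight_map):
    """Match the joints' scale (bounding-box height) and centroid to the silhouette."""
    ys, xs = np.nonzero(weight_map > 0)
    j = np.asarray(joints_xy, dtype=float)
    j_height = j[:, 1].max() - j[:, 1].min()
    s_height = ys.max() - ys.min()
    scale = s_height / j_height if j_height > 0 else 1.0
    j_centroid = j.mean(axis=0)
    s_centroid = np.array([xs.mean(), ys.mean()])
    return (j - j_centroid) * scale + s_centroid

def joint_overlap_similarity(joints_xy, weight_map):
    """Mean silhouette weight at the joint positions (joints outside the map score 0)."""
    h, w = weight_map.shape
    total = 0.0
    for x, y in np.rint(joints_xy).astype(int):
        if 0 <= x < w and 0 <= y < h:
            total += weight_map[y, x]
    return total / len(joints_xy)
```

Dividing by the joint count keeps the score in [0, 1] for a weight map normalized to a maximum of 1, so candidates with different numbers of joints remain comparable.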

When a large number of motion capture data sets are stored in the motion capture data storage unit 13, the motion capture data (or skeleton models) whose similarity is calculated against temporally consecutive silhouette weight maps may be limited to those whose similarity, calculated for the immediately preceding frame, was at least a predetermined threshold. This avoids calculating the similarity for every motion capture data set (or skeleton model) and reduces the processing required to estimate the three-dimensional model (joint model).
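
The candidate pruning can be sketched as follows. The similarity function is passed in as a parameter, and the fallback to the full candidate set when no candidate clears the threshold is an assumption not stated in the source; `estimate_sequence` is a hypothetical name.

```python
def estimate_sequence(frames, candidates, similarity, threshold):
    """Pick the best candidate per frame, scoring only candidates that passed on the previous frame.

    Returns (best candidate per frame, total number of similarity evaluations).
    """
    active = list(candidates)
    best_per_frame = []
    evaluations = 0
    for frame in frames:
        scores = {c: similarity(frame, c) for c in active}
        evaluations += len(scores)
        best_per_frame.append(max(scores, key=scores.get))
        survivors = [c for c, s in scores.items() if s >= threshold]
        # Assumption: if nothing clears the threshold, rescore everything next frame.
        active = survivors if survivors else list(candidates)
    return best_per_frame, evaluations
```

For example, with ten integer candidates 0 to 9, frames [2, 3, 3], similarity `1 - abs(frame - candidate) / 10` and threshold 0.75, the sketch evaluates 19 similarities instead of the 30 an exhaustive search would need.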

Furthermore, the process of estimating the posture of the subject in the camera video, and thereby its three-dimensional model, may be performed on every input video frame, but it may also be performed discretely, for example every N frames (..., t-N, t, t+N, ...). In this case there are video frames (for example, t+1, t+2, ..., t+N-1) for which the subject's three-dimensional model is not estimated; the subject's posture in these frames can be calculated from the postures estimated at frames t and t+N using dynamic programming or a similar method. This speeds up estimation of the subject's three-dimensional model.
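
The source leaves the interpolation method open ("dynamic programming or a similar method"). A minimal sketch of one dynamic-programming formulation, assuming each pose is a fixed-length joint vector and that the poses for the skipped frames are chosen from a candidate set (for example, stored motion capture poses) so as to minimize the sum of squared differences between consecutive poses, with the poses at frames t and t+N held fixed. The cost function and `interpolate_poses` are assumptions, not the patent's method.

```python
import numpy as np

def interpolate_poses(start, end, candidates, n):
    """Choose candidate poses for frames t+1 .. t+n-1 by dynamic programming (Viterbi style).

    Minimises the sum of squared distances between consecutive poses, with the
    poses at t (`start`) and t+n (`end`) fixed. Returns candidate indices.
    """
    cands = np.asarray(candidates, dtype=float)
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    steps = n - 1
    if steps <= 0:
        return []
    k = len(cands)
    # trans[i, j]: cost of moving from candidate i to candidate j between frames.
    trans = ((cands[:, None, :] - cands[None, :, :]) ** 2).sum(axis=2)
    cost = ((cands - start) ** 2).sum(axis=1)  # first skipped frame
    back = np.zeros((steps, k), dtype=int)
    for s in range(1, steps):
        total = cost[:, None] + trans  # total[i, j]: arrive at j from i
        back[s] = total.argmin(axis=0)
        cost = total.min(axis=0)
    cost = cost + ((cands - end) ** 2).sum(axis=1)  # transition into the fixed end pose
    path = [int(cost.argmin())]
    for s in range(steps - 1, 0, -1):
        path.append(int(back[s, path[-1]]))
    return path[::-1]
```

Because this cost contains no image term, it simply selects the smoothest path through the candidates; a fuller formulation would add a per-frame observation cost.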

10 ... Subject posture estimation device
11 ... Silhouette extraction unit
12 ... Silhouette weight map generation unit
13 ... Motion capture data storage unit
14 ... Skeleton model generation unit
15 ... Similarity calculation unit
16, 16-1, 16-2 ... Two-dimensional model generation unit
17 ... Subject identification unit
18 ... Subject model integration unit
19 ... 3D model storage unit
20 ... 3D model search unit
21 ... Video drawing unit
22 ... Immersive display

Claims

1. A subject posture estimation device comprising:
silhouette extracting means for receiving as input a two-dimensional model of a subject generated from a camera video and extracting a silhouette of the subject from the two-dimensional model;
silhouette weight map generating means for generating a silhouette weight map in which weights are assigned to the silhouette extracted by the silhouette extracting means;
motion capture data storage means for storing in advance motion capture data for various postures of the subject; and
similarity calculation means for calculating the similarity between the silhouette weight map generated by the silhouette weight map generating means and the motion capture data stored in the motion capture data storage means,
wherein a subject posture on a three-dimensional model is estimated from the motion capture data for which the similarity calculated by the similarity calculation means is maximum.

2. The subject posture estimation device according to claim 1, further comprising:
subject identification means for identifying a subject using, as input, two-dimensional models generated from each of a plurality of camera videos; and
subject model integration means for integrating the two-dimensional models of the subject identified by the subject identification means,
wherein the silhouette extracting means extracts the silhouette of the subject using, as input, the two-dimensional model integrated by the subject model integration means.

3. The subject posture estimation device according to claim 1, further comprising skeleton model generating means for generating a skeleton model from the motion capture data stored in the motion capture data storage means,
wherein the similarity calculation means calculates the similarity between the silhouette weight map and the motion capture data using the skeleton model generated by the skeleton model generating means.

4. The subject posture estimation device according to claim 1, wherein the silhouette weight map generating means assigns, to the silhouette extracted by the silhouette extracting means, weights that are largest at the center of the silhouette and decrease toward its edge.

5. The subject posture estimation device according to claim 1, wherein the similarity calculation means calculates the similarity after matching the scale and the center of gravity of the silhouette weight map and the motion capture data.

6. The subject posture estimation device according to claim 1, wherein the similarity calculation means calculates the similarity between the silhouette weight maps of temporally consecutive frame images of the camera video and the motion capture data, and takes as targets of the similarity calculation only the motion capture data whose similarity calculated for the immediately preceding frame image is at least a preset threshold.

7. The subject posture estimation device according to claim 1, wherein the processing from silhouette extraction to calculation of the similarity between the silhouette weight map and the motion capture data is executed discretely, in units of frames, on continuously input camera video, and for camera video in frame sections where that processing is not executed, the subject posture is estimated by interpolation, treating its temporal transition as an optimization problem.

8. The subject posture estimation device according to claim 2, wherein the subject identification means identifies a subject across the plurality of camera videos by associating feature points between the camera videos.

9. A video drawing device comprising:
the subject posture estimation device according to any one of claims 1 to 8;
a 3D model storage unit that stores in advance a 3D model corresponding to motion capture data of the subject;
a 3D model search unit that searches the 3D model storage unit for the 3D model corresponding to the motion capture data whose similarity, as determined by the subject posture estimation device, is maximum;
a video drawing unit that transforms the 3D model found by the 3D model search unit according to a designated viewpoint and direction and the position and direction of the two-dimensional model in the camera video, replaces the two-dimensional model of the subject with the transformed 3D model, and maps the texture of the subject in the camera video onto it to generate subject video; and
an immersive display that displays the subject video generated by the video drawing unit.