JP2018116421A - Image processing device and image processing method

Info

Publication number: JP2018116421A
Application number: JP2017006084A
Authority: JP
Inventors: Keisuke Nonaka
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-01-17
Filing date: 2017-01-17
Publication date: 2018-07-26
Anticipated expiration: 2037-01-17
Also published as: JP6799468B2

Abstract

PROBLEM TO BE SOLVED: To enable the production of a high-quality free-viewpoint video using a billboard.
SOLUTION: An image processing device includes: means for acquiring multiple images obtained by imaging an object in a real space from multiple viewpoints; means for calculating, from the acquired images, coordinates that indicate the position of the object in the real space; and means for creating a synthesized image corresponding to a viewpoint not included in the multiple viewpoints by disposing, with reference to the calculated coordinates, a billboard of the object created from at least one of the acquired images.
SELECTED DRAWING: Figure 2

Description

The present invention relates to an image processing apparatus and an image processing method.

Conventionally, techniques have been proposed for generating video from a free viewpoint other than the camera viewpoints (hereinafter referred to as free viewpoint video), for sports scenes and the like. Based on video captured by a plurality of cameras, such a technique synthesizes video from a virtual viewpoint at which no camera is placed and displays the result on a screen, so that the scene can be viewed from various viewpoints.

Among techniques for synthesizing free viewpoint video, there is a technique that synthesizes free viewpoint video at high speed using a simple model called a billboard (see Non-Patent Document 1). In this billboard-based technique, the texture of the object to be modeled is accurately cut out from the video and made to stand on the ground of a virtual space as a billboard model with no thickness, thereby producing free viewpoint video.

In the billboard method, a billboard is generally arranged so that its lowest point (for example, the toes of a person) touches the ground of the virtual space. When the virtual viewpoint moves horizontally, the billboard is rotated to follow that movement; when the virtual viewpoint moves vertically, the orientation of the billboard is not changed.

Non-Patent Document 1: Hayashi, K.; Saito, H., "Synthesizing Free-Viewpoint Images from Multiple View Videos in Soccer Stadium," in Computer Graphics, Imaging and Visualisation, 2006 International Conference on, pp. 220-225, 26-28 July 2006

The method described in Non-Patent Document 1 is advantageous in that high-quality free viewpoint video can be synthesized at high speed and the synthesized content data is smaller than with other methods. However, because of the constraint that the billboard must stand vertically on the ground, situations can arise in which the subject in real space and the billboard no longer correspond. In such cases, the resulting free viewpoint video may look unnatural.

The present invention has been made in view of this problem, and its object is to provide a technique that enables generation of higher-quality free viewpoint video using a billboard.

One aspect of the present invention relates to an image processing apparatus. The image processing apparatus includes: means for acquiring a plurality of images obtained by imaging a subject in real space from a plurality of viewpoints; means for calculating, from the acquired images, coordinates representing the position of the subject in the real space; and means for generating a composite image corresponding to a viewpoint not included in the plurality of viewpoints by arranging, with reference to the calculated coordinates, a billboard of the subject generated from at least one of the acquired images.

Any combination of the above constituent elements, and any mutual substitution of the constituent elements and expressions of the present invention among apparatuses, methods, systems, computer programs, recording media storing computer programs, and the like, are also effective as aspects of the present invention.

According to the present invention, it is possible to generate higher-quality free viewpoint video using a billboard.

FIG. 1 is a schematic diagram showing the arrangement of a billboard in the conventional billboard method.
FIG. 2 is a schematic diagram showing a free viewpoint image distribution system including an image processing apparatus according to an embodiment.
FIG. 3 is a block diagram showing the functions and configuration of the image processing apparatus of FIG. 2.
FIG. 4 is an explanatory diagram showing the correspondence between coordinates on the image plane of a camera and field coordinates.
FIGS. 5(a) to 5(c) are explanatory diagrams showing an example of processing in the background difference unit of FIG. 3.
FIG. 6 is a schematic diagram showing a three-dimensional model generated by the three-dimensional processing unit of FIG. 3 and its model representative point.
FIG. 7 is a schematic diagram showing a billboard reference plane.
FIGS. 8(a) to 8(c) are schematic diagrams for explaining billboard generation processing in the billboard generation unit of FIG. 3.
FIG. 9 is a flowchart showing a series of processes in the image processing apparatus of FIG. 2.
FIG. 10 is an explanatory diagram of a method according to a modification for determining the coordinates at which a billboard should be arranged from a plurality of images captured by a plurality of cameras.

Hereinafter, identical or equivalent components, members, and processes shown in the drawings are denoted by the same reference numerals, and redundant description is omitted as appropriate. In each drawing, some members that are not important for the explanation are omitted.

In the conventional billboard method, a billboard is arranged so that its lowest point (for example, the toes of a person) touches the ground of the virtual space (hereinafter referred to as the field). The present inventor independently recognized the following problem arising from this constraint.

FIG. 1 is a schematic diagram showing the arrangement of a billboard 10 in the conventional billboard method. A camera 14 shoots (also referred to as images) a person 12, the subject, at the moment the person has jumped high above the field. The billboard 10 of the person 12 is generated from the image obtained from the camera 14. However, because of the constraint that the lowest point of the billboard 10 must touch the field, the generated billboard 10 is arranged not at the actual position 16 of the person 12 but at the intersection 18 between the line of sight 20 of the camera 14 and the field. In this case, the billboard 10 as seen from the virtual viewpoint is placed at coordinates entirely different from the true position 16 of the person 12, so the quality of the synthesized video can degrade. This problem can arise frequently in scenes such as volleyball or basketball, where a person jumps high near the camera (hereinafter, the act of a subject leaving the field is referred to as air movement).
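The displacement described above can be illustrated with a small numerical sketch. The following Python snippet assumes a field plane z = 0 and a camera at a hypothetical position; the function name `ground_intersection` and all coordinates are illustrative assumptions, not values taken from the embodiment.

```python
import numpy as np

def ground_intersection(camera_pos, subject_pos):
    """Point where the line of sight from the camera through the subject meets z = 0."""
    c = np.asarray(camera_pos, dtype=float)
    p = np.asarray(subject_pos, dtype=float)
    d = p - c                          # direction of the line of sight
    if d[2] >= 0:
        raise ValueError("line of sight does not descend toward the field")
    t = -c[2] / d[2]                   # ray parameter at which z becomes 0
    return c + t * d

camera = (0.0, 0.0, 5.0)               # hypothetical camera 5 m above the field
feet = (4.0, 0.0, 1.0)                 # lowest point of a subject 1 m above the field
hit = ground_intersection(camera, feet)

# Horizontal distance between the true position and the conventional placement.
offset = np.linalg.norm(hit[:2] - np.asarray(feet)[:2])
print(hit, offset)
```

In this toy configuration the billboard would be placed 1 m farther from the camera than the subject actually is, and the error grows as the line of sight becomes shallower.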

In the conventional billboard method, when video from a virtual viewpoint is generated, a billboard of the subject cut out from the corresponding video of a single fixed camera is arranged in three-dimensional space. However, without a precondition such as the subject being in contact with the field, it is difficult to determine the true position of the subject from the video of a single fixed camera. Moreover, in shooting performed at relatively short range, such as volleyball or basketball, a person's air movement occupies a large proportion of the screen, which readily leads to degraded video quality.

In contrast, the image processing apparatus according to the embodiment estimates the position of the subject in real space from a plurality of images obtained from a plurality of imaging devices (for example, cameras). The estimated position is not limited to the field and may be in the air. By arranging the billboard with reference to the estimation result, the image processing apparatus enables a more natural display when billboard-based free viewpoint video technology is applied to video involving air movement. The position of the subject is estimated, for example, by generating a three-dimensional model of the subject that has thickness.

FIG. 2 is a schematic diagram showing a free viewpoint image distribution system 110 including an image processing apparatus 200 according to the embodiment. The free viewpoint image distribution system 110 includes a plurality of cameras 116, 118, and 120, the image processing apparatus 200 connected to those cameras, and a portable terminal 114 such as a mobile phone, a tablet, a smartphone, or an HMD (Head Mounted Display). The image processing apparatus 200 and the portable terminal 114 are connected via a network 112 such as the Internet. In the free viewpoint image distribution system 110, for example, the cameras 116, 118, and 120 arranged in an arena shoot a volleyball player 124 who stands on a field 126 or jumps high above it. The cameras 116, 118, and 120 transmit the captured video to the image processing apparatus 200, which processes it. The user of the portable terminal 114 specifies a desired viewpoint to the image processing apparatus 200. The image processing apparatus 200 synthesizes an image of the player 124 as seen from the specified viewpoint (virtual viewpoint) and distributes it to the portable terminal 114 via the network 112.

Although FIG. 2 describes the case of shooting the volleyball player 124 in an arena, the technical idea of the present embodiment is not limited to this and can be applied whenever a subject that may perform air movement is shot, for example when shooting a fitness instructor, a tennis match, or a soccer match. Nor is it limited to sports scenes: it can be applied widely to any application in which the same subject may appear in a plurality of videos obtained from a plurality of cameras. Instead of the portable terminal 114, a stationary terminal such as a desktop PC, a laptop PC, or a TV receiver may be used.

FIG. 3 is a block diagram showing the functions and configuration of the image processing apparatus 200 according to the embodiment. Each block shown here can be realized in hardware by elements and mechanical devices such as the CPU (Central Processing Unit) of a computer, and in software by a computer program or the like; what is depicted here are functional blocks realized by their cooperation. Those skilled in the art who have read this specification will therefore understand that these functional blocks can be realized in various forms by combinations of hardware and software.

The image processing apparatus 200 synthesizes an image from an arbitrary virtual viewpoint from the images captured by the cameras 116, 118, and 120. Based on a plurality of camera videos of the same subject, the image processing apparatus 200 generates free viewpoint video in accordance with the billboard method. Whereas the conventional billboard method uses video from only one camera when generating a billboard, the image processing apparatus 200 according to the present embodiment uses coordinate data of a three-dimensional model calculated from a plurality of camera videos.

In the following, it is assumed that the cameras 116, 118, and 120 have been time-synchronized in advance. A person is assumed as the subject, but the technical idea of the present embodiment is also applicable to other subjects. The virtual viewpoint is, for example, a virtual viewpoint that the user can specify arbitrarily, and it is distinct from the actual viewpoints at which the cameras 116, 118, and 120 are placed.

[Outline of image processing apparatus 200]
The image processing apparatus 200 includes a calibration unit 202, a background difference unit 204, a three-dimensional processing unit 206, a reference plane calculation unit 208, a billboard generation unit 210, a projected point calculation unit 212, and a free viewpoint video generation unit 214. The calibration unit 202 receives as input a plurality of videos captured by the cameras 116, 118, and 120. For each camera, the calibration unit 202 establishes the correspondence between the field in real space and the image captured by that camera, and outputs it as calibration data. Assuming that the cameras are fixed, this calibration operation needs to be performed only once, at the beginning. The background difference unit 204 classifies an image into background and foreground using a known background difference method, and outputs the binarized image as mask data.

The three-dimensional processing unit 206 generates a three-dimensional model of the subject from the plurality of mask data obtained from the videos captured by the cameras 116, 118, and 120. The three-dimensional processing unit 206 estimates the position of the subject in real space from the generated three-dimensional model and outputs the coordinates obtained as the estimation result. The three-dimensional processing unit 206 includes a three-dimensional model construction unit 216, a model representative point calculation unit 218, and a smoothing unit 220.

The three-dimensional model construction unit 216 uses the foreground mask generated by the background difference unit 204 and the calibration data generated by the calibration unit 202 to generate a three-dimensional model that, unlike a billboard, has thickness. The model representative point calculation unit 218 calculates the coordinates of a model representative point, which is a point that represents the three-dimensional model (for example, one that is robust to changes in a person's posture). The smoothing unit 220 smooths the coordinates of the model representative point calculated by the model representative point calculation unit 218 along the time axis.

The reference plane calculation unit 208 sets a billboard reference plane, which is the plane within which the billboard is arranged. The reference plane calculation unit 208 sets the billboard reference plane so that it contains the model representative point. The projected point calculation unit 212 calculates the coordinates in the image plane that are referred to when the billboard is generated. The billboard generation unit 210 generates and outputs a billboard of the subject. The free viewpoint video generation unit 214 generates and outputs free viewpoint video using the billboard output from the billboard generation unit 210, the billboard reference plane, and the model representative point.

[Calibration unit 202]
The calibration unit 202 acquires a plurality of videos from the cameras 116, 118, and 120. For each of the cameras 116, 118, and 120, the calibration unit 202 associates characteristic points of the field in an image captured at a certain time (such as intersections of the white lines of the court) with the corresponding points on the field in real space, and calculates the result as camera parameters. For example, when a typical sports match is shot, the size of the court is standardized, so the calibration unit 202 can calculate which coordinates in real space (the world coordinate system) a point on the image plane corresponds to. The camera parameters include external parameters and internal parameters. The external parameters indicate the relationship between real space and the camera, and include, for example, parameters representing the position of the camera (the position of the viewpoint) and the orientation of the camera (rotation and the like). The internal parameters are specific to the camera and include, for example, lens distortion.

FIG. 4 is an explanatory diagram showing the correspondence between coordinates on the image plane of a camera 402 and field coordinates. When the coordinates on the two-dimensional image plane of the camera 402 are (u, v) and the coordinates on the field plane of the world coordinate system (field coordinates) are (x', y'), the correspondence between the two can be expressed using a homography matrix H and a scalar value s as follows.
[Equation not reproduced in the source text] ... (Formula 1)

By substituting the above pairs of corresponding points into Formula 1, s and H can be obtained, which makes it possible to convert in both directions between the coordinates of an arbitrary pixel on the image plane and field coordinates. The method of calibrating a camera against the field is not limited to the one described above.
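As a rough sketch of how H might be obtained from corresponding points, the following Python snippet uses the standard direct linear transform (DLT). It assumes the convention s·(u, v, 1)ᵀ = H·(x', y', 1)ᵀ, which is one common form and is not confirmed by the text above; the function names, the matrix `H_true`, and the court coordinates are hypothetical.

```python
import numpy as np

def estimate_homography(field_pts, image_pts):
    """DLT estimate of H, assuming s*[u, v, 1]^T = H*[x', y', 1]^T."""
    rows = []
    for (x, y), (u, v) in zip(field_pts, image_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]                 # fix the arbitrary overall scale

def apply_homography(h, pt):
    """Map a 2-D point through h and divide out the scale factor s."""
    q = h @ np.array([pt[0], pt[1], 1.0])
    return q[:2] / q[2]

# Hypothetical ground truth, used only to synthesize four point pairs.
H_true = np.array([[40.0, 5.0, 100.0], [2.0, 30.0, 50.0], [0.01, 0.02, 1.0]])
field = [(0.0, 0.0), (9.0, 0.0), (9.0, 18.0), (0.0, 18.0)]   # court corners
image = [apply_homography(H_true, p) for p in field]

H = estimate_homography(field, image)
pixel = apply_homography(H, (4.5, 9.0))                # field -> image
back = apply_homography(np.linalg.inv(H), pixel)       # image -> field
```

Four non-degenerate point pairs are the minimum for this estimate; with more pairs the same code yields a least-squares fit.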

Returning to FIG. 3, camera calibration in the calibration unit 202 may be performed manually or by using a known automatic calibration technique. One manual method is to estimate the camera parameters by having the user select intersections of white lines on the screen and associating them with a field model measured in advance. If the image is distorted, the internal parameters are estimated first, as described below. One example of a method that performs the same operation automatically is to extract the white lines in the image using threshold processing or the like, extract straight-line components by the Hough transform, and thereby estimate the image coordinates of the intersections.

When a camera with a wide-angle lens such as a fisheye lens is used for shooting, the calibration unit 202 estimates the internal parameters of the camera separately and corrects the distortion of the image. This estimation may be carried out by shooting a geometric pattern such as a checkerboard in advance with the camera to be used. Assuming shooting with fixed cameras, the calibration unit 202 needs to calibrate the cameras only once, at the start of video generation. Assuming shooting with a moving camera, the calibration unit 202 performs the known automatic calibration described above for every frame. The calibration unit 202 generates and outputs calibration data including the calculated camera parameters.

The following describes the processing applied to the images captured at a certain time t by each of the cameras 116, 118, and 120.
[Background difference unit 204]
The background difference unit 204 classifies each pixel of the image at time t from each camera as either background or foreground, thereby separating the image into background and foreground. In the present embodiment, this separation may be realized using, for example, a known background difference method. The background difference unit 204 generates a foreground mask by assigning binary values such as 0 and 1 to the pixels classified as background and foreground, respectively. Separating the background and foreground in this way makes it possible to extract a rough region containing the subject.

FIGS. 5(a) to 5(c) are explanatory diagrams showing an example of processing in the background difference unit 204. FIG. 5(a) shows an image 502 captured by a certain camera at time t. The background difference unit 204 processes this image 502 as the original image. FIG. 5(b) shows a foreground mask 504 obtained by applying the background difference method to the original image of FIG. 5(a). In the foreground mask 504, the black portion has been determined to be background and is assigned 0, and the white portion has been determined to be foreground and is assigned 1. FIG. 5(c) shows the texture of a person 506 obtained from the original image of FIG. 5(a) and the foreground mask 504 of FIG. 5(b).
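A minimal sketch of this kind of processing is given below. It assumes a static background image and a fixed per-pixel threshold, which is only one simple variant of background differencing; the function name `foreground_mask`, the threshold value, and the toy images are hypothetical.

```python
import numpy as np

def foreground_mask(frame, background, threshold=30):
    """Return a uint8 mask: 1 where the frame differs from the background, else 0."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    if diff.ndim == 3:                 # colour image: use the largest channel difference
        diff = diff.max(axis=2)
    return (diff > threshold).astype(np.uint8)

# Toy grayscale data: a flat background and a bright 2x2 "subject".
background = np.full((4, 6), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 2:4] = 200

mask = foreground_mask(frame, background)   # corresponds to the foreground mask
texture = frame * mask                      # original pixels kept only where mask is 1
```

Casting to a signed type before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise produce.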

[Three-dimensional model construction unit 216]
Returning to FIG. 3, the three-dimensional model construction unit 216 acquires the calibration data generated by the calibration unit 202 and the foreground mask generated by the background difference unit 204. Using the acquired information, the three-dimensional model construction unit 216 constructs a three-dimensional model that roughly represents the shape of the subject in a virtual three-dimensional space modeled on real space. When the subject is a person, the three-dimensional model may represent the rough shape of the person's body. The three-dimensional model is expressed as a polygon mesh model, a voxel model, or the like. Unlike the billboard model, the three-dimensional model has thickness.

A known visual volume intersection method (visual hull) may be used to extract the three-dimensional model from the plurality of foreground masks derived from the plurality of cameras. In this method, a space called a voxel space, consisting of a group of cubes obtained by dividing the space uniformly in the vertical, horizontal, and depth directions, is defined within the three-dimensional space. By projecting the silhouette of the foreground mask derived from each camera into the voxel space, a three-dimensional rough shape of the actual subject is obtained. When the foreground mask derived from the i-th camera is denoted M_i^t and the silhouette mask obtained by projecting the three-dimensional model onto the image plane of the i-th camera is denoted N_i^t, the following relation theoretically holds.
[Relation between M_i^t and N_i^t not reproduced in the source text]

The foreground mask derived from each camera generally contains foreground that has been falsely detected because of noise. With the above method, however, the information of the plurality of foreground masks projected into the voxel space is integrated, so such excess foreground detection can be reduced. Accordingly, in subsequent processing, the silhouette mask obtained by projecting the three-dimensional model onto the image plane of a camera may be used as the foreground mask.
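The carving step and its noise-suppressing effect can be sketched as follows. This is a simplified illustration, not the embodiment's implementation: it assumes each camera is given as a 3x4 projection matrix, uses two hypothetical orthographic views on an 8x8x8 voxel grid, and the name `carve_visual_hull` is made up for this example.

```python
import numpy as np

def carve_visual_hull(masks, projections, grid_points):
    """Keep the voxels whose projection falls inside every foreground mask."""
    homog = np.hstack([grid_points, np.ones((len(grid_points), 1))])
    inside = np.ones(len(grid_points), dtype=bool)
    for mask, P in zip(masks, projections):
        uvw = homog @ P.T
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
        h, w = mask.shape
        valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(grid_points), dtype=bool)
        hit[valid] = mask[v[valid], u[valid]] > 0
        inside &= hit                  # a voxel survives only if all views agree
    return inside

r = np.arange(8)
grid_points = np.stack(np.meshgrid(r, r, r, indexing="ij"), axis=-1).reshape(-1, 3).astype(float)

# Hypothetical orthographic cameras: A sees (u, v) = (x, y), B sees (u, v) = (y, z).
P_A = np.array([[1.0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
P_B = np.array([[0.0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
mask_A = np.zeros((8, 8), dtype=np.uint8)
mask_A[3:6, 2:5] = 1                   # subject: x in 2..4, y in 3..5
mask_A[0, 7] = 1                       # falsely detected foreground pixel (noise)
mask_B = np.zeros((8, 8), dtype=np.uint8)
mask_B[1:7, 3:6] = 1                   # subject: y in 3..5, z in 1..6

inside = carve_visual_hull([mask_A, mask_B], [P_A, P_B], grid_points)

# Re-project the carved model into camera A to obtain its silhouette mask.
pts = grid_points[inside].astype(int)
silhouette_A = np.zeros_like(mask_A)
silhouette_A[pts[:, 1], pts[:, 0]] = 1
```

In this toy case the noise pixel in `mask_A` has no support in `mask_B`, so it is absent from `silhouette_A`, which is why the re-projected silhouette can serve as a cleaner foreground mask.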

[Model representative point calculation unit 218]
The model representative point calculation unit 218 calculates the coordinates of a model representative point, which represents the position of the three-dimensional model (voxel data) of the subject generated by the three-dimensional model construction unit 216, as the coordinates representing the position of the subject in real space. Any point may be adopted as the model representative point as long as it represents the position of the three-dimensional model. From the viewpoint of video quality, including the continuity of motion along the time axis, however, it is desirable to adopt a point that does not depend strongly on changes in the posture of the subject (for example, a person or an animal). That is, because the positions of hands and feet are likely to change abruptly with the posture of a person or animal, it is desirable to adopt as the model representative point a point in the skeleton, such as in the trunk or the waist.

Alternatively, the model representative point calculation unit 218 may determine the model representative point by statistically processing the generated three-dimensional model in the voxel space, and calculate its coordinates. For example, the model representative point calculation unit 218 may calculate the coordinates of the center of gravity of all voxels included in the three-dimensional model of the subject as the coordinates of the model representative point. In this case, a model representative point that is robust against changes in the posture of the subject can be calculated. The coordinates of the model representative point may also be obtained by other statistical processing, as long as the property of being robust against changes in the posture of the subject is retained. For example, the three-dimensional model is cut successively by planes parallel to the xy plane, and the plane whose cut surface contains the largest number of voxels is identified as the maximum xy plane. Similarly, for the yz plane and the zx plane, the maximum yz plane and the maximum zx plane, whose cut surfaces contain the largest number of voxels, are identified. The model representative point calculation unit 218 may determine the point at which the maximum xy plane, the maximum yz plane, and the maximum zx plane intersect as the model representative point.
As a more precise method, the model representative point calculation unit 218 may track each body part of the person to generate bones of the three-dimensional model, and determine a point among the generated bones that corresponds to the waist or spine as the model representative point. Alternatively, the model representative point calculation unit 218 may estimate the position of the person's waist from the image by machine learning, and determine the estimated point as the model representative point.
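The two statistical approaches described above (the center of gravity of all voxels, and the intersection of the axis-aligned planes that each contain the most voxels) can be sketched roughly as follows. This is only an illustrative sketch: it assumes the three-dimensional model is given as a list of integer voxel coordinates `(x, y, z)`, and the function names `centroid` and `densest_plane_intersection` are hypothetical, not taken from the specification. The tie-breaking rule is likewise an assumption.

```python
from collections import Counter


def centroid(voxels):
    """Center of gravity: the mean position of all occupied voxels."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))


def densest_plane_intersection(voxels):
    """Intersection of the three axis-aligned planes containing the most voxels.

    A plane parallel to the yz plane fixes x, one parallel to the zx plane
    fixes y, and one parallel to the xy plane fixes z, so the intersection
    is simply the most frequent x, y and z coordinate.
    """
    point = []
    for axis in range(3):
        counts = Counter(v[axis] for v in voxels)
        # Assumption: ties are broken in favor of the smallest coordinate.
        best = max(sorted(counts), key=lambda c: counts[c])
        point.append(best)
    return tuple(point)


voxels = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
print(centroid(voxels))
print(densest_plane_intersection(voxels))
```

The centroid uses every voxel, so a moving limb shifts it only slightly, which is consistent with the robustness property the text asks for.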

[Smoothing unit 220]
The smoothing unit 220 statistically processes the coordinates of the model representative point of the three-dimensional model in the time axis direction. For example, the smoothing unit 220 smooths the coordinates of the model representative point calculated by the model representative point calculation unit 218 in the time axis direction. Even when a calculation method that is robust against changes in the person's posture is selected for the model representative point, noise in the foreground mask or the three-dimensional model is expected to cause unnatural motion (coordinate movement) when the result is actually rendered as video. To reduce this, the smoothing unit 220 smooths the coordinates in the time axis direction.

For example, the smoothing unit 220 performs smoothing by applying a low-pass filter to the coordinates of the model representative point in the n frames before and after the current frame. Alternatively, the smoothing unit 220 may generate a smooth trajectory using a spline curve or B-spline curve whose control points are the discretely sampled model representative points, and move the coordinates of the model representative point so as to follow the generated trajectory. In this case, natural movement can be realized. In addition, the smoothing unit 220 may apply a time-series filter such as a Kalman filter to the time-series data of the coordinates of the model representative point. In this case as well, natural movement can be realized. Hereinafter, the model representative point refers to the model representative point smoothed by the smoothing unit 220.
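As one possible reading of the low-pass filtering over the n frames before and after the current frame, the following sketch uses a centered moving average. The specification does not name a particular filter, so the moving average, the handling of the sequence ends, and the function name `smooth_track` are all assumptions made for illustration.

```python
def smooth_track(points, n):
    """Centered moving average over up to n frames before and after each frame.

    points: list of (x, y, z) model representative points, one per frame.
    Near the start and end of the sequence the window is simply truncated
    (an assumption; other boundary treatments are possible).
    """
    smoothed = []
    for t in range(len(points)):
        lo = max(0, t - n)
        hi = min(len(points), t + n + 1)
        window = points[lo:hi]
        smoothed.append(
            tuple(sum(p[i] for p in window) / len(window) for i in range(3))
        )
    return smoothed


track = [(0, 0, 0), (3, 0, 0), (0, 0, 0)]
print(smooth_track(track, 1))
```

A moving average needs future frames, so it suits offline processing; a causal filter such as the Kalman filter mentioned in the text would be the natural choice when frames must be output as they arrive.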

FIG. 6 is a schematic diagram showing a three-dimensional model 602 generated by the three-dimensional processing unit 206 and its model representative point 604. The three-dimensional processing unit 206 generates the three-dimensional model 602 of a subject (a person who has jumped high above the field 610) from images obtained from a plurality of cameras 606 and 608. The three-dimensional processing unit 206 calculates the coordinates of the center of gravity of the three-dimensional model 602 as the coordinates of the model representative point 604. As described later, the billboard of the subject is arranged with reference to the model representative point 604.

[Reference plane calculation unit 208]
Returning to FIG. 3, the reference plane calculation unit 208 determines a billboard reference plane based on the calculated coordinates of the model representative point and the external parameters generated by the calibration unit 202 (for example, the camera elevation angle α). FIG. 7 is a schematic diagram showing the billboard reference plane 702. The reference plane calculation unit 208 sets, as the billboard reference plane 702, a plane that includes the model representative point 704 and forms an angle θ with the field 706. The free viewpoint video generation unit 214, described later, arranges the billboard 710 so that it lies on the set billboard reference plane 702. Therefore, θ is the angle that the billboard 710 of the subject forms with the field 706.

The reference plane calculation unit 208 acquires the elevation angle α of the camera 708 from the external parameters generated by the calibration unit 202. For the elevation angle α of the camera 708 (in degrees), the reference plane calculation unit 208 calculates θ as θ = 90 (degrees) − α. In this case, natural display can be realized. The external parameters represent the relationship between the world coordinate system and the camera coordinate system in terms of the absolute position T and the orientation R (rotation matrix), and the desired α can be calculated from the components of R. The elevation angle α of the camera 708 also depends on the position of the camera 708. For example, when the camera 708 is installed at a higher position, the elevation angle α of the camera 708 also becomes larger.
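The relation θ = 90 − α, and one way α might be read off the rotation matrix R, can be sketched as follows. The specification only says that α is obtained from components of R, so the conventions here are assumptions: R maps world coordinates to camera coordinates, the world z axis points up from the field, and the camera looks along its own +z axis. The function names are hypothetical.

```python
import math


def elevation_deg(R):
    """Downward tilt of the camera's viewing direction, in degrees.

    Under the stated assumptions, the third row of R is the viewing
    direction expressed in world coordinates; its z component is negative
    when the camera looks down at the field.
    """
    dz = R[2][2]
    return math.degrees(math.asin(max(-1.0, min(1.0, -dz))))


def billboard_angle_deg(alpha_deg):
    """Angle theta between billboard and field: theta = 90 - alpha."""
    return 90.0 - alpha_deg


# Example rotation for a camera looking 30 degrees below the horizontal.
s, c = math.sin(math.radians(30)), math.cos(math.radians(30))
R = [[0.0, -1.0, 0.0],
     [-s, 0.0, -c],
     [c, 0.0, -s]]
alpha = elevation_deg(R)
print(alpha, billboard_angle_deg(alpha))
```

With other axis conventions (for example a y-up world), a different component of R would be used, so this should be adapted to the calibration output actually available.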

For example, in a scene shot from a relatively short distance, such as volleyball, the result looks more natural when the angle θ that the billboard 710 forms with the field 706 is changed in accordance with the elevation angle α of the camera 708. This is because the image of the person is actually captured from obliquely above. The adjustment is particularly beneficial in terms of perspective. That is, when a person is photographed at close range from obliquely above, the head, which is near the camera, appears relatively larger than the torso, which is farther from the camera. If this image is cut out as a billboard and stood perpendicular to the field, then viewing the billboard from a virtual viewpoint on the field (with the line of sight parallel to the field) makes the person appear to lean toward the camera, which looks unnatural. Therefore, tilting the billboard to the opposite side by the elevation angle of the camera makes it possible to render the proportions of the head and torso with less unnaturalness.

[Projected point calculation unit 212]
Returning to FIG. 3, the projected point calculation unit 212 determines the projected point on the billboard side using the calculated coordinates of the model representative point and the camera parameters generated by the calibration unit 202. In the conventional billboard method, the lowest point of the billboard is used as the projected point and projected onto the field. In the image processing apparatus 200 according to the present embodiment, the three-dimensional model side is used as the reference, so the projected point within the billboard differs from frame to frame. The projected point calculation unit 212 determines the projected point by projecting the model representative point (its coordinates) onto the image plane of the camera. The free viewpoint video generation unit 214 arranges the billboard so that its projected point lies on the billboard reference plane and coincides with the model representative point.
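Projecting the model representative point onto a camera's image plane can be illustrated with the standard pinhole camera model. This is a sketch only: it assumes the camera parameters are available as an intrinsic matrix K (focal lengths and principal point, no skew or lens distortion), a rotation matrix R, and a translation T with camera coordinates given by R·X + T. The function name `project_point` is hypothetical.

```python
def project_point(X, K, R, T):
    """Project a world point X = (x, y, z) to pixel coordinates (u, v)."""
    # World -> camera coordinates: Xc = R * X + T
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + T[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsic parameters.
    u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
    v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
    return (u, v)


K = [[1000.0, 0.0, 640.0],
     [0.0, 1000.0, 360.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]
print(project_point((0.5, -0.25, 5.0), K, R, T))
```

Because the representative point moves in three dimensions while the subject's silhouette changes shape, the resulting pixel generally lands at a different place inside the billboard in each frame, as the text notes.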

[Billboard generation unit 210]
The billboard generation unit 210 generates a billboard using the image from the camera, the foreground mask generated by the background difference unit 204, and the projected point calculated by the projected point calculation unit 212. FIGS. 8(a) to 8(c) are schematic diagrams for explaining the billboard generation processing in the billboard generation unit 210. Each of FIGS. 8(a) to 8(c) shows a texture 802 obtained by applying the foreground mask to the image from the camera. As shown in FIG. 8(a), when the projected point 806 is inside the subject and is included in the circumscribed rectangle 804 of the mask area of the subject, the billboard generation unit 210 cuts out the circumscribed rectangle 804 and uses it as the billboard. As shown in FIG. 8(b), when the projected point 810 is outside the subject but is still included in the circumscribed rectangle 804 of the mask area of the subject, the billboard generation unit 210 likewise cuts out the circumscribed rectangle 804 and uses it as the billboard. On the other hand, as shown in FIG. 8(c), when the projected point 814 is not included in the circumscribed rectangle of the mask area, the billboard generation unit 210 cuts out the smallest rectangular area 812 that includes both the projected point 814 and the mask area and uses it as the billboard. Here, the billboard generation unit 210 assigns the pixel values of the actual image to the pixels of the billboard that belong to the mask area, and sets the other areas to have no pixel value (treated as transparent).
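The three cases of FIGS. 8(a) to 8(c) reduce to one rule: take the circumscribed rectangle of the mask area and enlarge it only if the projected point falls outside. A minimal sketch, assuming the foreground mask is a list of rows of 0/1 values and the projected point is given in pixel coordinates `(x, y)`; the function name `billboard_rect` is hypothetical.

```python
def billboard_rect(mask, point):
    """Smallest rectangle containing the mask area and the projected point.

    Returns (x_min, y_min, x_max, y_max), all inclusive.
    """
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        raise ValueError("mask contains no foreground pixels")
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    px, py = point
    # If the point is already inside the circumscribed rectangle
    # (cases (a) and (b)), min/max leave the rectangle unchanged.
    return (min(x0, px), min(y0, py), max(x1, px), max(y1, py))


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(billboard_rect(mask, (2, 1)))  # point inside the rectangle
print(billboard_rect(mask, (2, 4)))  # point below the rectangle, case (c)
```

Pixels inside the returned rectangle but outside the mask would then be marked transparent, as the text describes.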

Situations in which the model representative point is set outside the three-dimensional model, as shown in FIGS. 8(b) and 8(c), include, for example, the case where smoothing in the time axis direction by the smoothing unit 220 causes the model representative point to fall outside the three-dimensional model. In addition, when the model representative point calculation unit 218 sets a point on the field directly below the three-dimensional model as the model representative point in order to place the projected point of the billboard on the field, the model representative point is set below the three-dimensional model of a jumping subject.

[Free viewpoint video generation unit 214]
Returning to FIG. 3, the free viewpoint video generation unit 214 generates a composite image corresponding to a virtual viewpoint by using the camera parameters generated by the calibration unit 202 to place the billboard generated by the billboard generation unit 210 at the model representative point. The free viewpoint video generation unit 214 acquires information on the virtual viewpoint designated by the user, for example, the coordinates of the virtual viewpoint. The composite image constitutes one frame of the free viewpoint video. The billboard arranged by the free viewpoint video generation unit 214 is not necessarily perpendicular to the field; it maintains the angle θ. That is, when the virtual viewpoint moves vertically, the billboard keeps the angle θ with the field, and when the virtual viewpoint moves horizontally, the billboard rotates about an axis perpendicular to the field so as to face the virtual viewpoint. The free viewpoint video generation unit 214 generates the free viewpoint video by performing the above processing continuously for each frame.
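The billboard's behavior with respect to the virtual viewpoint (rotation about the field normal only, with the tilt θ left unchanged) can be sketched as a yaw computation. The coordinate convention is an assumption: the field is taken as the z = 0 plane with z pointing up, and the function name `billboard_yaw_deg` is hypothetical.

```python
import math


def billboard_yaw_deg(anchor, viewpoint):
    """Rotation about the field normal that turns the billboard toward the viewpoint.

    anchor: (x, y, z) of the model representative point.
    viewpoint: (x, y, z) of the virtual viewpoint.
    Only the horizontal offset is used, so raising or lowering the
    viewpoint does not change the result and the tilt theta is preserved.
    """
    dx = viewpoint[0] - anchor[0]
    dy = viewpoint[1] - anchor[1]
    return math.degrees(math.atan2(dy, dx))


print(billboard_yaw_deg((0.0, 0.0, 1.0), (5.0, 5.0, 10.0)))
print(billboard_yaw_deg((0.0, 0.0, 1.0), (5.0, 5.0, 0.0)))
```

Both calls return the same yaw because the two viewpoints differ only in height.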

The operation of the image processing apparatus 200 configured as described above will now be described.
FIG. 9 is a flowchart showing the flow of a series of processes in the image processing apparatus 200. The image processing apparatus 200 acquires images including the image of the subject from cameras arranged at each of a plurality of viewpoints set around the subject on the field (S902). The image processing apparatus 200 generates foreground masks by applying the background difference method to each of the acquired images (S906). The image processing apparatus 200 generates a three-dimensional model of the subject from the plurality of generated foreground masks (S908). The image processing apparatus 200 calculates the coordinates of the model representative point of the generated three-dimensional model (S910). The image processing apparatus 200 generates a billboard of the subject using an image acquired in step S902 and the corresponding foreground mask (S912). The image processing apparatus 200 synthesizes an image viewed from the virtual viewpoint by placing the generated billboard at the coordinates calculated in step S910 (S914).

Those skilled in the art who have read this specification will understand, based on its description, that each unit can be realized by a CPU (not shown), a module of an installed application program, a module of a system program, a semiconductor memory that temporarily stores the contents of data read from a hard disk, or the like.

With the image processing apparatus 200 according to the present embodiment, unnatural display caused by the constraints of the conventional billboard method can be eliminated. In the present embodiment, the position at which the billboard should be placed is determined using a three-dimensional model (a model with thickness) representing the rough shape of the subject, so a more natural display becomes possible while retaining the billboard's advantage of a light display load.

In addition, the image processing apparatus 200 according to the present embodiment estimates the position of the subject while it is moving through the air in a manner that is robust to the subject's posture, so that a display with smooth movement can be realized in the resulting video. Furthermore, smoothing the estimated coordinates in the time axis direction enables an even smoother rendering.

The configuration and operation of the image processing apparatus 200 according to the embodiment have been described above. This embodiment is illustrative, and those skilled in the art will understand that various modifications can be made to the combinations of the components and processes, and that such modifications are also within the scope of the present invention.

In the embodiment, the case has been described in which a three-dimensional model of the subject is generated and the coordinates at which the billboard is placed are calculated using the generated three-dimensional model. However, the present invention is not limited to this, and coordinates representing the position of the subject in real space may be calculated from a plurality of images acquired from a plurality of cameras.

FIG. 10 is an explanatory diagram of a method according to a modification for determining the coordinates at which the billboard is to be placed from a plurality of images 150a, 152a captured by a plurality of cameras 150, 152, 156. The image processing apparatus according to this modification acquires the plurality of images 150a and 152a captured by the plurality of cameras 150 and 152. The image processing apparatus identifies a first ray 150c of the first camera 150 that passes through the image 150b of the target part 154a of the subject 154 appearing in the first image 150a captured by the first camera 150. The image processing apparatus identifies a second ray 152c of the second camera 152 that passes through the image 152b of the target part 154a of the subject 154 appearing in the second image 152a captured by the second camera 152. The image processing apparatus similarly identifies a third ray 156c of the third camera 156.

The image processing apparatus takes the spatial coordinates closest to the rays 150c, 152c, and 156c of the plurality of cameras 150, 152, and 156 as the coordinates of the target part 154a. The image processing apparatus specifies the determined coordinates of the target part 154a as the coordinates at which the billboard of the subject 154 should be placed. In this case, no three-dimensional model needs to be generated in order to specify the coordinates at which the billboard is to be placed, so the amount of processing can be reduced.
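One common way to find "the spatial coordinates closest to" several rays is a least-squares fit that minimizes the sum of squared distances to the lines carrying those rays. The specification does not state which distance criterion is used, so the least-squares formulation, the treatment of rays as infinite lines, and the function names below are assumptions made for illustration.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def closest_point_to_rays(origins, directions):
    """Point minimizing the sum of squared distances to the given lines.

    Each line passes through origins[k] with direction directions[k].
    Solves sum_k (I - u_k u_k^T) p = sum_k (I - u_k u_k^T) o_k.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(c * c for c in d))
        u = [c / norm for c in d]
        for i in range(3):
            for j in range(3):
                m = (1.0 if i == j else 0.0) - u[i] * u[j]
                A[i][j] += m
                b[i] += m * o[j]
    det = det3(A)
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel; the closest point is not unique")
    # Cramer's rule for the 3x3 system A p = b.
    p = []
    for k in range(3):
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        p.append(det3(Ak) / det)
    return tuple(p)


origins = [(0.0, 2.0, 3.0), (1.0, 0.0, 3.0), (1.0, 2.0, 0.0)]
directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0)]
print(closest_point_to_rays(origins, directions))
```

In the example the three lines meet at a single point; with real, noisy detections they will generally not intersect, and the function returns the compromise point instead.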

110 free viewpoint image distribution system, 112 network, 114 mobile terminal, 200 image processing apparatus.

Claims

An image processing apparatus comprising:
means for acquiring a plurality of images obtained by imaging a subject in real space from a plurality of viewpoints;
means for calculating coordinates representing the position of the subject in real space from the plurality of acquired images; and
means for generating a composite image corresponding to a viewpoint not included in the plurality of viewpoints by arranging, with reference to the calculated coordinates, a billboard of the subject generated from at least one of the plurality of acquired images.

The image processing apparatus according to claim 1, wherein the means for calculating includes:
means for generating a three-dimensional model of the subject from the plurality of acquired images; and
means for calculating coordinates of a representative point of the generated three-dimensional model as the coordinates representing the position of the subject in real space.

The image processing apparatus according to claim 2, wherein the means for calculating the coordinates of the representative point calculates the coordinates of the representative point by statistically processing the generated three-dimensional model in a space.

The image processing apparatus according to claim 2, wherein the means for calculating the coordinates of the representative point statistically processes the coordinates of the generated representative point of the three-dimensional model in a time axis direction.

5. The image processing apparatus according to claim 2, wherein, when the subject is a person or an animal, the means for calculating the coordinates of the representative point calculates the coordinates of the representative point based on a skeleton of the subject.

The image processing apparatus according to claim 2, further comprising:
means for generating a mask for separating a foreground and a background for each of the plurality of acquired images; and
means for generating a billboard of the subject using the generated mask,
wherein the means for generating the three-dimensional model of the subject generates the three-dimensional model of the subject based on the generated mask.

6. The image processing apparatus according to claim 1, further comprising means for generating a billboard of the subject based on a point on the image plane of the at least one image corresponding to the calculated coordinates.

The image processing apparatus according to any one of claims 1 to 7, further comprising means for setting an angle formed by a billboard of the subject with a field according to a position of a viewpoint corresponding to the at least one image.

An image processing method comprising:
acquiring a plurality of images obtained by imaging a subject in real space from a plurality of viewpoints;
calculating coordinates representing the position of the subject in real space from the plurality of acquired images; and
generating a composite image corresponding to a viewpoint not included in the plurality of viewpoints by arranging, with reference to the calculated coordinates, a billboard of the subject generated from at least one of the plurality of acquired images.

A computer program for causing a computer to realize:
a function of acquiring a plurality of images obtained by imaging a subject in real space from a plurality of viewpoints;
a function of calculating coordinates representing the position of the subject in real space from the plurality of acquired images; and
a function of generating a composite image corresponding to a viewpoint not included in the plurality of viewpoints by arranging, with reference to the calculated coordinates, a billboard of the subject generated from at least one of the plurality of acquired images.