JP7165254B2

JP7165254B2 - System and method for reproducing replay video of free-viewpoint video

Info

Publication number: JP7165254B2
Application number: JP2021206217A
Authority: JP
Inventors: 良亮渡邊; 敬介野中
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-02-23
Filing date: 2021-12-20
Publication date: 2022-11-02
Anticipated expiration: 2038-02-23
Also published as: JP2022036123A

Description

The present invention relates to a system and method for reproducing replay video of free-viewpoint video. More specifically, it relates to a system and method that reduce the load of constructing the replay video, so that even a viewing terminal with limited processing capability can play back a replay of free-viewpoint video.

Free-viewpoint video technologies, such as those of Patent Document 1 and Non-Patent Document 1, have been proposed to let users view video from virtual viewpoints where no camera is actually placed, based on video captured by multiple cameras. By arranging multiple cameras in a sports stadium or similar venue and generating free-viewpoint video from their footage, users can enjoy watching from any virtual viewpoint they choose.

One way to record the video from a virtual viewpoint selected by a user of such a free-viewpoint video technology is to capture the displayed screen and save it as a video file in an existing format, such as H.264 (MPEG-4 AVC) described in Non-Patent Document 2. The recorded file can be kept in storage and played back whenever the user wants to watch it again, or sent to other users to share it.

As an example of recording video from a user-selected viewpoint without recording the moving image itself, and playing it back later, Patent Document 2 discloses a game device and a game replay method. That invention saves the history of the player's game operations as replay data and later reproduces the player's play images from that replay data.

Patent Document 1: Japanese Patent Application No. 2017-167472
Patent Document 2: JP 2010-264173 A

Non-Patent Document 1: T. Koyama, I. Kitahara and Y. Ohta, "Live Mixed-Reality 3D Video in Soccer Stadium," in Proc. of IEEE/ACM Conference on ISMAR, pp. 178-187 (2003)
Non-Patent Document 2: ITU-T Recommendation H.264, "Advanced video coding for generic audiovisual services," May 2003

Free-viewpoint video technology lets users freely select a viewpoint with a dedicated controller or by touch operations on a smartphone or tablet screen, and enjoy video from any viewpoint. Unlike conventional terrestrial TV broadcasts, free-viewpoint video lets each user select and move the viewpoint freely. Consequently, even for the same free-viewpoint content, the video each user sees differs depending on the viewpoint position and how the viewpoint is moved. Recording the video from a user's own viewpoint therefore makes it possible to create original video content.

In a future where free-viewpoint video technology is widespread, video from such specific viewpoints will naturally be exchanged over the Internet, and new forms of enjoyment are expected to emerge, such as receiving ratings and comments through social networking services (SNS).

Users will want to play back video recorded on a particular terminal from a particular viewpoint (hereinafter, replay video) again later, play it on other terminals, or share it among many users. This demand can be met by saving the video in an existing video coding format such as that of Non-Patent Document 2 and exchanging the resulting file, but the file size then becomes large.

This problem is expected to be especially pronounced for free-viewpoint video, because each user can generate video from his or her own viewpoint, so a single free-viewpoint video can yield an enormous variety of video content.

Alternatively, if the movement of the viewpoint is known, an image from that viewpoint can be obtained by passing the viewpoint information to a free-viewpoint video generation device. As with the game replay function disclosed in Patent Document 2, video from a given viewpoint can be recomputed and displayed from prerecorded viewpoint information.

Unlike a game, however, free-viewpoint video first requires reconstructing a 3D space from the images of multiple cameras, and the computational cost of this reconstruction is very high. In particular, when many users ask a server to reconstruct replay video at the same time, it is difficult to do so without delay.

An object of the present invention is to solve the above technical problem by providing a system and method that reduce the load of constructing replay video of free-viewpoint video, so that even a viewing terminal with limited processing capability can play back that replay video.

To achieve the above object, the present invention provides a system for reproducing replay video of free-viewpoint video, configured by connecting a viewing terminal and a free-viewpoint video generation device over a network, and characterized by the following configuration.

(1) The viewing terminal comprises means for requesting playback of free-viewpoint video and means for requesting that replay video be recorded for the free-viewpoint video being played.
The free-viewpoint video generation device comprises means for generating free-viewpoint video, in response to the playback request, from a plurality of camera videos and the viewpoint information of a virtual viewpoint; means for recording, during the free-viewpoint video generation process, a replay format in which the virtual viewpoint is described for each playback time of the replay video; and means for transferring the recorded replay format to the viewing terminal.
The viewing terminal further comprises means for acquiring the information necessary for replay based on the replay format, and means for reproducing the replay video based on the replay format and the acquired information. The information necessary for replay includes a background 3D model and spatial information for rendering objects on that background 3D model.

(2) The spatial information includes a mask image of each object and position information for placing each object's model.

(3) The means for generating free-viewpoint video and the means for recording the replay format are implemented on a server in the cloud, and the means for acquiring the information necessary for replay and the means for reproducing replay video are implemented on the replay video viewing terminal. The replay format is transferred from the server to the viewing terminal and stored there.

The present invention achieves the following effects.

(1) During the free-viewpoint video generation process, identifiers of information that can be reused to reconstruct the replay video, together with the parameters needed for that reconstruction, are recorded in the replay format. When the replay video is reconstructed, the reusable information is fetched using the identifiers recorded in the replay format, and the other recorded information is used as parameters, so the replay video can be played back with a light processing load.

(2) Because a background 3D model and spatial information for rendering objects on that model are acquired as the information needed to reconstruct the replay video, the processing load of the reconstruction is reduced.

(3) Because the spatial information includes each object's mask image and the position information for placing each object's model, the computationally expensive calculations needed to derive this information are unnecessary when the replay video is reconstructed.

(4) If the means for generating free-viewpoint video and the means for recording the replay format are implemented on a cloud server, and the means for acquiring the information necessary for replay and the means for reproducing replay video are implemented on the viewing terminal, the computationally expensive processing is assigned to the server, which generally has higher processing capability. Free-viewpoint video can therefore be replayed even on viewing terminals, which generally have lower processing capability.

FIG. 1 is a block diagram showing the configuration of the main parts of a free-viewpoint video distribution system according to one embodiment of the present invention.
FIG. 2 is a diagram for explaining a method of switching the objects on which polygons are placed according to the virtual viewpoint P.
FIG. 3 is a diagram showing a first example of the replay format.
FIG. 4 is a diagram for explaining the definition of the viewpoint information.
FIG. 5 is a sequence flow showing the procedure from construction of the replay format to playback of the replay video.
FIG. 6 is a diagram showing a second example of the replay format.
FIG. 7 is a diagram showing a third example of the replay format.
FIG. 8 is a diagram showing a method of generating free-viewpoint video according to the second embodiment of the present invention.
FIG. 9 is a diagram showing a third example of the replay format.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the main parts of a free-viewpoint video distribution system according to an embodiment of the present invention; components not needed to explain the invention are omitted.

The free-viewpoint video distribution system of the present invention mainly comprises: a plurality of cameras cam, arranged in a stadium or similar venue, that capture objects such as athletes from different viewpoints; a content server 1 that stores the video captured by each camera cam and the camera parameters; a free-viewpoint video generation device 2 that generates free-viewpoint video from the multiple camera videos, the camera parameters, and the viewpoint information; and a viewing terminal 3 that provides information on the virtual viewpoint P (viewpoint information) to the free-viewpoint video generation device 2 according to operations by the terminal user U, and acquires and plays the free-viewpoint video generated by the free-viewpoint video generation device 2.

The content server 1, the free-viewpoint video generation device 2, and the viewing terminal 3 may be configured by installing applications (programs) that implement the functions described below on general-purpose computers, or as dedicated or single-function machines in which part of the application is implemented in hardware or ROM.

In the content server 1, content consisting of the captured camera videos and their camera parameters is managed under unique IDs. In the illustrated example, ID1 is assigned to soccer free-viewpoint content, ID2 to volleyball free-viewpoint content, and ID3 to judo free-viewpoint content.

The free-viewpoint video generation device 2 includes a free-viewpoint video generation unit 201 and a format recording unit 202. By implementing these functions on a server in the cloud, it can be configured as a free-viewpoint video generation server.

The free-viewpoint video generation unit 201 generates free-viewpoint video by applying a known free-viewpoint technique to multiple camera videos with different viewpoints, their camera parameters, and the viewpoint information selected by the terminal user U on the viewing terminal 3.

The first embodiment of the present invention is described as applied to a free-viewpoint technique that uses the billboard method, as in Non-Patent Document 1. In this method, each object in 3D space is approximated by a single rectangular polygon, and texture information obtained from multiple camera videos is mapped onto that polygon appropriately according to the virtual viewpoint P selected by the user.

The free-viewpoint video generation unit 201 extracts objects from the multiple camera videos and estimates their positions. In free-viewpoint video of a soccer match, the objects are people such as the players appearing in the stadium. The background other than the objects, such as the stadium pitch and spectator stands, is assumed to exist as a 3D model created manually in advance as a preset.

Further, as shown in FIG. 2, according to the selected virtual viewpoint P, the free-viewpoint video generation unit 201 places one rectangular polygon facing the line-of-sight direction of the virtual viewpoint P at the position where each object is estimated to be. On that polygon it displays the object's texture extracted from the video of the real camera cam whose line-of-sight angle is closest to that of the virtual viewpoint P.
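The camera-selection rule can be sketched as follows, assuming each real camera's line of sight is known as a 3D direction vector and that "closest angle" means the smallest angle between direction vectors. The camera names and vectors are hypothetical and are not taken from FIG. 2.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def angle_between(a, b):
    """Angle in radians between two 3D direction vectors."""
    a, b = normalize(a), normalize(b)
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error

def nearest_camera(view_dir, camera_dirs):
    """Id of the real camera whose line of sight is closest to view_dir."""
    return min(camera_dirs, key=lambda cam_id: angle_between(view_dir, camera_dirs[cam_id]))

# Hypothetical camera line-of-sight directions.
cameras = {
    "cam1": (1.0, 0.0, 0.0),
    "cam2": (0.0, 1.0, 0.0),
    "cam3": (-1.0, 0.2, 0.0),
}
print(nearest_camera((-0.9, 0.3, 0.0), cameras))  # cam3
```

Each time the virtual viewpoint moves, the same test can simply be repeated, so the texture source switches from one camera to the next as P rotates around the scene.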

In the example of FIG. 2, the line-of-sight angle of the virtual viewpoint P is closest to that of real camera cam3, so the texture of the target object is cut out of the cam3 video and pasted onto the rectangular polygon. When this texture is displayed, a binary mask image representing the target object's shape (or similar) is used so that only the part of the polygon where the target object exists is displayed and the rest is transparent. When the virtual viewpoint P moves, the rectangular polygon is rotated accordingly so that it always faces P, which keeps the view from the virtual viewpoint P natural-looking.
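The transparency step can be illustrated with a small sketch, assuming the texture is a grid of RGB pixels and the mask is a same-sized grid of 0/1 values (1 where the object is). The helper name `apply_mask` is hypothetical; a real renderer would more likely do this on the GPU, for example with alpha testing.

```python
def apply_mask(texture, mask):
    """Combine an RGB texture with a binary mask into an RGBA image.

    Pixels where the mask is 1 (the object) stay opaque (alpha 255);
    all other pixels become fully transparent (alpha 0).
    """
    if len(texture) != len(mask) or any(len(t) != len(m) for t, m in zip(texture, mask)):
        raise ValueError("texture and mask must have the same size")
    return [
        [(r, g, b, 255 if m else 0) for (r, g, b), m in zip(tex_row, mask_row)]
        for tex_row, mask_row in zip(texture, mask)
    ]

# 2x2 toy example: the top-right pixel is background.
texture = [[(200, 0, 0), (10, 10, 10)],
           [(200, 0, 0), (200, 0, 0)]]
mask = [[1, 0],
        [1, 1]]
rgba = apply_mask(texture, mask)
print(rgba[0])  # [(200, 0, 0, 255), (10, 10, 10, 0)]
```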

While the free-viewpoint video generation unit 201 generates free-viewpoint video, the format recording unit 202 acquires the parameters that will later allow the viewing terminal 3 to replay that video with a small computational load, and records them in the replay format.

Recording of the replay format starts in response to a recording start request RQ2 from the viewing terminal 3 that is playing the free-viewpoint video, and ends in response to a recording end request RQ3. The completed replay format is transferred to the viewing terminal 3 that sent the requests RQ2 and RQ3, and is managed in its storage 301.

In the format recording unit 202, the free-viewpoint information recording unit 202a records in the replay format the viewpoint information, including the position, direction, and orientation of the virtual viewpoint P that the terminal user U selects by operating the viewing terminal 3.

The spatial information recording unit 202b records spatial information that reduces the processing load when the viewing terminal 3 renders the image seen from the selected virtual viewpoint P. In this embodiment, the objects visible from the viewpoint are identified from the viewpoint information, and only those visible objects are rendered and reconstructed in 3D space.
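The text does not say how visibility is decided. As an assumed simplification, the sketch below treats an object as visible when its position lies inside a viewing cone around the line of sight, and ignores occlusion between objects. The object positions and the 60-degree field of view are made up, chosen so that, as in FIG. 2, only ID3 and ID4 come out visible.

```python
import math

def visible_objects(eye, view_dir, objects, fov_deg=60.0):
    """Ids of objects whose position lies inside a cone around view_dir.

    Simplified test (an assumption): an object is visible if the angle between
    view_dir and the eye-to-object vector is at most half the field of view.
    Occlusion between objects is not considered.
    """
    half_fov = math.radians(fov_deg) / 2.0
    dir_len = math.sqrt(sum(c * c for c in view_dir))
    visible = []
    for obj_id, pos in objects.items():
        to_obj = [p - e for p, e in zip(pos, eye)]
        dist = math.sqrt(sum(c * c for c in to_obj))
        if dist == 0.0:
            continue  # object at the eye position: skip
        cos_a = sum(a * b for a, b in zip(view_dir, to_obj)) / (dir_len * dist)
        if math.acos(max(-1.0, min(1.0, cos_a))) <= half_fov:
            visible.append(obj_id)
    return visible

# Hypothetical scene: eye at the origin looking along +x.
objects = {"ID1": (-5.0, 0.0, 0.0),   # behind the viewpoint
           "ID2": (0.0, 10.0, 0.0),   # 90 degrees to the side
           "ID3": (10.0, 1.0, 0.0),
           "ID4": (12.0, -2.0, 0.0)}
print(visible_objects((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), objects))  # ['ID3', 'ID4']
```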

Some parameters are expensive to compute during rendering: specifically, the position of the billboard that displays each object, and the mask image used to make part of each billboard's texture transparent according to the object's shape. For these, only identifiers that allow the viewing terminal 3 to obtain the parameters from the free-viewpoint video generation device 2 are recorded.

In the case of FIG. 2, four objects (ID1 to ID4) stand on the background model, but only two of them (ID3 and ID4) are visible from the selected virtual viewpoint P. Also, since the line-of-sight angle of the virtual viewpoint P is close to that of real camera cam3, the billboards placed at the positions of objects ID3 and ID4 are assumed to carry textures taken from the cam3 video.

In such a case, this embodiment reconstructs only these two objects in 3D space and acquires only their billboard positions, textures, and mask images. An object, its billboard position, the texture pasted on that billboard, the real camera the texture came from, and the mask image used to mask the texture can all be linked to one another, so recording all of them in the replay format would be redundant.

Therefore, as shown in FIG. 3 (described in detail later), this embodiment records only the "camera number", "texture number", and "billboard position" as spatial information, and the mask image is identified through the texture number. The position of each billboard need not be recorded either, if the free-viewpoint video generation device 2 stores it in association with the texture number.

The viewing terminal 3 can be realized as a television capable of playing video, connected to a dedicated controller for selecting the viewpoint; as a smartphone or tablet on which the viewpoint is selected by touch or swipe operations on the touchscreen display; or as a VR terminal with an acceleration sensor, on which free-viewpoint video is viewed and the viewpoint is selected according to the user's movement.

In the viewing terminal 3, the storage 301 stores the replay format and various spatial information sent from the free-viewpoint video generation device 2 over the network. The user operation detection unit 302 detects the terminal user U's operations, namely the virtual viewpoint operation RQ1, the replay format recording start request RQ2 and recording end request RQ3, and the replay request RQ4, and sends them to the free-viewpoint video generation device 2 over the network.

In response to the replay request RQ4, the rendering information acquisition unit 303 refers to the replay format stored in the storage 301 and identifies which free-viewpoint video it belongs to. It then acquires the background 3D model of that free-viewpoint video from the free-viewpoint video generation device 2 and, based on that background 3D model, also acquires from the device 2 the spatial information needed to reconstruct each frame, such as textures, mask images, and billboard positions. The acquired information is temporarily stored in the storage 301.

The replay video reproduction unit 304 performs rendering based on the replay format and the spatial information acquired from the free-viewpoint video generation device 2, and plays the replay video of the free-viewpoint video.

In this example, the free-viewpoint video is rendered on the viewing terminal 3, which therefore has to render the video seen from the virtual viewpoint P given by the viewpoint information. That is, the terminal reconstructs the 3D space by standing a billboard facing the virtual viewpoint P at each billboard position in the acquired background 3D model, and pasting onto it the acquired texture masked with its mask image.
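Standing a billboard that faces the virtual viewpoint P can be sketched as computing the four corners of an upright rectangle perpendicular to the horizontal part of the line of sight. This sketch assumes a right-handed world with z as the vertical axis and a unit-length up vector; the function name and parameters are hypothetical, and keeping the billboard vertical is a design choice, not something the text specifies.

```python
import math

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def billboard_corners(base, width, height, view_dir, world_up=(0.0, 0.0, 1.0)):
    """Four corners of an upright rectangle facing the virtual viewpoint.

    base: ground point where the object stands (billboard bottom centre).
    world_up: assumed unit-length vertical axis.
    Returns (bottom_left, bottom_right, top_right, top_left) as seen from P.
    """
    r = cross(view_dir, world_up)  # horizontal "right" axis of the view
    n = math.sqrt(sum(c * c for c in r))
    if n < 1e-9:
        raise ValueError("view direction is parallel to world_up")
    r = tuple(c / n for c in r)
    hw = width / 2.0

    def at(s, t):  # s: offset along the right axis, t: height along world_up
        return tuple(b + s * ri + t * ui for b, ri, ui in zip(base, r, world_up))

    return at(-hw, 0.0), at(hw, 0.0), at(hw, height), at(-hw, height)

# A 1 m x 2 m billboard standing at (10, 0, 0), seen from a viewpoint looking along +x.
print(billboard_corners((10.0, 0.0, 0.0), 1.0, 2.0, (1.0, 0.0, 0.0)))
```

Recomputing the corners whenever the line of sight changes gives the "always facing P" rotation described for FIG. 2; the masked texture is then mapped onto this quad.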

On the other hand, the terminal does not perform costly processing such as estimating billboard positions from multiple camera videos or generating mask images. Given the specifications of recent smartphones and similar devices, performing the above rendering on the viewing terminal is entirely feasible.

Compared with rendering in the free-viewpoint video generation device 2, this embodiment requires the background 3D model to be transmitted only once, and for every other frame only the spatial information of the visible objects needs to be transmitted. In soccer video, for example, the area occupied by players is often very small relative to the whole picture, so transmitting only textures and mask images is in many cases lighter in data volume than sending a rendered frame every time.

Furthermore, if each texture, once transmitted, remains stored on the viewing terminal, the textures stay on any viewing terminal 3 that has performed a replay reconstruction once. Texture numbers that have already been downloaded then need not be downloaded again, and the replay video can be reconstructed even without a network connection.
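A minimal sketch of the cache check, assuming each per-frame record lists its texture numbers under a hypothetical `texture_numbers` key and that the terminal keeps the set of texture numbers it already holds:

```python
def textures_to_download(frames, cache):
    """Texture numbers referenced by the replay format but not yet cached.

    frames: iterable of per-frame records, each with a 'texture_numbers' list.
    cache: set of texture numbers already stored on the viewing terminal.
    Returns the missing numbers in first-use order, without duplicates.
    """
    needed = []
    seen = set()
    for frame in frames:
        for tex in frame["texture_numbers"]:
            if tex not in cache and tex not in seen:
                seen.add(tex)
                needed.append(tex)
    return needed

frames = [{"texture_numbers": [31, 32]},
          {"texture_numbers": [32, 33]}]
print(textures_to_download(frames, cache={31}))  # [32, 33]
```

If the returned list is empty, every texture needed for the replay is already on the terminal, which is the offline case described above (assuming the background 3D model and billboard positions are cached as well).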

FIG. 3 shows a first example of the replay format, which consists of header information and time-series information.

In the header information, the "free-viewpoint video ID" is used to uniquely identify each replay format. Because this ID needs to be recorded only once, it is written in the header of the format. The "total number of frames" is the number of frames in the replay video reconstructed from the replay format, and corresponds to its playback duration.

The time-series information is generated for each frame, and its "playback time identifier" records information specifying the position (time) of that frame in the replay video. The illustrated example shows only two frames of viewpoint time-series information (21, 22); if the "total number of frames" is 200, time-series information for 200 frames is concatenated.

For example, for a one-minute free-viewpoint video at 30 frames per second, if a recording start request RQ2 is detected 10 seconds after the start and a recording end request RQ3 is detected at 20 seconds, the time-series information consists of the entries with playback time identifiers 300 through 600, concatenated in chronological order. The "playback time identifier" is not limited to a frame number and may instead be absolute or relative time information. Because this embodiment manages the information for replaying each frame in time series, if the audio recorded on the content server is also managed with time information, video and audio can easily be played back in sync on a time basis.
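The frame arithmetic in this example can be checked with a short sketch, assuming frame numbers serve as playback time identifiers and, as in the text, both endpoints are included (so the recording spans 301 frames):

```python
def playback_time_identifiers(start_s, end_s, fps=30):
    """Frame-number identifiers covered by a recording from start_s to end_s seconds.

    Both endpoints are included, matching the example in the text:
    10 s to 20 s at 30 fps gives identifiers 300 through 600.
    """
    return list(range(round(start_s * fps), round(end_s * fps) + 1))

def identifier_to_seconds(frame_id, fps=30):
    """Convert a frame-number identifier to relative time, e.g. to look up audio."""
    return frame_id / fps

ids = playback_time_identifiers(10, 20)
print(ids[0], ids[-1], len(ids))   # 300 600 301
print(identifier_to_seconds(450))  # 15.0
```

Converting identifiers to seconds in this way is one simple means of aligning each replayed frame with time-stamped audio on the content server.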

In this embodiment, as shown in FIG. 4, the virtual viewpoint P is specified by the "viewpoint position E(ex, ey, ez)", which gives the 3D coordinates of the viewpoint; the "line-of-sight direction D(dx, dy, dz)", which gives the viewing direction; and the "orientation direction U(ux, uy, uz)", which gives the orientation of the viewpoint. The viewpoint information is thus specified by three 3D vectors, nine parameters in total.

The "orientation direction" indicates which side of the display screen is up when looking in a given direction from a given viewpoint position. Even with the same viewpoint position and line-of-sight direction, the image seen while standing upright and the image seen while standing on one's head are upside down relative to each other, so the replay video can be reconstructed only when this orientation information is also available.
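One common way to turn the nine parameters E, D, and U into a rendering camera is a look-at view matrix like the one built by OpenGL's `gluLookAt`. The patent does not prescribe this, so treat the sketch as an assumption. It also shows why U matters: reversing U reverses the sign of the camera-space vertical coordinate, so the image appears upside down.

```python
import math

def _norm(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def view_matrix(E, D, U):
    """4x4 world-to-camera matrix from viewpoint position E, line of sight D,
    and orientation (up) direction U, in the style of gluLookAt."""
    f = _norm(D)
    s = _norm(_cross(f, U))  # screen "right"; fails if U is parallel to D
    u = _cross(s, f)         # screen "up", re-orthogonalised from U
    return [
        [s[0], s[1], s[2], -_dot(s, E)],
        [u[0], u[1], u[2], -_dot(u, E)],
        [-f[0], -f[1], -f[2], _dot(f, E)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def to_camera(M, p):
    """Apply the view matrix to a 3D point; returns camera-space (x, y, z)."""
    ph = (p[0], p[1], p[2], 1.0)
    return tuple(_dot(row, ph) for row in M[:3])

# Same E and D, opposite U: a point above the line of sight flips vertically.
upright = view_matrix((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
flipped = view_matrix((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, -1.0))
print(to_camera(upright, (5.0, 0.0, 1.0))[1])  # 1.0
print(to_camera(flipped, (5.0, 0.0, 1.0))[1])  # -1.0
```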

The "camera number" is the identifier of the real camera cam whose direction is closest to that of the virtual viewpoint P. The "texture number" is the number of the texture of each object visible from the current virtual viewpoint P. The "billboard position" gives the relationship between the coordinates of each billboard that models an object visible from the current virtual viewpoint and the identifier of the texture pasted onto that billboard. In this embodiment, such time-series information is constructed at a predetermined interval, for example per frame, managed by the "playback time identifier", and concatenated in sequence.
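The header and time-series fields described above could be held in data structures like the following. Field names, types, and the JSON serialization are assumptions for illustration only; FIG. 3 defines the fields by name, not their encoding.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class FrameRecord:
    """Time-series entry for one frame (field names are illustrative)."""
    playback_time_id: int             # e.g. frame number
    eye: Vec3                         # viewpoint position E
    direction: Vec3                   # line-of-sight direction D
    up: Vec3                          # orientation direction U
    camera_number: int                # real camera closest in direction to P
    texture_numbers: List[int]        # textures of the visible objects
    billboard_positions: Dict[int, Vec3] = field(default_factory=dict)  # texture no. -> position

@dataclass
class ReplayFormat:
    free_viewpoint_video_id: int      # header: written once
    frames: List[FrameRecord] = field(default_factory=list)

    @property
    def total_frames(self) -> int:    # header: "total number of frames"
        return len(self.frames)

    def to_json(self) -> str:
        header = {"free_viewpoint_video_id": self.free_viewpoint_video_id,
                  "total_frames": self.total_frames}
        return json.dumps({"header": header,
                           "frames": [asdict(f) for f in self.frames]})

fmt = ReplayFormat(free_viewpoint_video_id=1)
fmt.frames.append(FrameRecord(300, (0.0, -30.0, 5.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
                              3, [31, 32], {31: (1.0, 2.0, 0.0), 32: (4.0, 5.0, 0.0)}))
print(fmt.to_json())
```

Note that JSON turns the integer texture-number keys into strings and tuples into lists, so a reader of this encoding would need to convert them back.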

FIG. 5 is a sequence flow showing the procedure from constructing the replay format to playing back a replay video based on that replay format.

At time t1, the viewing terminal 3 sends a video viewing request RQ1 to the free-viewpoint video generation device 2. In response to RQ1, the free-viewpoint video generation device 2 starts distributing the video content at time t2. At time t3, the viewing terminal 3, having received the video content, plays the video.

At time t4, the terminal user U performs a viewpoint operation on the viewing terminal 3 to view free-viewpoint video. When the user operation detection unit 302 detects this, at time t5 the viewing terminal 3 transfers viewpoint information specifying the virtual viewpoint P selected by the terminal user U to the free-viewpoint video generation device 2. At time t6, the free-viewpoint video generation unit 201 of the device 2 performs rendering based on the viewpoint information and the camera videos to generate the free-viewpoint video. At time t7 the free-viewpoint video is delivered to the viewing terminal 3, and at time t8 it is played back.

At time t9, the terminal user U operates the viewing terminal 3 to request the start of replay video recording. When the user operation detection unit 302 detects this, at time t10 the viewing terminal 3 sends a recording start request RQ2 to the free-viewpoint video generation device 2. At time t11, in response to RQ2, the format recording unit 202 of the device 2 starts recording the replay format for the free-viewpoint video being played. The replay format is recorded frame by frame until a recording end request RQ3 from the viewing terminal 3 is detected.

In general, while the free-viewpoint video generation unit 201 is playing free-viewpoint video, information on the virtual viewpoint P selected by the user can be obtained from the unit 201. In this embodiment as well, once the recording start request RQ2 is detected, the format recording unit 202 acquires the viewpoint information from the free-viewpoint video generation unit 201 frame by frame and records its parameters in the replay format.

In this embodiment, the "viewpoint position E (ex, ey, ez)", the "line-of-sight direction D (dx, dy, dz)", and the "orientation direction U (ux, uy, uz)" are recorded for each frame. In addition, the positions at which billboards are set up for the objects visible from the current viewpoint are recorded, together with the texture number and camera number of the object to be pasted onto each billboard.

Then, at time t12, the terminal user U operates the viewing terminal 3 to request the end of replay video recording. When the user operation detection unit 302 detects this, at time t13 the viewing terminal 3 sends a recording end request RQ3 to the free-viewpoint video generation device 2. At time t14, the format recording unit 202 of the device 2 finishes recording the replay format. At time t15 the generated replay format is transferred to the viewing terminal 3, and at time t16 it is stored in the storage 301 of the viewing terminal 3.
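The start/stop behaviour of the format recording unit 202 between RQ2 and RQ3 can be sketched as a small state machine. The class `ReplayFormatRecorder`, its method names, and the dictionary layout are hypothetical; the header fields (free-viewpoint video ID and total frame count) follow the header described for the second embodiment.

```python
from typing import Dict, List


class ReplayFormatRecorder:
    """Hypothetical stand-in for the format recording unit 202."""

    def __init__(self, video_id: str):
        self.video_id = video_id
        self.recording = False
        self.frames: List[Dict] = []

    def on_record_start(self) -> None:  # recording start request RQ2
        self.recording = True
        self.frames.clear()

    def on_frame(self, time_id: int, viewpoint: List[float]) -> None:
        # Called once per rendered frame; entries are stored only while recording.
        if self.recording:
            self.frames.append({"time_id": time_id, "viewpoint": viewpoint})

    def on_record_end(self) -> Dict:  # recording end request RQ3
        self.recording = False
        return {"header": {"video_id": self.video_id, "total_frames": len(self.frames)},
                "frames": list(self.frames)}


rec = ReplayFormatRecorder("match-001")
fmt = None
for t in range(900):  # one minute at 30 fps
    if t == 300:
        rec.on_record_start()
    rec.on_frame(t, [0.0] * 9)
    if t == 600:
        fmt = rec.on_record_end()
print(fmt["header"])  # {'video_id': 'match-001', 'total_frames': 301}
```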

Later, at time t17, a user who wants to replay the free-viewpoint video requests replay by specifying a replay format in the storage. When the user operation detection unit 302 detects this, the rendering information acquisition unit 303 interprets the replay format and, from the free-viewpoint video ID described in it, determines which free-viewpoint video the format is a replay of.

At time t18, the viewing terminal 3 requests (RQ4) the information needed to replay the video from the free-viewpoint video generation device 2, based on the replay format. In this embodiment, the background 3D model associated with the free-viewpoint video ID in the replay format is requested, and at time t19 the free-viewpoint video generation device 2 responds by delivering that background 3D model.

At time t20, the replay video playback unit 304 performs rendering based on the replay format and the acquired background 3D model, and playback of the replay video of the free-viewpoint video begins. During playback, the rendering information acquisition unit 303 requests and acquires from the free-viewpoint video generation device 2, frame by frame and based on the replay format, spatial information such as billboard positions, mask images, and textures.

The replay video playback unit 304 then efficiently reconstructs the 3D space based on the free-viewpoint spatial information described in the format, and reconstructs the replay video by rendering, from the acquired spatial information, the reconstructed space as seen from the viewpoint position recorded in the format.

FIG. 6 shows another example of the replay format. In the embodiment above, the viewpoint information was described as nine parameters forming three 3D vectors: viewpoint position E (ex, ey, ez), line-of-sight direction D (dx, dy, dz), and orientation direction U (ux, uy, uz). A function may also be provided that removes this redundancy when parameters do not change, reducing the data size; this is achieved by not writing a parameter for a frame when it is unchanged from the previous frame.

For example, when the viewpoint translates, the line-of-sight direction D and the orientation direction U may stay the same while only the viewpoint position E changes. In this embodiment, when such viewpoint motion is detected, recording of D and U is omitted from the time-series information of the next frame, as shown in FIG. 6, which reduces both the data size and the processing load. To reduce the data size further, the recorded parameters may also be rounded to a fixed number of digits and stored as approximate values.
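A minimal sketch of the redundancy removal in FIG. 6 follows. It assumes each frame carries the three vectors E, D and U, and that a vector is simply left out when it matches the previous frame after rounding. This generalizes the patent's example (omitting D and U during a translation) to any unchanged vector; the function names and dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def encode_frames(frames: List[Tuple[Vec3, Vec3, Vec3]], digits: int = 3) -> List[Dict[str, Vec3]]:
    """Write E, D, U per frame, omitting any vector equal to the previous frame's after rounding."""
    out = []
    prev = None
    for e, d, u in frames:
        cur = tuple(tuple(round(c, digits) for c in v) for v in (e, d, u))
        entry = {}
        for i, key in enumerate(("E", "D", "U")):
            if prev is None or cur[i] != prev[i]:
                entry[key] = cur[i]
        out.append(entry)
        prev = cur
    return out


def decode_frames(entries: List[Dict[str, Vec3]]) -> List[Tuple[Vec3, Vec3, Vec3]]:
    """Restore full E, D, U by carrying omitted vectors forward from earlier frames."""
    state: Dict[str, Vec3] = {}
    result = []
    for entry in entries:
        state.update(entry)
        result.append((state["E"], state["D"], state["U"]))
    return result


# A pure translation along x: only E changes from frame to frame.
frames = [((x * 0.5, 1.7, -10.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)) for x in range(3)]
enc = encode_frames(frames)
print([sorted(e) for e in enc])  # [['D', 'E', 'U'], ['E'], ['E']]
```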

FIG. 7 shows yet another example of the replay format. This embodiment is characterized by storing a gaze point position F (fx, fy, fz) instead of the line-of-sight direction D (dx, dy, dz). The gaze point position F is the position of a specific point lying along the line-of-sight direction D. The line-of-sight direction D is obtained from the viewpoint position E and the gaze point position F by the following equation (1), i.e. as the vector pointing from E toward F:

$$D = F - E \tag{1}$$

Using the gaze point position F (fx, fy, fz) instead of the line-of-sight direction D (dx, dy, dz) can therefore remove redundancy in some cases. For example, if a swipe operation on the viewing terminal 3 is assigned to rotating the viewpoint around the gaze point F, many motions that rotate around F can be expected. In such cases the gaze point position F (fx, fy, fz) does not change, so this embodiment can eliminate the redundancy.
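To see why storing F can remove redundancy, the sketch below orbits a viewpoint around a fixed gaze point, as a swipe gesture might, and derives D from E and F. Normalizing D to unit length is an assumption of this sketch; `orbit` and `direction_from_gaze` are hypothetical helpers.

```python
import math


def direction_from_gaze(eye, gaze):
    """Line-of-sight direction D as the unit vector from viewpoint E toward gaze point F."""
    d = [f - e for e, f in zip(eye, gaze)]
    n = math.sqrt(sum(c * c for c in d))
    return tuple(c / n for c in d)


def orbit(gaze, radius, angle_rad, height):
    """Viewpoint on a horizontal circle around the gaze point (e.g. driven by a swipe)."""
    fx, fy, fz = gaze
    return (fx + radius * math.sin(angle_rad), fy + height, fz + radius * math.cos(angle_rad))


F = (0.0, 0.0, 0.0)
eyes = [orbit(F, 10.0, k * math.pi / 2, 2.0) for k in range(4)]
# F is identical in every frame, so it could be written once; D differs in every frame.
dirs = [direction_from_gaze(E, F) for E in eyes]
```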

As yet another example of viewpoint information, the amount of rotation (rotation angles) and the amount of translation may be used as the parameters of the viewpoint information.

A given viewpoint is obtained by rotating the viewpoint about the origin of the world coordinate system by θx about the x-axis, θy about the y-axis, and θz about the z-axis, and then translating it by T (tx, ty, tz) to the viewpoint position; this specifies the viewpoint's position, line-of-sight direction, and orientation. The viewpoint can therefore be reconstructed from six parameters: the rotation amounts θx, θy, θz and the translation T (tx, ty, tz).

This example reconstructs the viewpoint from few parameters, but the default position, direction, and orientation of the viewpoint before any rotation or translation must be clearly defined. In other words, an initial value must be fixed for the case where no rotation or translation is applied, for example "the viewpoint is at the origin of the world coordinate system, faces the positive z direction, and has the positive y direction as up", and the replay video playback unit 304 of the viewing terminal 3 must also know this initial viewpoint information. This information may be written into the format itself and exchanged that way, or the initial position used when the free-viewpoint video generation unit 201 plays back the free-viewpoint video may simply be adopted as the initial position.
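The six-parameter representation can be decoded as follows, assuming the default pose given in the text (viewpoint at the origin, looking along +z, with +y up). Because rotations do not commute, the sketch assumes they are applied in the order x, then y, then z; an actual format would have to fix this order explicitly. All names are hypothetical.

```python
import math


def rot_x(v, a):
    x, y, z = v
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))


def rot_y(v, a):
    x, y, z = v
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def rot_z(v, a):
    x, y, z = v
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a), z)


# Assumed default pose, matching the example in the text.
DEFAULT_DIR = (0.0, 0.0, 1.0)
DEFAULT_UP = (0.0, 1.0, 0.0)


def viewpoint_from_params(theta_x, theta_y, theta_z, t):
    """Rebuild (E, D, U): rotate about x, then y, then z, then translate by T."""
    def rotate(v):
        return rot_z(rot_y(rot_x(v, theta_x), theta_y), theta_z)
    eye = tuple(t)  # the default eye is the origin, so rotation leaves it there
    return eye, rotate(DEFAULT_DIR), rotate(DEFAULT_UP)


# Turning 90 degrees about y makes the viewpoint look along +x (up to rounding error).
E, D, U = viewpoint_from_params(0.0, math.pi / 2, 0.0, (1.0, 2.0, 3.0))
```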

As yet another example of viewpoint information, a view transformation matrix may be recorded. The view transformation matrix converts coordinates from the world coordinate system into the viewpoint's coordinate system (the camera coordinate system), and the position, direction, and orientation of the viewpoint can all be determined from it. Expressing the view transformation matrix in homogeneous coordinates, the 4×4 transformation matrix M is given by the following equation (2), written here in general block form with a 3×3 rotation part R and a 3×1 translation part t:

$$M = \begin{pmatrix} R & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{pmatrix} \tag{2}$$

Such matrices are used frequently in widely adopted 3D rendering libraries such as OpenGL and DirectX, which often compute the view transformation matrix M from the viewpoint position E (ex, ey, ez), the line-of-sight direction D (dx, dy, dz), the orientation direction U (ux, uy, uz), and so on. Storing the view transformation matrix in advance therefore lowers processing cost, because it is the form that can be handed to such libraries most directly.
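For reference, the following sketch builds a view transformation matrix from E, D and U the way the classic OpenGL utility `gluLookAt` does (right-handed, with the camera looking down its own −z axis). It is written in plain Python so that it runs without OpenGL; the patent's equation (2) may follow a different convention, and the helper names are assumptions.

```python
import math


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def _norm(a):
    n = math.sqrt(_dot(a, a))
    return [c / n for c in a]


def look_at(eye, direction, up):
    """4x4 row-major view matrix built as gluLookAt does (camera looks down -z)."""
    f = _norm(direction)
    s = _norm(_cross(f, up))  # camera right vector
    u = _cross(s, f)          # recomputed camera up vector
    return [
        [s[0], s[1], s[2], -_dot(s, eye)],
        [u[0], u[1], u[2], -_dot(u, eye)],
        [-f[0], -f[1], -f[2], _dot(f, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(m, p):
    """Apply a 4x4 matrix to a 3D point in homogeneous coordinates (w = 1)."""
    v = list(p) + [1.0]
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(3)]


M = look_at((0.0, 0.0, 5.0), (0.0, 0.0, -1.0), (0.0, 1.0, 0.0))
# The eye maps to the camera origin; the world origin lies 5 units in front of it (z = -5).
```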

Furthermore, whereas the format examples above in principle record the viewpoint information as is, only the difference from the previous frame may be recorded instead, as shown in FIG. 7.

In this kind of format, differences between frames tend to be small, so many small values are written. When values are small, runs of the same value, typically "0", become more likely. Applying entropy coding, typified by Huffman coding, to such code sequences, in which runs of identical values occur frequently, may reduce the data size further.
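The effect of entropy coding on small inter-frame differences can be illustrated by computing Huffman code lengths for a short sequence of quantized deltas. This is a generic textbook Huffman construction built on Python's `heapq`, not a coder specified by the patent; the sample deltas are made up for illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # every merge adds one bit
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


deltas = [0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 2]  # mostly-zero frame differences
lengths = huffman_code_lengths(deltas)
total_bits = sum(lengths[s] for s in deltas)
print(lengths[0], total_bits)  # 1 17  (a fixed 2-bit code would need 24 bits)
```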

However, when playback is to start partway through, the value of an intermediate frame must be computed by summing the differences from the first frame onward, which tends to raise the computational cost. The format can therefore write the full (non-differential) parameters in one frame out of every few frames and record only the difference from the previous frame in the other frames.

In this case, the format must make it possible to tell which frames hold complete information and which hold differential information. In the example of FIG. 7, differential frames are marked with the identifier "D" and all other frames with the identifier "I", so that each frame can be distinguished.
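The mixed "I"/"D" layout can be sketched as below: every `interval`-th frame stores full values, the rest store deltas, and decoding a given frame only requires walking back to the nearest "I" frame. The interval and function names are hypothetical. With rounded floating-point values, a real encoder would compute deltas against the reconstructed values to avoid drift, which this integer example sidesteps.

```python
from typing import List, Tuple


def encode_id_frames(values: List[List[int]], interval: int) -> List[Tuple[str, List[int]]]:
    """Every interval-th frame is an 'I' frame (full values); the others are 'D' frames (deltas)."""
    out = []
    for i, v in enumerate(values):
        if i % interval == 0:
            out.append(("I", list(v)))
        else:
            out.append(("D", [a - b for a, b in zip(v, values[i - 1])]))
    return out


def decode_at(encoded: List[Tuple[str, List[int]]], index: int) -> List[int]:
    """Random access: start from the nearest preceding 'I' frame and add deltas up to index."""
    start = index
    while encoded[start][0] != "I":
        start -= 1
    value = list(encoded[start][1])
    for _, delta in encoded[start + 1:index + 1]:
        value = [a + b for a, b in zip(value, delta)]
    return value


values = [[i, 2 * i] for i in range(10)]  # e.g. quantized viewpoint parameters per frame
enc = encode_id_frames(values, interval=4)
print(decode_at(enc, 7))  # [7, 14]  (starts from the 'I' frame at index 4)
```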

The format recording unit 202 records the viewpoint information by writing the various parameters across the frames in any of the ways above. The free-viewpoint information recording unit 202a must be able to convert and shape the viewpoint information received from the free-viewpoint video generation unit 201 into the form prescribed by the format.

Conversion and shaping here means any processing that turns the data into a form suited to the format. For example, if the free-viewpoint video generation unit 201 generates video from a specific viewpoint using a transformation matrix from the world coordinate system to the camera coordinate system (the view transformation matrix) to obtain the user's viewpoint, this includes obtaining that matrix and computing from it information such as the 3D vector of the viewpoint position coordinates; it also includes truncating or rounding the values to be recorded to a fixed number of digits.
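As an example of this conversion step, suppose the generation unit exposes a world-to-camera matrix of the form [R | t] (R in the upper-left 3×3 block, t in the last column). Since the eye maps to the camera origin, R E + t = 0, so the viewpoint position is E = −Rᵀt; under the gluLookAt convention, D is the negated third row of R. The sketch below extracts both and rounds them to a fixed number of digits. The matrix layout is an assumption about the generation unit, and the function names are hypothetical.

```python
def eye_from_view_matrix(m, digits=3):
    """Recover viewpoint position E from a 4x4 world-to-camera matrix [R | t]: E = -R^T t."""
    r = [row[:3] for row in m[:3]]
    t = [row[3] for row in m[:3]]
    eye = [-sum(r[k][i] * t[k] for k in range(3)) for i in range(3)]
    return tuple(round(c, digits) + 0.0 for c in eye)  # "+ 0.0" turns -0.0 into 0.0


def direction_from_view_matrix(m, digits=3):
    """gluLookAt convention: the camera looks down -z, so D is minus the third row of R."""
    return tuple(round(-c, digits) + 0.0 for c in m[2][:3])


# View matrix of a camera at (0, 0, 5) looking toward -z with +y up.
M = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, -5.0],
     [0.0, 0.0, 0.0, 1.0]]
print(eye_from_view_matrix(M), direction_from_view_matrix(M))  # (0.0, 0.0, 5.0) (0.0, 0.0, -1.0)
```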

FIG. 8 illustrates the free-viewpoint technique adopted by the second embodiment of the present invention, and FIG. 9 shows an example of the replay format in the second embodiment.

In the first embodiment, the free-viewpoint video generation unit 201 was described as generating the free-viewpoint video with the billboard method. In contrast, this embodiment generates the free-viewpoint video with a method that accurately restores the shape of each object's 3D model, as shown in Patent Document 1 (referred to here as the "full model method using back projection planes").

When the free-viewpoint video generation unit 201 adopts the full model method, it arranges a large number of back projection planes P1, P2, ... directly facing the virtual viewpoint P in order to restore the 3D shape of the objects. It then projects onto each back projection plane P1, P2, ... the mask image of the target object obtained by background subtraction or a similar method and computes the visual volume, building a 3D model plane by plane, and colors the back projection planes by mapping the object's texture images onto them. Carving out the back projection planes appropriately in this way restores the 3D model.

In this method, the back projection planes P1, P2, ... are always placed orthogonal to the line of sight of the virtual viewpoint P, so their positions change with the position of the virtual viewpoint P. The format recording unit 202 starts recording the format in response to the replay video recording start request RQ2 from the viewing terminal 3. As in the first embodiment, the free-viewpoint video ID and the total number of frames are recorded in the header, and the viewpoint information is recorded for each frame in the same way as in the first embodiment.

The spatial information recording unit 202b records, as spatial information, only the indices of those back projection planes P1, P2, ... on which a model is generated; in this embodiment, indices of planes on which no model is generated are not recorded. In the example of FIG. 8, a cylindrical object exists in the space, but a model is generated only on P2, P3, and P4. Accordingly, as shown in FIG. 9, only "2 3 4" is recorded as the index.

In free-viewpoint video such as soccer, where players are scattered over only part of a wide field, placing back projection planes over the entire field produces many wasted planes on which no model is generated, and repeating the computation for these planes when playing back the replay video is wasteful.

In this embodiment, by contrast, the back projection planes on which a model is generated can be distinguished in advance from those on which none is, which allows efficient memory allocation; moreover, no mask back-projection computation is needed for planes without a model, which reduces the computational load.

In particular, Patent Document 1 notes that its full model method, which this embodiment adopts, performs parallel computation on a GPU, so reducing the number of back projection planes saves memory. As a result, the time spent on memory access and the like also decreases, which saves computational resources and speeds up computation.

By recording the indices of the projection planes that must be computed to reproduce the 3D space, the spatial information recording unit 202b speeds up computation and saves computational resources. As for how indices are assigned, if there are 1000 back projection planes facing the viewpoint, one possible scheme numbers them 1 to 1000 in order from the plane nearest the viewpoint. In the format example of FIG. 9, "2 3 4" is recorded as the indices representing the three back projection planes P2, P3, and P4 on which a model is generated. The replay format recorded in this way is transferred to and stored on the viewing terminal 3, as in the first embodiment, and is referred to later during replay.
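The index scheme above (planes numbered 1 to N from nearest to farthest) can be sketched as follows. The sketch assumes evenly spaced planes perpendicular to a unit line-of-sight vector and approximates each object by a bounding sphere. With the sample numbers in the usage line it yields the "2 3 4" of FIG. 9, but the spacing, near distance, and object geometry are invented for illustration, and the function names are hypothetical.

```python
import math


def plane_depth(index, near, spacing):
    """Depth of back projection plane `index` along the line of sight (1 = nearest)."""
    return near + (index - 1) * spacing


def occupied_plane_indices(eye, direction, objects, near, spacing, count):
    """Indices (1..count) of planes that intersect at least one object's bounding sphere.

    objects: list of (center, radius); `direction` must be a unit vector.
    """
    hits = set()
    for center, radius in objects:
        depth = sum((c - e) * d for c, e, d in zip(center, eye, direction))
        # Plane k lies at near + (k - 1) * spacing; keep those within [depth - r, depth + r].
        lo = math.ceil((depth - radius - near) / spacing) + 1
        hi = math.floor((depth + radius - near) / spacing) + 1
        hits.update(range(max(lo, 1), min(hi, count) + 1))
    return sorted(hits)


indices = occupied_plane_indices((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                                 [((0.0, 0.0, 12.0), 1.5)], near=10.0, spacing=1.0, count=1000)
print(indices)  # [2, 3, 4]
```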

On the viewing terminal 3, the replay video playback unit 304 reconstructs the replay video based on the stored replay format. It determines the viewpoint from the viewpoint information in the replay format and arranges the back projection planes according to that viewpoint, but it uses the recorded indices to identify the planes on which no model is generated, and it neither places nor computes those planes.

As a result, the 3D space can be reproduced by computing only the back projection planes on which a model is guaranteed to be generated. The video seen from the viewpoint is then rendered for this 3D space, and the rendered image is transmitted to the viewing terminal 3 to play the replay video.
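On the replay side, the recorded indices let the player allocate buffers and run the back projection only for the listed planes. In this sketch, `back_project` is a hypothetical callback standing in for the mask projection, carving, and coloring steps, which the patent does not spell out at code level; the buffer size and function name are likewise assumptions.

```python
from typing import Callable, Dict, Iterable, List


def reconstruct_planes(recorded: Iterable[int], total_planes: int, width: int, height: int,
                       back_project: Callable[[int, List[List[int]]], None]) -> Dict[int, List[List[int]]]:
    """Allocate and fill buffers only for the back projection planes listed in the replay format."""
    planes = {}
    for index in sorted(set(recorded)):
        if not 1 <= index <= total_planes:
            raise ValueError(f"plane index {index} out of range")
        buf = [[0] * width for _ in range(height)]  # per-plane buffer, allocated on demand
        back_project(index, buf)                    # hypothetical: project masks, carve, color
        planes[index] = buf
    return planes


calls = []
planes = reconstruct_planes([2, 3, 4], 1000, 4, 4, lambda i, buf: calls.append(i))
print(calls)  # [2, 3, 4]  (3 planes processed instead of 1000)
```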

Although rendering is performed in the free-viewpoint video generation device 2 in this example, all the videos needed to construct the free-viewpoint video could, for example, be delivered to the viewing terminal 3 in advance and rendered on the terminal side. Even in that case, identifying in advance, by their indices, the back projection planes on which no model is produced makes it possible to speed up computation and save computational resources.

In such a configuration, the free-viewpoint video generation device 2 is provided with the replay video playback function, which makes a complete reconstruction of the 3D space possible. The replay video playback function may therefore receive information on the specifications of the viewing terminal 3 and output rendered images whose visible field of view and aspect ratio change according to the resolution and screen size of the playback device, while the viewpoint stays the same.

In addition, when several viewing terminals 3 simultaneously request playback of replay videos from the free-viewpoint video generation device 2 and rendering requests arrive for frames with the same viewpoint position at the same time, a mechanism may be provided that saves and reuses the rendering results.
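One way to realize this reuse mechanism is a cache keyed by the playback time identifier and the (rounded) viewpoint parameters. The sketch below is a hypothetical illustration: rounding the key, the unbounded dictionary, and the `render` callback are all assumptions, and a deployment that also adapts output to each terminal's resolution would need to add that resolution to the key.

```python
from typing import Callable, Dict, Hashable, Tuple


class RenderCache:
    """Reuse rendered frames when several terminals request the same viewpoint at the same time."""

    def __init__(self, render: Callable[[int, Tuple[float, ...]], bytes], digits: int = 3):
        self._render = render
        self._digits = digits
        self._store: Dict[Hashable, bytes] = {}
        self.hits = 0

    def get(self, time_id: int, viewpoint: Tuple[float, ...]) -> bytes:
        key = (time_id, tuple(round(c, self._digits) for c in viewpoint))
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = self._render(time_id, viewpoint)
        return self._store[key]


rendered = []
cache = RenderCache(lambda t, v: rendered.append(t) or b"frame")
vp = (0.0, 1.7, -10.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0)  # E, D, U
cache.get(300, vp)
cache.get(300, vp)  # a second terminal, same frame and viewpoint: served from the cache
print(len(rendered), cache.hits)  # 1 1
```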

Reference signs list: 1: content server; 2: free-viewpoint video generation device; 3: viewing terminal; 201: free-viewpoint video generation unit; 202: format recording unit; 202a: free-viewpoint information recording unit; 202b: spatial information recording unit; 301: storage; 302: user operation detection unit; 303: rendering information acquisition unit; 304: replay video playback unit

Claims

1. A system for reproducing a replay video of a free-viewpoint video, configured by connecting a viewing terminal and a free-viewpoint video generation device via a network, wherein
the viewing terminal comprises
means for requesting playback of a free-viewpoint video, and
means for requesting recording of a replay video of the free-viewpoint video being played;
the free-viewpoint video generation device comprises
means for generating, in response to the playback request, a free-viewpoint video based on a plurality of camera videos and viewpoint information of a virtual viewpoint,
means for recording, in response to the recording request, a replay format in which the virtual viewpoint in the free-viewpoint video generation process is described for each playback time of the replay video, and
means for transferring the recorded replay format to the viewing terminal;
the viewing terminal further comprises
means for acquiring information necessary for replay based on the replay format, and
means for executing rendering based on the replay format and the acquired information to reproduce the replay video; and
the information necessary for the replay is a background 3D model and spatial information for rendering an object on the background 3D model.

2. The system for reproducing a replay video of a free-viewpoint video according to claim 1, wherein the replay format includes header information and time-series information;
the time-series information is formed by concatenating a plurality of pieces of time-series information recorded at a predetermined cycle; and
each piece of time-series information records a playback time ID unique to the playback position of the replay video based on that piece of time-series information.

3. The system for reproducing a replay video of a free-viewpoint video according to claim 1, wherein the generation process includes a process of generating a background 3D model of the free-viewpoint video;
the ID of the free-viewpoint video is recorded in the replay format; and
the acquiring means acquires the background 3D model based on the ID of the free-viewpoint video.

4. The system for reproducing a replay video of a free-viewpoint video according to claim 1, wherein the generation process includes a process of generating viewpoint information; and
the viewpoint information is recorded in the replay format.

5. The system for reproducing a replay video of a free-viewpoint video according to claim 1, wherein the means for generating the free-viewpoint video employs billboard-type free-viewpoint technology.

6. The system for reproducing a replay video of a free-viewpoint video according to claim 5, wherein the generation process includes identifying objects visible from the virtual viewpoint;
a general ID associated with each visible object is recorded in the replay format; and
the acquiring means acquires rendering information of each object based on the general ID.

7. The system for reproducing a replay video of a free-viewpoint video according to claim 6, wherein the generation process includes generating a mask image for each object;
the mask image is associated with the general ID; and
the acquiring means acquires the mask image of each object based on the general ID.

8. The system for reproducing a replay video of a free-viewpoint video according to claim 6, wherein the generation process includes extracting the texture of each object from the camera videos;
the texture is associated with the general ID; and
the acquiring means acquires the texture of each object based on the general ID.

9. The system for reproducing a replay video of a free-viewpoint video according to any one of claims 1 to 4, wherein the means for generating the free-viewpoint video employs full-model-type free-viewpoint technology using back projection planes.

10. The system for reproducing a replay video of a free-viewpoint video according to claim 9, wherein the generation process includes arranging, at the positions of objects, a plurality of back projection planes directly facing the virtual viewpoint, projecting a mask image of each object onto them, and performing 3D modeling for each back projection plane to restore the 3D model;
the index of each back projection plane on which the 3D model exists is recorded in the replay format; and
the acquiring means acquires the information necessary for replay based on the indices of the back projection planes.

11. The system for reproducing a replay video of a free-viewpoint video according to any one of claims 1 to 10, wherein the means for generating the free-viewpoint video and the means for recording the replay format are implemented on a server in the cloud;
the means for acquiring the information necessary for the replay and the means for reproducing the replay video are implemented on the viewing terminal on which the replay video is viewed; and
the replay format is transferred from the server to the viewing terminal and stored on the viewing terminal.

12. The system for reproducing a replay video of a free-viewpoint video according to any one of claims 1 to 10, wherein the means for generating the free-viewpoint video, the means for recording the replay format, and the means for acquiring the information necessary for the replay are implemented on a server in the cloud;
the means for reproducing the replay video is implemented on the viewing terminal; and
the replay format is transferred from the server to the viewing terminal and stored on the viewing terminal.

13. The system for reproducing a replay video of a free-viewpoint video according to any one of claims 1 to 12, wherein the means for recording the replay format starts recording the replay format in response to a recording start request from the viewing terminal and ends the recording in response to a recording end request.

14. A method for reproducing a replay video of a free-viewpoint video in a system in which a viewing terminal and a free-viewpoint video generation device are connected via a network, wherein
the viewing terminal
requests playback of a free-viewpoint video, and
requests recording of a replay video of the free-viewpoint video being played;
the free-viewpoint video generation device
generates, in response to the playback request, a free-viewpoint video based on a plurality of camera videos and viewpoint information of a virtual viewpoint,
records, in response to the recording request, a replay format in which the virtual viewpoint in the free-viewpoint video generation process is described for each playback time of the replay video, and
transfers the recorded replay format to the viewing terminal;
the viewing terminal further
acquires information necessary for replay based on the replay format, and
performs rendering based on the replay format and the acquired information to reproduce the replay video; and
the information necessary for the replay is a background 3D model and spatial information for rendering an object on the background 3D model.