JP2019145017A - System and method for reproducing replay video of free viewpoint video

Info

Publication number: JP2019145017A
Application number: JP2018030991A
Authority: JP
Inventors: 良亮渡邊; Ryosuke Watanabe; 敬介野中; Keisuke Nonaka
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-02-23
Filing date: 2018-02-23
Publication date: 2019-08-29
Anticipated expiration: 2038-02-23
Also published as: JP7054351B2

Abstract

PROBLEM TO BE SOLVED: To reproduce a replay video of a free viewpoint video at a low load, so that the replay video can be reproduced even on a viewing terminal with low processing capacity. SOLUTION: A replay video reproduction system for a free viewpoint video comprises: a free viewpoint video generation part 201 for generating a free viewpoint video based on a plurality of camera videos and viewpoint information of a virtual viewpoint; a format recording part 202 for recording a replay format during the generation process of the free viewpoint video; a rendering information acquisition part 303 for acquiring information required for replay based on the replay format; and a replay video reproduction part 304 for reproducing a replay video based on the replay format and the acquired rendering information. The free viewpoint video generation part 201 and the format recording part 202 are implemented on a server in the cloud, and the rendering information acquisition part 303 and the replay video reproduction part 304 are implemented on a viewing terminal 3. SELECTED DRAWING: Figure 1

Description

The present invention relates to a system and method for reproducing a replay video of a free viewpoint video, and more particularly to a system and method that reduce the load of constructing the replay video so that it can be reproduced even on a viewing terminal with limited processing capability.

Free viewpoint video technologies, such as those of Patent Literature 1 and Non-Patent Literature 1, have been proposed as technologies that enable video viewing from a virtual viewpoint where no camera is actually placed, based on videos captured by a plurality of cameras. By arranging a plurality of cameras in a sports stadium or the like and generating a free viewpoint video from their videos, a user can enjoy viewing video from any virtual viewpoint they wish.

When recording the video from a virtual viewpoint selected by the user with such free viewpoint video technology, one conceivable approach is to capture the displayed screen and save it as a video file in an existing video format such as H.264 (MPEG-4 AVC), described in Non-Patent Literature 2. The video file recorded in this way is saved in storage or the like and can be played back when the user wants to view it again, or shared with other users by sending them the saved file.

As an example of recording the video from a specific viewpoint selected by the user without directly recording a moving image, and reproducing it later, there is the game device and game replay method shown in Patent Literature 2. That invention stores the history of a player's game operations as replay data and later reproduces the player's play images based on the replay data.

Patent Literature 1: Japanese Patent Application No. 2017-167472
Patent Literature 2: JP 2010-264173 A

Non-Patent Literature 1: T. Koyama, I. Kitahara and Y. Ohta, "Live Mixed-Reality 3D Video in Soccer Stadium," in Proc. of IEEE/ACM Conference on ISMAR, pp. 178-187 (2003)
Non-Patent Literature 2: ITU-T Recommendation H.264, "Advanced video coding for generic audiovisual services," May 2003

With free viewpoint video technology, a user can freely select a viewpoint through a dedicated controller or through touch operations on the screen of a smartphone or tablet, and enjoy viewing video from any viewpoint. Unlike ordinary broadcast television video such as terrestrial broadcasts, a free viewpoint video lets each user freely select and move the viewpoint, so even for the same free viewpoint video, what each user sees differs depending on the viewpoint position and how the viewpoint is moved. Recording the video from the viewpoint a user is watching therefore makes it possible to create original video content.

In a future where free viewpoint video technology has become widespread, videos from such specific viewpoints will naturally be exchanged over the Internet, and new ways of enjoying them are expected to emerge, such as receiving ratings and comments via social networking services (SNS).

There is thus demand to later replay a video from a specific viewpoint recorded on a particular terminal (hereinafter, a replay video), to play it on other terminals, and to share it among many users. This demand can be met by saving the video in an existing video coding format such as that described in Non-Patent Literature 2 and exchanging the resulting video file, but this approach has the problem that the video data becomes large.

In particular, because each user can generate video from their own viewpoint, an enormous variety of video content can be produced from a single free viewpoint video, and this problem is therefore expected to be pronounced.

If the movement of the viewpoint is known, an image from a specific viewpoint can be obtained by passing the viewpoint information to a free viewpoint video generation device. For example, as with the game replay function shown in Patent Literature 2, it is possible to recompute and display the video from a given viewpoint based on viewpoint information recorded in advance.

However, unlike a game, a free viewpoint video first requires reconstructing the three-dimensional space from the videos of a plurality of cameras, and the computational cost of this 3D reconstruction is very high. In particular, when many users simultaneously request a server to reconstruct replay videos, it is difficult to reconstruct them without delay.

An object of the present invention is to solve the above technical problems and to provide a system and method that reduce the load of constructing a replay video of a free viewpoint video, so that the replay video can be reproduced even on a viewing terminal with limited processing capability.

To achieve the above object, the present invention is a system for reproducing a replay video of a free viewpoint video, characterized by the following configurations.

(1) The system comprises means for generating a free viewpoint video based on a plurality of camera videos and viewpoint information of a virtual viewpoint, means for recording a replay format during the process of generating the free viewpoint video, means for acquiring information necessary for replay based on the replay format, and means for reproducing a replay video based on the replay format and the acquired information.

(2) The information necessary for replay includes a background 3D model and spatial information for rendering objects on the background 3D model.

(3) The spatial information includes a mask image of each object and position information for placing the model of each object.

(4) The means for generating the free viewpoint video and the means for recording the replay format are implemented on a server in the cloud, the means for acquiring the information necessary for replay and the means for reproducing the replay video are implemented on a viewing terminal for the replay video, and the replay format is transferred from the server to the viewing terminal and stored on the viewing terminal.

The present invention achieves the following effects.

(1) In the process of generating the free viewpoint video, identifiers of information that can be reused for reconstructing the replay video, together with information such as parameters needed for that reconstruction, are recorded in the replay format. When the replay video is reconstructed, the reusable information is acquired based on the identifiers recorded in the replay format, and the information recorded in the replay format is used as parameters, so the replay video can be reproduced with a light load.

(2) Because a background 3D model and spatial information for rendering objects on that background 3D model are acquired as the information needed to reconstruct the replay video, the processing load of reconstructing the replay video is reduced.

(3) Because the spatial information includes the mask image of each object and the position information for placing the model of each object, the computationally expensive calculations for obtaining this information are unnecessary when reconstructing the replay video.

(4) If the means for generating the free viewpoint video and the means for recording the replay format are implemented on a server in the cloud, and the means for acquiring the information necessary for replay and the means for reproducing the replay video are implemented on the viewing terminal for the replay video, the computationally heavy processing can be assigned to the server, which generally has high processing capability. Free viewpoint video can therefore be replayed even on a viewing terminal, which generally has low processing capability.

A block diagram showing the configuration of the main part of a free viewpoint video distribution system according to an embodiment of the present invention.
A diagram for explaining a method of switching the object on which a polygon is placed according to a virtual viewpoint P.
A diagram showing a first example of the replay format.
A diagram for explaining the definition of viewpoint information.
A sequence flow showing the procedure from construction of the replay format to playback of the replay video.
A diagram showing a second example of the replay format.
A diagram showing a third example of the replay format.
A diagram showing a method of generating a free viewpoint video in a second embodiment of the present invention.
A diagram showing a third example of the replay format.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the main part of a free viewpoint video distribution system according to an embodiment of the present invention; components not needed to describe the invention are omitted from the figure.

The main components of the free viewpoint video distribution system of the present invention are: a plurality of cameras cam arranged in a stadium or the like to capture objects such as athletes from different viewpoints; a content server 1 that stores the videos captured by the cameras cam and their camera parameters; a free viewpoint video generation device 2 that generates a free viewpoint video based on the plurality of camera videos, the camera parameters, and viewpoint information; and a viewing terminal 3 that provides information on a virtual viewpoint P (viewpoint information) to the free viewpoint video generation device 2 in response to operations by a terminal user U, and acquires and plays the free viewpoint video generated by the free viewpoint video generation device 2.

The content server 1, the free viewpoint video generation device 2, and the viewing terminal 3 may each be configured by installing, on a general-purpose computer, applications (programs) that implement the functions described later, or may be configured as a dedicated or single-function machine in which part of the application is implemented in hardware or ROM.

In the content server 1, content comprising the captured camera videos and their camera parameters is managed by unique IDs. In the illustrated example, ID1 is assigned to soccer free viewpoint content, ID2 to volleyball free viewpoint content, and ID3 to judo free viewpoint content.

The free viewpoint video generation device 2 includes a free viewpoint video generation unit 201 and a format recording unit 202, and can be configured as a free viewpoint video generation server by implementing these functions on a server placed in the cloud.

The free viewpoint video generation unit 201 generates a free viewpoint video by applying a known free viewpoint technique to a plurality of camera videos with different viewpoints, their camera parameters, and the viewpoint information selected by the terminal user U on the viewing terminal 3.

The first embodiment of the present invention describes an application to a free viewpoint technique that adopts the billboard method, in which, as in Non-Patent Literature 1, an object in three-dimensional space is approximated by a single rectangular polygon and texture information acquired from the plurality of camera videos is mapped appropriately onto the rectangular polygon according to the virtual viewpoint P selected by the user.

The free viewpoint video generation unit 201 extracts objects from the plurality of camera videos and estimates their positions. For a free viewpoint video of a soccer match, the objects are persons such as the players appearing in the stadium. The background other than the objects, such as the stadium pitch and spectator seats, is assumed to exist as a 3D model created manually in advance as a preset.

Further, as shown in FIG. 2, according to the selected virtual viewpoint P, the free viewpoint video generation unit 201 places, at the position where each object is estimated to exist, a single rectangular polygon that squarely faces the line-of-sight direction of the virtual viewpoint P, and displays on that rectangular polygon the texture of the object extracted from the camera video of the real camera cam whose line-of-sight direction is closest in angle to that of the virtual viewpoint P.
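The camera selection and billboard orientation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary of camera directions, and the choice to measure closeness by the angle between direction vectors are assumptions made for the example.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def angle_between(a, b):
    """Angle in radians between two direction vectors."""
    a, b = normalize(a), normalize(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.acos(dot)

def nearest_camera(view_dir, camera_dirs):
    """Key of the real camera whose line of sight is closest in angle to view_dir.

    camera_dirs is a hypothetical mapping {camera name: direction vector}.
    """
    return min(camera_dirs, key=lambda cam: angle_between(view_dir, camera_dirs[cam]))

def billboard_normal(view_dir):
    """Normal of a rectangular polygon that squarely faces the line of sight.

    The normal points back toward the viewer, i.e. opposite to view_dir.
    """
    return tuple(-c for c in normalize(view_dir))

# Example: the virtual viewpoint looks roughly along cam3's direction.
cameras = {"cam1": (1.0, 0.0, 0.0), "cam2": (0.0, 1.0, 0.0), "cam3": (0.7, 0.7, 0.0)}
chosen = nearest_camera((0.6, 0.8, 0.0), cameras)
```

Recomputing `billboard_normal` whenever the viewpoint moves keeps the polygon facing the viewer.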

In the example of FIG. 2, the line-of-sight angle of the virtual viewpoint P is closest to that of the real camera cam3, so the texture of the target object is cut out from the camera video of the real camera cam3 and pasted onto the rectangular polygon. When the texture is displayed, a binary mask image or the like representing the shape of the target object is used so that only the part of the rectangular polygon where the target object exists is displayed and the remainder is transparent. When the virtual viewpoint P moves, the rectangular polygon is rotated accordingly so that it always squarely faces the virtual viewpoint P, which allows natural-looking viewing from the virtual viewpoint P.
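The masking step can be illustrated with a small sketch. It assumes, purely for illustration, that a texture is a grid of RGB tuples and that the binary mask is a grid of 0/1 values of the same size; the function name `apply_mask` is hypothetical.

```python
def apply_mask(texture, mask):
    """Combine an RGB texture with a binary mask into RGBA pixels.

    Pixels where the mask is 1 stay opaque (alpha 255); pixels where
    the mask is 0 become fully transparent (alpha 0).
    """
    if len(texture) != len(mask) or any(
        len(t_row) != len(m_row) for t_row, m_row in zip(texture, mask)
    ):
        raise ValueError("texture and mask must have the same dimensions")
    return [
        [(r, g, b, 255 if m else 0) for (r, g, b), m in zip(t_row, m_row)]
        for t_row, m_row in zip(texture, mask)
    ]

# Example: a 2x2 texture where only the left column belongs to the object.
texture = [[(10, 20, 30), (40, 50, 60)],
           [(70, 80, 90), (15, 25, 35)]]
mask = [[1, 0],
        [1, 0]]
rgba = apply_mask(texture, mask)
```

A renderer would then draw the billboard with alpha blending so that the transparent pixels show the background 3D model behind it.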

The format recording unit 202 acquires, during the process in which the free viewpoint video generation unit 201 generates the free viewpoint video, various parameters that allow the viewing terminal 3 to later replay that free viewpoint video with a small computational load, and records them in the replay format.

Recording of the replay format starts in response to a recording start request RQ2 from the viewing terminal 3 that is playing the free viewpoint video, and ends in response to a recording end request RQ3. The completed replay format is transferred to the viewing terminal 3 that sent the requests RQ2 and RQ3 and is managed in its storage 301.
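The start and stop behaviour can be pictured as a small state machine. The sketch below is an assumption-laden illustration: the class `FormatRecorder`, its method names, and the dictionary layout of the returned format are hypothetical, and only the request labels RQ2 and RQ3 come from the description.

```python
class FormatRecorder:
    """Records per-frame entries between a start request and an end request."""

    def __init__(self, video_id):
        self.video_id = video_id
        self.recording = False
        self.frames = []

    def handle_request(self, request):
        """Handle "RQ2" (start recording) or "RQ3" (end recording).

        Returns the completed replay format on RQ3, otherwise None.
        """
        if request == "RQ2":
            self.frames = []          # assumption: a new recording starts empty
            self.recording = True
            return None
        if request == "RQ3" and self.recording:
            self.recording = False
            return {
                "free_viewpoint_video_id": self.video_id,
                "total_frames": len(self.frames),
                "frames": list(self.frames),
            }
        return None

    def on_frame(self, frame_id, viewpoint):
        """Called for every generated frame; stores it only while recording."""
        if self.recording:
            self.frames.append({"playback_time_id": frame_id, "viewpoint": viewpoint})

# Example: frames generated outside the RQ2..RQ3 window are not recorded.
rec = FormatRecorder(video_id=1)
rec.on_frame(0, (0.0, 0.0, 0.0))
rec.handle_request("RQ2")
rec.on_frame(1, (0.0, 0.0, 1.0))
rec.on_frame(2, (0.0, 0.0, 2.0))
completed = rec.handle_request("RQ3")
rec.on_frame(3, (0.0, 0.0, 3.0))
```

In the described system the completed format would then be sent back to the requesting viewing terminal.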

In the format recording unit 202, a free viewpoint information recording unit 202a records in the replay format the viewpoint information, including the position, orientation, and posture of the virtual viewpoint P that the terminal user U selects by operating the viewing terminal 3.

A spatial information recording unit 202b records spatial information that can reduce the processing load when the viewing terminal 3 renders the image seen from the selected virtual viewpoint P. In this embodiment, the objects visible from the viewpoint are identified based on the viewpoint information, and only the visible objects are rendered and reconstructed in three-dimensional space.

For parameters that are costly to compute in the rendering process, specifically the position of the billboard that displays each object and the mask image used to make part of each billboard transparent to match the object shape when a texture is pasted on it, only the identifiers that the viewing terminal 3 uses to acquire these parameters from the free viewpoint video generation device 2 are recorded.

In FIG. 2, four objects (ID1 to ID4) stand on the background model, but only two objects (ID3 and ID4) are visible from the selected virtual viewpoint P. Because the angle of the line-of-sight direction of the virtual viewpoint P is close to that of the real camera cam3, it is assumed that the textures acquired from the camera video of the real camera cam3 are pasted on the billboards placed at the positions of objects ID3 and ID4.

In such a case, this embodiment reconstructs only the two objects in three-dimensional space and acquires only their billboard positions, textures, and mask images. An object, its billboard position, the texture pasted on the billboard and the real camera it came from, and the mask image for masking the texture can all be associated with one another, so recording all of them in the replay format would be redundant.

Therefore, in this embodiment, as shown in FIG. 3 (described in detail later), only the "camera number", "texture number", and "billboard position" are recorded as spatial information, and the identifier of the mask image is subsumed under the "texture number". The position of each billboard also need not be recorded if the free viewpoint video generation device 2 records it in association with the texture number.

The viewing terminal 3 can be realized as a television capable of video playback with a dedicated controller connected for selecting the viewpoint, as a smartphone or tablet on which the viewpoint is selected by touch or swipe operations on the touch screen of its display, or as a VR terminal or the like equipped with an acceleration sensor on which the free viewpoint video is viewed and the viewpoint is selected according to the user's movement.

In the viewing terminal 3, the storage 301 stores the replay format and various spatial information transmitted from the free viewpoint video generation device 2 over the network. A user operation detection unit 302 detects the operations of the terminal user U, namely a virtual viewpoint operation RQ1, a replay format recording start request RQ2, a recording end request RQ3, and a replay request RQ4, and transmits them to the free viewpoint video generation device 2 over the network.

In response to the replay operation RQ4, a rendering information acquisition unit 303 refers to the replay format stored in the storage 301 and identifies which free viewpoint video the format belongs to. It then acquires the background 3D model of the identified free viewpoint video from the free viewpoint video generation device 2 and, based on that background 3D model, further acquires from the free viewpoint video generation device 2 the spatial information needed to reconstruct each frame, such as textures, mask images, and billboard positions. The acquired information is temporarily stored in the storage 301.

A replay video reproduction unit 304 performs rendering based on the replay format and the spatial information acquired from the free viewpoint video generation device 2, and plays the replay video of the free viewpoint video.

In this example the free viewpoint video is rendered on the viewing terminal 3, so the viewing terminal 3 must render the video from the virtual viewpoint P given by the viewpoint information. That is, it reconstructs the 3D space by erecting a billboard at each billboard position on the acquired background 3D model so that it squarely faces the virtual viewpoint P, and pasting on it the acquired texture masked by the mask image.

On the other hand, costly processing such as estimating billboard positions from a plurality of camera videos and generating mask images is not performed, so given the specifications of recent smartphones and similar devices, it is entirely feasible to perform the above rendering on the viewing terminal.

According to this embodiment, compared with the case where rendering is performed in the free viewpoint video generation device 2, the background 3D model needs to be transmitted only once, and for the remaining frames only the spatial information of the visible objects needs to be transmitted. In soccer video, for example, the region occupied by players is often very small relative to the whole picture, so transmitting only the textures and mask images often requires less data than sending the rendered video every time.

If each texture, once transmitted, is kept on the viewing terminal, the textures remain on a viewing terminal 3 that has performed replay reconstruction once. Texture numbers that have already been downloaded therefore need not be downloaded again, and the replay video can be reconstructed even without a network connection.
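The reuse of downloaded textures amounts to a cache keyed by texture number. The following sketch shows the idea under stated assumptions: `TextureCache` and the `fetch` callback are hypothetical names, and the in-memory dictionary stands in for the terminal's storage 301.

```python
class TextureCache:
    """Keeps textures on the terminal so each texture number is fetched once."""

    def __init__(self, fetch):
        self._fetch = fetch      # callback that downloads one texture by number
        self._store = {}         # stands in for persistent terminal storage

    def get(self, texture_number):
        """Return the texture, downloading it only if it is not stored yet."""
        if texture_number not in self._store:
            self._store[texture_number] = self._fetch(texture_number)
        return self._store[texture_number]

    def has(self, texture_number):
        """True if the texture can be used without any network access."""
        return texture_number in self._store

# Example: count how many downloads a repeated replay actually triggers.
downloads = []

def fake_fetch(number):
    downloads.append(number)
    return f"texture-{number}"

cache = TextureCache(fake_fetch)
first_pass = [cache.get(n) for n in (3, 4, 3)]
second_pass = [cache.get(n) for n in (3, 4)]
```

After the first pass every texture needed for this replay is stored locally, so the second pass triggers no downloads.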

FIG. 3 shows a first example of the replay format, which consists of header information and time-series information.

In the header information, the "free viewpoint video ID" is used to uniquely identify each replay format. Because this free viewpoint video ID needs to be recorded only once, it is written in the header of the format. The "total number of frames" is the number of frames of the replay video reconstructed from the replay format, and corresponds to the playback duration.

The time-series information is generated for each frame, and the "playback time identifier" records information that specifies the position (time) of each frame in the replay video. The illustrated example shows time-series information for only two frames (21 and 22), but if the "total number of frames" is 200, time-series information for 200 frames is concatenated.

For example, for a one-minute free viewpoint video at 30 frames per second, if the recording start request RQ2 is detected 10 seconds after the start and the recording end request RQ3 is detected at 20 seconds, the time-series information is formed by concatenating, in time order, the information whose "playback time identifier" runs from 300 to 600. The "playback time identifier" is not limited to a frame number and may be absolute time information or relative time information. Because the information for replaying each frame is managed in time series in this embodiment, if the audio recorded in the content server is managed by time information, video and audio can easily be played back in synchronization on a time basis.
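The arithmetic in this example can be checked with a few lines of code. The helper below is hypothetical, and it assumes that both endpoint frames are included, which is one reading of "from 300 to 600".

```python
def recorded_frame_ids(fps, start_seconds, end_seconds):
    """Frame numbers covered by a recording, both endpoints included.

    The inclusive upper bound is an assumption made for this sketch.
    """
    first = round(start_seconds * fps)
    last = round(end_seconds * fps)
    if last < first:
        raise ValueError("recording must end after it starts")
    return range(first, last + 1)

# Example from the text: 30 frames per second, RQ2 at 10 s, RQ3 at 20 s.
ids = recorded_frame_ids(fps=30, start_seconds=10, end_seconds=20)
first_id, last_id = ids[0], ids[-1]

# A frame number converts back to a relative time for audio synchronization.
time_of_frame_450 = 450 / 30
```

Under the inclusive reading the range holds 301 identifiers; an exclusive upper bound would give 300.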

In this embodiment, as shown in FIG. 4, the information specifying the virtual viewpoint P consists of a "viewpoint position E(ex, ey, ez)" representing the three-dimensional position coordinates of the viewpoint, a "gaze direction D(dx, dy, dz)" representing the direction of the viewpoint (line of sight), and a "posture direction U(ux, uy, uz)" representing the posture of the viewpoint, so that the viewpoint information is specified by a total of nine parameters in three three-dimensional vectors.

The “posture direction” is information indicating which side of the display screen is up when looking in a given direction from a given viewpoint position. Even with the same viewpoint position and line-of-sight direction, a video viewed upright and a video viewed while standing upside down are vertically inverted relative to each other, so the replay video can be reconstructed only when posture information indicating which side is up is available.

The “camera number” is the identifier of the real camera cam whose direction is closest to that of the virtual viewpoint P. The “texture number” is the number of the texture of an object visible from the current virtual viewpoint P. The “billboard position” represents the relationship between the coordinate position of a billboard that models an object visible from the current virtual viewpoint and the identifier of the texture to be pasted onto that billboard. In this embodiment, such time-series information is built at a predetermined period, for example frame by frame, managed by the “playback time identifier”, and concatenated sequentially.
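The fields described so far can be pictured as a small record layout. The sketch below is only one possible in-memory shape: all class and field names are hypothetical, and attaching the camera number to each billboard (rather than to the frame) is an assumption about how the figure should be read.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Billboard:
    position: Vec3       # where the billboard stands in the 3D space
    texture_number: int  # texture to paste onto this billboard
    camera_number: int   # real camera closest in direction to the virtual viewpoint


@dataclass
class FrameRecord:
    playback_time_id: int  # frame number here; could also be a time value
    eye: Vec3              # viewpoint position E
    direction: Vec3        # line-of-sight direction D
    up: Vec3               # posture direction U
    billboards: List[Billboard] = field(default_factory=list)


@dataclass
class ReplayFormat:
    free_viewpoint_video_id: str  # header: which video this is a replay of
    frames: List[FrameRecord] = field(default_factory=list)

    @property
    def total_frames(self) -> int:
        # Header value derived from the concatenated time-series records.
        return len(self.frames)


fmt = ReplayFormat("video-001")  # the ID string is made up for this example
fmt.frames.append(
    FrameRecord(300, (0.0, 1.6, -5.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0),
                [Billboard((1.0, 0.0, 2.0), texture_number=7, camera_number=3)])
)
```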

図５は、前記リプレイ用フォーマットの構築から当該リプレイ用フォーマットに基づくリプレイ映像の再生までの手順を示したシーケンスフローである。 FIG. 5 is a sequence flow showing a procedure from construction of the replay format to replay video playback based on the replay format.

時刻t1では、視聴端末３から自由視点映像生成装置２へ映像の視聴要求RQ1が送信される。自由視点映像生成装置２は、前記視聴要求RQ1に応答して、時刻t2において映像コンテンツの配信を開始する。時刻t3では、前記映像コンテンツを取得した視聴端末３において前記映像が再生される。 At time t1, a video viewing request RQ1 is transmitted from the viewing terminal 3 to the free viewpoint video generation device 2. In response to the viewing request RQ1, the free viewpoint video generation device 2 starts distributing video content at time t2. At time t3, the video is played on the viewing terminal 3 that acquired the video content.

At time t4, the terminal user U performs a viewpoint operation on the viewing terminal 3 in order to view a free viewpoint video. When the user operation detection unit 302 detects this operation, viewpoint information specifying the virtual viewpoint P selected by the terminal user U is transferred from the viewing terminal 3 to the free viewpoint video generation device 2 at time t5. At time t6, the free viewpoint video generation unit 201 of the free viewpoint video generation device 2 performs rendering based on the viewpoint information and the camera videos to generate a free viewpoint video. At time t7, the free viewpoint video is delivered to the viewing terminal 3, and it is played at time t8.

At time t9, the terminal user U operates the viewing terminal 3 to request the start of replay video recording. When the user operation detection unit 302 detects this operation, a recording start request RQ2 is transmitted from the viewing terminal 3 to the free viewpoint video generation device 2 at time t10. At time t11, the format recording unit 202 of the free viewpoint video generation device 2 responds to the recording start request RQ2 by starting to record the replay format for the free viewpoint video being played. Recording of the replay format is repeated frame by frame until a recording end request RQ3 from the viewing terminal 3 is detected.

一般に、自由視点映像生成部２０１が自由視点映像を再生している場合、ユーザが選択している仮想視点Pの情報は自由視点映像生成部２０１から取得することが可能である。本実施形態でも、リプレイ映像の記録開始RQ2が検知されると、フォーマット記録部２０２が自由視点映像生成部２０１から視点情報をフレーム単位で取得し、そのパラメータをリプレイ用フォーマットに記録する。 In general, when the free viewpoint video generation unit 201 reproduces a free viewpoint video, information on the virtual viewpoint P selected by the user can be acquired from the free viewpoint video generation unit 201. Also in this embodiment, when the replay video recording start RQ2 is detected, the format recording unit 202 acquires viewpoint information from the free viewpoint video generation unit 201 in units of frames and records the parameters in the replay format.

In this embodiment, the “viewpoint position E(ex, ey, ez)”, the “line-of-sight direction D(dx, dy, dz)”, and the “posture direction U(ux, uy, uz)” are recorded for each frame. In addition, the position at which the billboard of each object visible from the current viewpoint is to be placed is recorded, together with the texture number and camera number of the corresponding object to be pasted onto each billboard.

Thereafter, at time t12, the terminal user U operates the viewing terminal 3 to request the end of replay video recording. When the user operation detection unit 302 detects this operation, a recording end request RQ3 is transmitted from the viewing terminal 3 to the free viewpoint video generation device 2 at time t13. At time t14, the format recording unit 202 of the free viewpoint video generation device 2 ends the recording of the replay format. At time t15, the generated replay format is transferred to the viewing terminal 3, and at time t16 it is stored in the storage 301 of the viewing terminal 3.
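The recording window between RQ2 and RQ3 can be sketched as a small state machine on the server side. This is a minimal illustration, not the device's actual implementation: the class name, method names, and the use of plain strings for the requests are all assumptions.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


class FormatRecorder:
    """Hypothetical stand-in for the format recording unit 202."""

    def __init__(self, video_id: str) -> None:
        self.video_id = video_id
        self.recording = False
        self.frames: List[Tuple[int, Vec3, Vec3, Vec3]] = []

    def on_request(self, request: str) -> None:
        if request == "RQ2":    # recording start request
            self.recording = True
        elif request == "RQ3":  # recording end request
            self.recording = False

    def on_frame(self, time_id: int, eye: Vec3, direction: Vec3, up: Vec3) -> None:
        # Called once per rendered frame; records only while recording is active.
        if self.recording:
            self.frames.append((time_id, eye, direction, up))

    def export(self) -> dict:
        # Header (video ID, total frame count) followed by the time-series part.
        return {
            "video_id": self.video_id,
            "total_frames": len(self.frames),
            "frames": list(self.frames),
        }


recorder = FormatRecorder("video-001")  # made-up ID
for time_id in range(6):
    if time_id == 2:
        recorder.on_request("RQ2")
    recorder.on_frame(time_id, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0))
    if time_id == 4:
        recorder.on_request("RQ3")
exported = recorder.export()
```

The exported dictionary stands in for the replay format that is transferred to the viewing terminal at time t15.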

Thereafter, at time t17, a user who wishes to replay the free viewpoint video designates a replay format in the storage and requests a replay. When the user operation detection unit 302 detects this request, the rendering information acquisition unit 303 interprets the replay format and, based on the free viewpoint video ID described in it, determines which free viewpoint video the format is a replay of.

時刻t18では、視聴端末３が映像のリプレイに必要な情報を前記リプレイ用フォーマットに基づいて自由視点映像生成装置２へ要求（RQ4）する。本実施形態では、リプレイ用フォーマットの自由視点映像IDに紐付けられている背景3Dモデルが要求され、時刻t19では、自由視点映像生成装置２が当該要求に応答して背景3Dモデルを配信する。 At time t18, the viewing terminal 3 requests (RQ4) the information required for video replay to the free viewpoint video generation device 2 based on the replay format. In the present embodiment, a background 3D model associated with the free viewpoint video ID in the replay format is requested, and at time t19, the free viewpoint video generation device 2 delivers the background 3D model in response to the request.

時刻t20では、前記リプレイ用フォーマットおよび取得した背景3Dモデルに基づいて、前記リプレイ映像再生部３０４がレンダリングを実施し、自由視点映像のリプレイ映像の再生が開始される。リプレイ映像の再生中、レンダリング情報取得部３０３はフレーム単位で前記リプレイ用フォーマットに基づき、ビルボードの位置、マスク画像およびテクスチャなどの空間情報を自由視点映像生成装置２に要求して取得する。 At time t20, the replay video playback unit 304 performs rendering based on the replay format and the acquired background 3D model, and playback of the replay video of the free viewpoint video is started. During playback of the replay video, the rendering information acquisition unit 303 requests and acquires spatial information such as the position of the billboard, the mask image, and the texture from the free viewpoint video generation device 2 based on the replay format in units of frames.

The replay video playback unit 304 then efficiently reconstructs the 3D space based on the free viewpoint spatial information described in the format. The replay video is reconstructed by rendering, for the reconstructed space and based on the acquired spatial information, the image seen from the viewpoint position recorded in the format.

FIG. 6 shows another example of the replay format. In the embodiment above, the viewpoint information was described as being expressed by nine parameters: the three-dimensional vectors of the viewpoint position E(e_x, e_y, e_z), the line-of-sight direction D(d_x, d_y, d_z), and the posture direction U(u_x, u_y, u_z). However, a function may be provided that reduces the data size by eliminating redundancy when parameters do not change; this can be achieved by omitting a parameter from a later frame when it has not changed from the previous frame.

For example, when the viewpoint translates, the line-of-sight direction D and the posture direction U may stay the same while only the viewpoint position E changes. In this embodiment, when such a viewpoint movement is detected, the line-of-sight direction D and the posture direction U are omitted from the time-series information of the next frame, as shown in FIG. 6, which reduces the data size and the processing load. To reduce the data size further, the recorded parameters may also be rounded to a fixed number of digits and recorded as approximate values.
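The omission of unchanged parameters can be sketched as follows. The dictionary keys, function names, and the default of three decimal digits are assumptions for illustration; the sketch also generalizes slightly by dropping any parameter that is unchanged, not only D and U.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Viewpoint = Dict[str, Vec3]  # keys: "E", "D", "U"


def encode(frames: List[Viewpoint], digits: int = 3) -> List[Viewpoint]:
    """Round each parameter, then drop those equal to the previous frame's value."""
    records: List[Viewpoint] = []
    previous: Viewpoint = {}
    for frame in frames:
        rounded = {k: tuple(round(c, digits) for c in v) for k, v in frame.items()}
        records.append({k: v for k, v in rounded.items() if previous.get(k) != v})
        previous = rounded
    return records


def decode(records: List[Viewpoint]) -> List[Viewpoint]:
    """Restore full viewpoints by carrying omitted parameters forward."""
    frames: List[Viewpoint] = []
    current: Viewpoint = {}
    for record in records:
        current = {**current, **record}
        frames.append(current)
    return frames


# A viewpoint translating along x: only E changes after the first frame.
frames = [
    {"E": (float(i), 0.0, 2.0), "D": (0.0, 0.0, 1.0), "U": (0.0, 1.0, 0.0)}
    for i in range(3)
]
records = encode(frames)
```

Here the first record carries all nine parameters, while the following records carry only the three components of E.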

FIG. 7 shows yet another example of the replay format. This embodiment is characterized in that a gaze point position F(f_x, f_y, f_z) is stored in place of the line-of-sight direction D(d_x, d_y, d_z). The gaze point position F is the position of a specific point lying along the line-of-sight direction D. The line-of-sight direction D is obtained from the viewpoint position E and the gaze point position F by the following equation (1).
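Equation (1) itself is not reproduced in this text. Given the definitions of E, D, and F above, the relation is presumably the difference of the two positions. The form below is a reconstruction under that assumption; the normalization to unit length in particular is an assumption, and the unnormalized difference F − E points in the same direction.

```latex
\mathbf{D} = \frac{\mathbf{F} - \mathbf{E}}{\lVert \mathbf{F} - \mathbf{E} \rVert},
\qquad
(d_x,\, d_y,\, d_z) \propto (f_x - e_x,\; f_y - e_y,\; f_z - e_z)
```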

Adopting the gaze point position F(f_x, f_y, f_z) in place of the line-of-sight direction D(d_x, d_y, d_z) in this way can eliminate redundancy in some cases. For example, if a movement that rotates around the gaze point F is assigned to a swipe operation on the viewing terminal 3, such rotations are likely to occur frequently. In that case, the gaze point position F(f_x, f_y, f_z) does not change, so the redundancy can be eliminated according to this embodiment.

前記視点情報の更に他の例として、回転移動量（回転角度）および平行移動量を視点情報のパラメータとして採用しても良い。 As still another example of the viewpoint information, a rotational movement amount (rotation angle) and a parallel movement amount may be employed as the viewpoint information parameters.

To obtain a given viewpoint, the viewpoint is rotated about the origin of the world coordinate system by θ_x around the x axis, θ_y around the y axis, and θ_z around the z axis, and then translated by T(t_x, t_y, t_z) to the viewpoint position; this determines the position, line-of-sight direction, and posture of the viewpoint. The viewpoint can therefore be reconstructed from six parameters: the rotation amounts θ_x, θ_y, θ_z and the translation T(t_x, t_y, t_z).

In this example the viewpoint can be reconstructed from fewer parameters, but the default position, direction, and posture of the viewpoint before any rotation or translation is applied must be clearly defined. In other words, an initial state must be fixed, such as “with no rotation or translation applied, the viewpoint is at the origin of the world coordinate system, faces the positive z direction, and has the positive y direction as up”, and the replay video playback unit 304 of the viewing terminal 3 must also know this initial viewpoint information. This information may be written into the format itself and exchanged, or the initial position used when the free viewpoint video generation unit 201 plays the free viewpoint video may simply be adopted as the initial position.
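A minimal sketch of the six-parameter reconstruction follows, using the default viewpoint quoted above (eye at the origin, looking along +z, with +y up). The rotation order (x, then y, then z, i.e. R = Rz·Ry·Rx), radians as the angle unit, and all function names are assumptions; the text does not fix these conventions.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def rot_x(a: float) -> Mat3:
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a: float) -> Mat3:
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a: float) -> Mat3:
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m: Mat3, v: Sequence[float]) -> Vec3:
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def viewpoint_from_six(theta_x: float, theta_y: float, theta_z: float,
                       t: Vec3) -> Tuple[Vec3, Vec3, Vec3]:
    """Return (E, D, U) from three rotation angles and a translation."""
    r = matmul(rot_z(theta_z), matmul(rot_y(theta_y), rot_x(theta_x)))
    eye = tuple(t)                          # default eye is the origin, so R*0 + T = T
    direction = matvec(r, (0.0, 0.0, 1.0))  # default line of sight: +z
    up = matvec(r, (0.0, 1.0, 0.0))         # default posture: +y is up
    return eye, direction, up


# A quarter turn about the y axis swings the line of sight from +z to +x.
eye, direction, up = viewpoint_from_six(0.0, math.pi / 2, 0.0, (1.0, 2.0, 3.0))
```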

As yet another example of the viewpoint information, a view transformation matrix may be recorded. The view transformation matrix is the matrix that transforms the world coordinate system into the coordinate system of the viewpoint (the camera coordinate system); with this matrix, the position, direction, and posture of the viewpoint can be specified. Assuming that the view transformation matrix is expressed in homogeneous coordinates, the 4 × 4 transformation matrix M is given by the following equation (2).
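Equation (2) is not reproduced in this text. The general shape of a homogeneous view transformation matrix, written here under the assumption of a column-vector convention, is shown below; the entry names m_ij and the subscripts w (world) and c (camera) are notation introduced for this sketch, and a row-vector convention would use the transpose.

```latex
M =
\begin{pmatrix}
m_{11} & m_{12} & m_{13} & m_{14} \\
m_{21} & m_{22} & m_{23} & m_{24} \\
m_{31} & m_{32} & m_{33} & m_{34} \\
0 & 0 & 0 & 1
\end{pmatrix},
\qquad
\begin{pmatrix} x_c \\ y_c \\ z_c \\ 1 \end{pmatrix}
= M \begin{pmatrix} x_w \\ y_w \\ z_w \\ 1 \end{pmatrix}
```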

Such matrices are used frequently in widely adopted 3D display libraries such as OpenGL and DirectX, where the view transformation matrix M is often computed from the viewpoint position E(e_x, e_y, e_z), the line-of-sight direction D(d_x, d_y, d_z), the posture direction U(u_x, u_y, u_z), and so on. Storing the view transformation matrix in advance therefore lets a library-based player obtain the matrix most directly, which lowers the processing cost.
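For reference, the computation that a stored matrix would save can be sketched in plain Python. The construction below follows the right-handed, column-vector convention of OpenGL's classic `gluLookAt` helper (camera looking down its own −z axis), which is an assumption: the text does not say which convention the system uses, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def view_matrix(eye: Vec3, direction: Vec3, up: Vec3) -> List[List[float]]:
    """World-to-camera matrix (rows) built from E, D, and U."""
    f = normalize(direction)     # forward axis
    s = normalize(cross(f, up))  # right axis
    u = cross(s, f)              # true up, orthogonal to f and s
    return [
        [s[0], s[1], s[2], -dot(s, eye)],
        [u[0], u[1], u[2], -dot(u, eye)],
        [-f[0], -f[1], -f[2], dot(f, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Camera five units up the z axis, looking back toward the origin.
M = view_matrix((0.0, 0.0, 5.0), (0.0, 0.0, -1.0), (0.0, 1.0, 0.0))
```

With this matrix the world origin lands five units in front of the camera, on its −z axis.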

さらに、上記の各フォーマットの例では、原則として視点情報をそのまま記録したが、図７に示したように、前フレームとの差分値のみを記録するようにしても良い。 Furthermore, in the example of each format described above, the viewpoint information is recorded as it is in principle. However, as shown in FIG. 7, only the difference value from the previous frame may be recorded.

In a format of this kind, the differences between frames tend to be small, so many small values are written. When the values are small, runs of identical values such as “0” tend to occur. Applying entropy coding, of which Huffman coding is a typical example, to a code sequence in which such runs are likely may reduce the data size further.
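Why differences suit entropy coding can be checked with a small calculation. The sketch below does not implement Huffman coding; it only compares the Shannon entropy (bits per symbol) of raw values and of their frame-to-frame differences, which indicates how well a symbol-wise entropy coder could do. The sample trajectory is invented for illustration.

```python
import math
from collections import Counter
from typing import Sequence


def entropy_bits(symbols: Sequence[int]) -> float:
    """Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# A viewpoint coordinate that advances by the same amount every frame.
positions = [10 * i for i in range(100)]
deltas = [b - a for a, b in zip(positions, positions[1:])]

raw_bits = entropy_bits(positions)  # every value is distinct
delta_bits = entropy_bits(deltas)   # a single repeated value
```

The raw values are all different and carry several bits each, whereas the differences collapse to one repeated symbol.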

However, when playback is to start partway through, the values of an intermediate frame must be computed by summing the differences from the first frame, so the computational cost tends to be high. The format may therefore record ordinary, non-differential parameters in one frame out of every several frames and record difference values from the previous frame in the other frames.

In this case, the format must make it possible to tell which frames hold complete information and which hold difference information. In the example of FIG. 7, the frames are distinguished by attaching the identifier “D” to difference frames and the identifier “I” to all other frames.
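The mix of full frames and difference frames can be sketched as below. The “I” and “D” tags come from the text; the function names, the fixed interval between full frames, and the use of integer (already quantized) parameters are assumptions chosen so that the round trip is exact.

```python
from typing import List, Sequence, Tuple

Params = Tuple[int, ...]
Record = Tuple[str, Params]  # ("I", full values) or ("D", difference from previous frame)


def encode_frames(values: Sequence[Params], key_interval: int) -> List[Record]:
    """Store a full frame every key_interval frames and differences otherwise."""
    records: List[Record] = []
    for i, current in enumerate(values):
        if i % key_interval == 0:
            records.append(("I", current))
        else:
            previous = values[i - 1]
            records.append(("D", tuple(c - p for c, p in zip(current, previous))))
    return records


def decode_frame(records: Sequence[Record], index: int) -> Params:
    """Recover one frame, summing differences only from the nearest full frame."""
    start = index
    while records[start][0] != "I":  # walk back to the nearest full frame
        start -= 1
    value = records[start][1]
    for _, delta in records[start + 1: index + 1]:
        value = tuple(v + d for v, d in zip(value, delta))
    return value


values = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 1, 0), (3, 1, 0), (4, 1, 0)]
records = encode_frames(values, key_interval=3)
```

Starting playback at frame 4 then touches only frames 3 and 4 instead of the whole sequence from frame 0.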

前記フォーマット記録部２０２は、上記の各方式で各種のパラメータを各フレームに渡って記述していくことで視点の情報を記録する。自由視点情報記録部２０２ａは、自由視点映像生成部２０１から受け取る視点に関する情報を、フォーマットに記載する形式になるように変換や整形する機能を持たなくてはならない。 The format recording unit 202 records viewpoint information by describing various parameters over each frame by the above-described methods. The free viewpoint information recording unit 202a must have a function of converting and shaping information about the viewpoint received from the free viewpoint video generation unit 201 so that the information is described in the format.

The conversion and shaping referred to here are processes that convert the information into a form suited to the format. For example, if the free viewpoint video generation unit 201 generates the video from a specific viewpoint by using a viewpoint transformation matrix (view transformation matrix) from the world coordinate system to the camera coordinate system in order to obtain the user's viewpoint, they include acquiring this matrix and computing from it information such as the three-dimensional vector of the viewpoint position coordinates, or truncating or rounding the recorded values to a fixed number of digits.
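As an illustration of this conversion, the viewpoint position can be recovered from a world-to-camera matrix whose upper-left 3 × 3 block R is a pure rotation and whose last column holds a translation t, since the camera sits at E = −Rᵀt. The sketch assumes a column-vector convention and a rigid (rotation plus translation) matrix; the function name and the optional rounding parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def eye_from_view_matrix(m: List[List[float]], digits: Optional[int] = None) -> Vec3:
    """Viewpoint position E = -R^T t from a rigid world-to-camera matrix."""
    r = [row[:3] for row in m[:3]]  # rotation block
    t = [row[3] for row in m[:3]]   # translation column
    eye = tuple(-sum(r[k][i] * t[k] for k in range(3)) for i in range(3))
    if digits is not None:
        # Shape the values for the format by rounding to a fixed number of digits.
        eye = tuple(round(c, digits) for c in eye)
    return eye


# Identity rotation with the camera five units up the world z axis.
M = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]
eye = eye_from_view_matrix(M)
```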

FIG. 8 is a diagram for explaining the free viewpoint technology adopted in the second embodiment of the present invention, and FIG. 9 shows an example of the replay format in the second embodiment.

In the first embodiment, the free viewpoint video generation unit 201 was described as generating the free viewpoint video by the billboard method. This embodiment, by contrast, is characterized in that it generates the free viewpoint video by a method that accurately restores the shape of the 3D model of an object, as disclosed in Patent Document 1 (referred to here as the “full model method using backprojection planes”).

When the free viewpoint video generation unit 201 adopts the full model method, a large number of backprojection planes P1, P2, ... are arranged facing the virtual viewpoint P in order to restore the 3D shape of an object. Next, a mask image of the target object, obtained by background subtraction or a similar method, is projected onto each backprojection plane P1, P2, ... and its visual hull is computed, so that 3D modeling is performed for each backprojection plane; the backprojection planes are then colored by mapping the texture image of the target object onto them. The 3D model can thus be restored by carving the backprojection planes appropriately.

In this method, the backprojection planes P1, P2, ... are always placed orthogonal to the line of sight of the virtual viewpoint P, so their positions change depending on the position of the virtual viewpoint P. The format recording unit 202 starts recording the format in response to the replay video recording start request RQ2 from the viewing terminal 3. As in the first embodiment, the free viewpoint video ID and the total number of frames are recorded in the header, and the viewpoint information is recorded for each frame by the same method as in the first embodiment.

空間情報記録部２０２ｂは、多数の逆投影面P1，P2…の中で、モデルが生成される面のインデックスのみを空間情報として記録する。すなわち、本実施形態ではモデルが生成されない面のインデックスは記録されない。例えば、図８に示した例では、円筒状のオブジェクトが空間に存在しているが、そのモデルが生成されるのはP2，P3，P4のみである。したがって、図９に示したように、そのインデックスとして「2 3 4」のみが記録される。 The spatial information recording unit 202b records, as spatial information, only the index of the surface on which the model is generated among the many backprojection surfaces P1, P2,. That is, in this embodiment, the index of the surface on which no model is generated is not recorded. For example, in the example shown in FIG. 8, a cylindrical object exists in the space, but the model is generated only for P2, P3, and P4. Therefore, as shown in FIG. 9, only “2 3 4” is recorded as the index.

For example, in a free viewpoint video such as a soccer match, where players occupy only scattered parts of a wide field, placing backprojection planes over the entire field produces many wasted planes on which no model is generated, and recomputing such planes during replay playback is wasteful.

これに対して、本実施形態では予めモデルの生成される逆投影面と生成されない逆投影面とを識別できるので、効率的なメモリ確保が可能となり、またモデルの生成されない逆投影面に関してはマスク画像を逆投影する計算も不要となるので計算負荷が減ぜられる。 On the other hand, in this embodiment, the backprojection plane where the model is generated and the backprojection plane where the model is not generated can be identified in advance, so that efficient memory can be secured, and the backprojection plane where the model is not generated is masked. Since the calculation for back projecting the image is not necessary, the calculation load is reduced.

In particular, Patent Document 1 itself notes that the full model method adopted in this embodiment performs parallel computation on a GPU, and reducing the number of backprojection planes leads to memory savings. Because this also reduces the time required for memory access, computational resources are saved and computation is accelerated.

空間情報記録部２０２ｂは、3D空間を再現する際に計算する必要のある投影面のインデックスを記録することで、計算の高速化および計算資源の節約を図る。逆投影面に付するインデックスについては、視点に正対する逆投影面が１０００枚存在する場合、視点に近い方から順番に１〜１０００のようにインデックスを振っていく方式が考えられる。図９に示したフォーマットの例では、モデルが生成される３枚の逆投影面P2，P3、P4を代表するインデックスとして「2 3 4」が記録されている。このようにして記録されたリプレイ用フォーマットは、第１実施形態と同様に視聴端末３へ転送されて蓄積され、後にリプレイ時に参照されることになる。 The spatial information recording unit 202b records the projection plane index that needs to be calculated when reproducing the 3D space, thereby speeding up the calculation and saving the calculation resources. As for the index attached to the backprojection plane, when there are 1000 backprojection planes directly facing the viewpoint, a method of assigning the index in order from 1 to 1000 in order from the side closest to the viewpoint is conceivable. In the example of the format shown in FIG. 9, “2 3 4” is recorded as an index representing the three back projection planes P2, P3, and P4 on which the model is generated. The replay format recorded in this way is transferred to the viewing terminal 3 and stored in the same manner as in the first embodiment, and is later referred to during replay.

視聴端末３では、リプレイ動画再生部３０４が蓄積されているリプレイ用フォーマットに基づいてリプレイ映像を再構成する。この際、リプレイ映像フォーマットの視点情報に基づいて視点を確定し、この視点に基づいて逆投影面を配置するが、前記インデックスを参照することでモデルが生成されない逆投影面を識別し、当該逆投影面については配置と計算を行わない。 In the viewing terminal 3, the replay video playback unit 304 reconstructs the replay video based on the stored replay format. At this time, the viewpoint is determined based on the viewpoint information of the replay video format, and the backprojection plane is arranged based on this viewpoint, but the backprojection plane where no model is generated is identified by referring to the index, and the backprojection plane is identified. Arrangement and calculation are not performed for the projection plane.
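The index-based skipping can be sketched as follows. Indices are 1-based and ordered from the plane nearest the viewpoint, as in the text; the function names and the `carve` callback, which stands in for the per-plane mask projection and modeling, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def record_plane_indices(has_model: Sequence[bool]) -> List[int]:
    """Recording side: keep only the indices of planes on which a model exists."""
    return [i for i, present in enumerate(has_model, start=1) if present]


def rebuild_planes(recorded: Sequence[int], carve: Callable[[int], T]) -> Dict[int, T]:
    """Replay side: place and compute only the recorded planes."""
    return {i: carve(i) for i in recorded}


# Planes P1..P5, with the cylindrical object of FIG. 8 cutting P2, P3, and P4.
has_model = [False, True, True, True, False]
indices = record_plane_indices(has_model)

calls: List[int] = []
models = rebuild_planes(indices, lambda i: calls.append(i) or f"model on P{i}")
```

Only three of the five planes are ever passed to `carve`, which is where the memory and computation savings described above come from.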

これにより、モデルが生成されることが約束されている逆投影面のみ計算を行って３Ｄ空間を再現することができる。その後、この３Ｄ空間に対して視点からの映像のレンダリングを行い、レンダリング画像を視聴端末３へと伝送することでリプレイ映像の再生を実現する。 As a result, the 3D space can be reproduced by calculating only the backprojection plane on which the model is promised to be generated. After that, the video from the viewpoint is rendered on this 3D space, and the rendered image is transmitted to the viewing terminal 3 to realize replay video reproduction.

In this example, rendering is performed in the free viewpoint video generation device 2; alternatively, for example, all the videos needed to compose the free viewpoint video may be delivered to the viewing terminal 3 in advance and rendered on the terminal side. Even in that case, recording in advance the indices of the backprojection planes on which no model is generated makes it possible to speed up computation and save computational resources.

また、このような構成では自由視点映像生成装置２にリプレイ映像再生機能が設けられ、3D空間の完全な再構成が可能となる。このため、リプレイ動画再生機能が視聴端末３のスペックに関する情報を受信し、再生デバイスの解像度や画面サイズに応じて、視点は同じであるが見える視野や画像の縦横比が変わるようにレンダリング画像を出力する機能を備えてもよい。 In such a configuration, the free viewpoint video generation device 2 is provided with a replay video playback function, and a complete reconstruction of the 3D space is possible. For this reason, the replay video playback function receives information related to the specifications of the viewing terminal 3, and renders the rendered image so that the visual field and the aspect ratio of the image change depending on the resolution and screen size of the playback device, although the viewpoint is the same. You may provide the function to output.

加えて、複数の視聴端末３が自由視点映像生成装置２に対して同時にリプレイ映像の再生を要求した場合に、同一の視点位置かつ同一の時刻のフレームのレンダリング要求があった場合には、レンダリング結果を保存し、使い回すなどの機構を備えてもよい。 In addition, when a plurality of viewing terminals 3 request the free viewpoint video generation device 2 to replay the replay video at the same time, if there are requests for rendering frames at the same viewpoint position and the same time, rendering is performed. A mechanism for storing and reusing the result may be provided.
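A reuse mechanism of this kind could be as simple as a cache keyed by video, playback time, and viewpoint. The sketch below is one possible shape under that assumption; the class name, the key layout, and the stand-in render function are all hypothetical, and a real server would also need eviction and concurrency control.

```python
from typing import Callable, Dict, Hashable, Tuple

Key = Tuple[str, int, Hashable]


class RenderCache:
    """Reuses a rendering result for identical (video, time, viewpoint) requests."""

    def __init__(self, render: Callable[[str, int, Hashable], str]) -> None:
        self._render = render
        self._cache: Dict[Key, str] = {}
        self.render_calls = 0

    def get(self, video_id: str, time_id: int, viewpoint: Hashable) -> str:
        key = (video_id, time_id, viewpoint)
        if key not in self._cache:
            self.render_calls += 1  # rendering happens only on a cache miss
            self._cache[key] = self._render(video_id, time_id, viewpoint)
        return self._cache[key]


def fake_render(video_id: str, time_id: int, viewpoint: Hashable) -> str:
    # Stand-in for the real renderer; returns a label instead of an image.
    return f"{video_id}:{time_id}"


cache = RenderCache(fake_render)
vp = ((0.0, 0.0, 5.0), (0.0, 0.0, -1.0), (0.0, 1.0, 0.0))  # (E, D, U)
a = cache.get("video-001", 300, vp)  # first terminal: rendered
b = cache.get("video-001", 300, vp)  # second terminal: served from the cache
c = cache.get("video-001", 301, vp)  # next frame: rendered again
```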

DESCRIPTION OF SYMBOLS 1 ... content server, 2 ... free viewpoint video generation device, 3 ... viewing terminal, 201 ... free viewpoint video generation unit, 202 ... format recording unit, 202a ... free viewpoint information recording unit, 202b ... spatial information recording unit, 301 ... storage, 302 ... user operation detection unit, 303 ... rendering information acquisition unit, 304 ... replay video playback unit

Claims

In a system that plays replay video of free viewpoint video,
Means for generating a free viewpoint video based on a plurality of camera videos and viewpoint information of a virtual viewpoint;
Means for recording a replay format in the free viewpoint video generation process;
Means for obtaining information necessary for replay based on the replay format;
A system for playing back a replay video of a free viewpoint video, comprising: means for playing back a replay video based on the replay format and acquired information.

The system for reproducing a replay video of a free viewpoint video according to claim 1, wherein the information necessary for the replay is a background 3D model and spatial information for rendering an object on the background 3D model.

The replay format includes header information and time series information,
The time series information is configured by connecting a plurality of time series information recorded at a predetermined period,
3. The system for reproducing a replay video of a free viewpoint video according to claim 1, wherein a reproduction time ID unique to a reproduction position of the replay video based on the time series information is recorded in each time series information.

The generation process includes a generation process of a background 3D model of a free viewpoint video;
In the replay format, the ID of the free viewpoint video is recorded,
The system for reproducing a replay video of a free viewpoint video according to any one of claims 1 to 3, wherein the acquiring unit acquires the background 3D model based on an ID of the free viewpoint video.

The generation process includes a viewpoint information generation process;
The system for reproducing a replay video of a free viewpoint video according to any one of claims 1 to 4, wherein the viewpoint information is recorded in the replay format.

6. A system for reproducing a replay video of a free viewpoint video according to claim 1, wherein the means for generating the free viewpoint video employs a billboard free viewpoint technology.

The generation process includes a process of identifying objects visible from the virtual viewpoint;
An overall ID associated with each visible object is recorded in the replay format,
The system for reproducing a replay video of a free viewpoint video according to claim 6, wherein the acquiring unit acquires rendering information of each object based on the overall ID.

The generating process includes generating a mask image of each object;
The mask image is linked to the overall ID,
8. The system for reproducing a replay video of a free viewpoint video according to claim 7, wherein the acquiring unit acquires a mask image of each object based on the overall ID.

The generating process includes extracting a texture of each object from a camera image;
The texture is linked to the overall ID,
8. The system for reproducing a replay video of a free viewpoint video according to claim 7, wherein the acquiring unit acquires a texture of each object based on the overall ID.

7. The system for reproducing a replay video of a free viewpoint video according to claim 6, wherein the means for generating the free viewpoint video employs a full-model free viewpoint technology using a backprojection plane.

The generation process includes a process of projecting a mask image of the object by arranging a plurality of back projection planes facing the virtual viewpoint at the position of the object, and performing 3D modeling for each back projection plane to restore the 3D model ,
In the replay format, record the index of the backprojection plane where the 3D model exists,
The system for reproducing a replay video of a free viewpoint video according to claim 10, wherein the acquiring unit acquires information necessary for replay based on an index of the backprojection plane.

The means for generating the free viewpoint video and the means for recording the replay format are mounted on a server on the cloud,
Means for acquiring information necessary for the replay and means for reproducing the replay video are implemented in a replay video viewing terminal,
12. The system for reproducing a replay video of a free viewpoint video according to claim 1, wherein the replay format is transferred from the server to the viewing terminal and stored on the viewing terminal.

A means for generating the free viewpoint video, a means for recording a replay format, and a means for acquiring information necessary for the replay are mounted on a server on the cloud,
Means for playing the replay video is implemented in the viewing terminal;
12. The system for reproducing a replay video of a free viewpoint video according to claim 1, wherein the replay format is transferred from the server to the viewing terminal and stored on the viewing terminal.

The system for reproducing a replay video of a free viewpoint video according to claim 13, wherein the means for recording the replay format starts recording the replay format in response to a recording start request from a viewing terminal and ends recording in response to a recording end request.

In a method for a computer to play a replay video of a free viewpoint video,
Generate a free viewpoint video based on a plurality of camera videos and viewpoint information of a virtual viewpoint,
Record the replay format in the free viewpoint video generation process,
Obtain information necessary for replay based on the replay format,
A method of reproducing a replay image of a free viewpoint image, wherein the replay image is reproduced based on the replay format and the acquired information.