JP2022036123A

JP2022036123A - System and method for reproducing replay video of free-viewpoint video

Info

Publication number: JP2022036123A
Application number: JP2021206217A
Authority: JP
Inventors: 良亮渡邊; Ryosuke Watanabe; 敬介野中; Keisuke Nonaka
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-02-23
Filing date: 2021-12-20
Publication date: 2022-03-04
Anticipated expiration: 2038-02-23
Also published as: JP7165254B2

Abstract

PROBLEM TO BE SOLVED: To enable replay video of free-viewpoint video to be reproduced with a low processing load, so that even a viewing terminal with limited processing capability can reproduce it.

SOLUTION: A replay video reproduction system for free-viewpoint video according to the present invention includes: a free-viewpoint video generation section 201 for generating free-viewpoint video based on a plurality of camera videos and viewpoint information of a virtual viewpoint; a format recording section 202 for recording a replay format in the process of generating the free-viewpoint video; a rendering information acquisition section 303 for acquiring the information required for replay based on the replay format; and a replay video reproduction section 304 for reproducing replay video based on the replay format and the acquired rendering information. The free-viewpoint video generation section 201 and the format recording section 202 are implemented on a cloud server, whereas the rendering information acquisition section 303 and the replay video reproduction section 304 are implemented on a viewing terminal 3.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a system and method for reproducing replay video of free-viewpoint video, and in particular to a system and method that reduce the load involved in constructing the replay video so that even a viewing terminal with limited processing capability can reproduce it.

Free-viewpoint video technologies such as those of Patent Document 1 and Non-Patent Document 1 have been proposed as techniques that, based on video shot by multiple cameras, allow video to be viewed from a virtual viewpoint where no camera is actually placed. By arranging multiple cameras in a sports stadium or similar venue and generating free-viewpoint video from their footage, a user can enjoy viewing from any virtual viewpoint of their choosing.

When recording video from a user-selected virtual viewpoint with such technology, one conceivable approach is to capture the displayed screen and save it as a video file in an existing video format such as H.264 (MPEG-4 AVC), described in Non-Patent Document 2. The recorded video file is kept in storage and can be played back whenever the user wants to watch it again, or sent to other users to share the video.

As an example of recording video from a specific user-selected viewpoint without recording the video directly, and playing it back later, there is the game device and game replay method disclosed in Patent Document 2. That invention saves the history of a player's game operations as replay data and later reproduces the player's play images based on that replay data.

Patent Document 1: Japanese Patent Application No. 2017-167472
Patent Document 2: Japanese Unexamined Patent Publication No. 2010-264173

Non-Patent Document 1: T. Koyama, I. Kitahara and Y. Ohta, "Live Mixed-Reality 3D Video in Soccer Stadium," in Proc. of IEEE/ACM Conference on ISMAR, pp. 178-187 (2003)
Non-Patent Document 2: ITU-T Recommendation H.264, "Advanced video coding for generic audiovisual services," May 2003

With free-viewpoint video technology, a user can freely select a viewpoint through a dedicated controller or by touch operations on a smartphone or tablet screen, and enjoy viewing from any viewpoint. Unlike ordinary television broadcasts such as terrestrial broadcasting, free-viewpoint video lets each user select and move the viewpoint freely, so even for the same free-viewpoint video, what each user sees differs according to the viewpoint position and how the viewpoint is moved. Recording the video from the viewpoint a user is watching therefore makes it possible to create original video content.

Looking ahead to a future in which free-viewpoint video technology is widespread, video from such specific viewpoints will naturally be exchanged over the Internet, and new ways of enjoying it are expected to emerge, such as ratings and comments posted through social networking services.

To meet the demand to replay video from a specific viewpoint recorded on a particular terminal (hereinafter, replay video) at a later time, to play it on another terminal, or to share it among many users, it is possible to save the video in an existing video coding format such as that described in Non-Patent Document 2 and exchange the resulting file. However, this approach has the problem that the video files become large.

This problem is expected to be especially pronounced for free-viewpoint video, because each user can generate video from their own viewpoint and a single free-viewpoint video can give rise to an enormous variety of video content.

Alternatively, if the movement of the viewpoint is known, an image from a specific viewpoint can be obtained by passing the viewpoint information to the free-viewpoint video generation device. As with the game replay function disclosed in Patent Document 2, for example, it is possible to recompute and display the video from a given viewpoint based on viewpoint information recorded in advance.

Unlike a game, however, free-viewpoint video first requires the three-dimensional space to be reconstructed from the video of multiple cameras, and the computational cost of this 3D reconstruction is very high. In particular, when many users ask the server to reconstruct replay video at the same time, it is difficult to reconstruct the replay video without delay.

An object of the present invention is to solve the above technical problems and to provide a system and method that reduce the load involved in constructing replay video of free-viewpoint video, so that even a viewing terminal with limited processing capability can reproduce that replay video.

To achieve the above object, the present invention is a system that is configured by connecting a viewing terminal and a free-viewpoint video generation device over a network and that reproduces replay video of free-viewpoint video, characterized by the following configurations.

(1) The viewing terminal comprises means for requesting playback of free-viewpoint video and means for requesting recording of replay video for the free-viewpoint video being played.
The free-viewpoint video generation device comprises means for generating free-viewpoint video, in response to the playback request, based on multiple camera videos and viewpoint information of a virtual viewpoint; means for recording, in the process of generating the free-viewpoint video, a replay format in which the virtual viewpoint is described for each playback time of the replay video; and means for transferring the recorded replay format to the viewing terminal.
The viewing terminal further comprises means for acquiring the information required for replay based on the replay format and means for reproducing the replay video based on the replay format and the acquired information, the information required for replay including a background 3D model and spatial information for rendering objects on the background 3D model.

(2) The spatial information includes a mask image of each object and position information for placing the model of each object.

(3) The means for generating free-viewpoint video and the means for recording the replay format are implemented on a server in the cloud, the means for acquiring the information required for replay and the means for reproducing the replay video are implemented on the viewing terminal for the replay video, and the replay format is transferred from the server to the viewing terminal and stored on the viewing terminal.

The present invention achieves the following effects.

(1) In the process of generating the free-viewpoint video, identifiers of information that can be reused for reconstructing the replay video, together with parameters and other information required for that reconstruction, are recorded in the replay format. When the replay video is reconstructed, the reusable information is acquired based on the identifiers recorded in the replay format, and the information recorded in the replay format is used as parameters, so the replay video can be reproduced with a light load.

(2) Because a background 3D model and spatial information for rendering objects on the background 3D model are acquired as the information required for reconstructing the replay video, the processing load of reconstruction is reduced.

(3) Because the spatial information includes a mask image of each object and position information for placing the model of each object, the computationally expensive calculations otherwise needed to obtain this information become unnecessary when the replay video is reconstructed.

(4) When the means for generating free-viewpoint video and the means for recording the replay format are implemented on a server in the cloud, and the means for acquiring the information required for replay and the means for reproducing the replay video are implemented on the viewing terminal, the computationally heavy processing can be assigned to the server, which generally has high processing capability. Free-viewpoint video can therefore be replayed even on a viewing terminal, which generally has low processing capability.

FIG. 1 is a block diagram showing the configuration of the main part of a free-viewpoint video distribution system according to one embodiment of the present invention.
FIG. 2 is a diagram for explaining the method of switching the objects on which polygons are placed according to the virtual viewpoint P.
FIG. 3 is a diagram showing a first example of the replay format.
FIG. 4 is a diagram for explaining the definition of viewpoint information.
FIG. 5 is a sequence flow showing the procedure from construction of the replay format to reproduction of the replay video.
FIG. 6 is a diagram showing a second example of the replay format.
FIG. 7 is a diagram showing a third example of the replay format.
FIG. 8 is a diagram showing the method of generating free-viewpoint video in a second embodiment of the present invention.
FIG. 9 is a diagram showing a third example of the replay format.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the main part of a free-viewpoint video distribution system according to one embodiment of the present invention; components not needed to explain the invention are omitted from the figure.

The free-viewpoint video distribution system of the present invention mainly comprises: multiple cameras cam that are placed in a stadium or similar venue and shoot objects such as athletes from different viewpoints; a content server 1 that stores the video shot by each camera cam together with the camera parameters; a free-viewpoint video generation device 2 that generates free-viewpoint video based on the multiple camera videos, the camera parameters, and viewpoint information; and a viewing terminal 3 that supplies information on a virtual viewpoint P (viewpoint information) to the free-viewpoint video generation device 2 in response to operations by a terminal user U, and acquires and plays the free-viewpoint video generated by the device.

The content server 1, the free-viewpoint video generation device 2, and the viewing terminal 3 may each be configured by installing on a general-purpose computer an application (program) that implements the functions described below, or may be configured as a dedicated or single-function machine in which part of the application is implemented in hardware or ROM.

In the content server 1, each item of content, comprising the captured camera videos and their camera parameters, is managed under a unique ID. In the illustrated example, ID1 is assigned to the soccer free-viewpoint content, ID2 to the volleyball content, and ID3 to the judo content.

The free-viewpoint video generation device 2 includes a free-viewpoint video generation unit 201 and a format recording unit 202. By implementing these functions on a server placed in the cloud, the device can be configured as a free-viewpoint video generation server.

The free-viewpoint video generation unit 201 generates free-viewpoint video by applying a known free-viewpoint technique to multiple camera videos with different viewpoints, the parameters of each camera, and the viewpoint information selected by the terminal user U on the viewing terminal 3.

The first embodiment of the present invention describes an application to a free-viewpoint technique that, as in Non-Patent Document 1, adopts the billboard method: each object in three-dimensional space is approximated by a single rectangular polygon, and texture information obtained from multiple camera videos is mapped onto the rectangular polygon as appropriate for the virtual viewpoint P selected by the user.

The free-viewpoint video generation unit 201 extracts objects from the multiple camera videos and estimates their positions. For free-viewpoint video of a soccer match, the objects are people such as the players appearing in the stadium. The background other than the objects, such as the stadium pitch and the spectator stands, is assumed to exist as a preset 3D model created manually in advance.

Further, as shown in FIG. 2, the free-viewpoint video generation unit 201 places, at the position where each object is estimated to be, a single rectangular polygon that directly faces the line-of-sight direction of the selected virtual viewpoint P. On that polygon it displays the texture of the object extracted from the camera video of the real camera cam whose line-of-sight angle is closest to that of the virtual viewpoint P.

In the example of FIG. 2, the line-of-sight angle of the virtual viewpoint P is closest to that of the real camera cam3, so the texture of the target object is cut out of the camera video of cam3 and pasted onto the rectangular polygon. When the texture is displayed, a binary mask image representing the shape of the target object is used so that only the part of the rectangular polygon where the object exists is shown and the rest is transparent. When the virtual viewpoint P moves, the rectangular polygon is rotated accordingly so that it always faces the virtual viewpoint P, which allows natural-looking viewing from the virtual viewpoint P.
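The camera selection described above can be illustrated with a minimal sketch. It assumes each real camera is summarized by a single line-of-sight direction vector and that "closest angle" means the smallest angle between direction vectors; the camera layout, the function names, and the numeric values are hypothetical and do not come from the patent.

```python
import math

def angle_between(u, v):
    """Return the angle in radians between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def nearest_camera(view_dir, camera_dirs):
    """Pick the camera whose line-of-sight direction is closest to view_dir.

    camera_dirs maps a camera name to its (hypothetical) direction vector.
    """
    return min(camera_dirs, key=lambda name: angle_between(view_dir, camera_dirs[name]))

# Hypothetical layout: four cameras looking along different directions.
cameras = {
    "cam1": (1.0, 0.0, 0.0),
    "cam2": (0.0, 0.0, 1.0),
    "cam3": (-1.0, 0.0, 0.2),
    "cam4": (0.0, 0.0, -1.0),
}
virtual_view_dir = (-0.9, 0.0, 0.4)
print(nearest_camera(virtual_view_dir, cameras))  # cam3
```

A real system would likely also take the camera position relative to the object into account; the sketch compares directions only.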

The format recording unit 202 acquires, during the process in which the free-viewpoint video generation unit 201 generates the free-viewpoint video, the various parameters that will later allow the viewing terminal 3 to replay that video with a small computational load, and records them in the replay format.

Recording of the replay format starts in response to a recording start request RQ2 from the viewing terminal 3 that is playing the free-viewpoint video, and ends in response to a recording end request RQ3. The completed replay format is transferred to the viewing terminal 3 that sent requests RQ2 and RQ3 and is managed in its storage 301.

In the format recording unit 202, a free-viewpoint information recording unit 202a records in the replay format the viewpoint information, including the position, direction, and posture of the virtual viewpoint P, that the terminal user U selects by operating the viewing terminal 3.

A spatial information recording unit 202b records spatial information that can reduce the processing load when the viewing terminal 3 renders the image seen from the selected virtual viewpoint P. In this embodiment, the objects visible from the viewpoint are identified based on the viewpoint information, and only those visible objects are rendered and reconstructed in three-dimensional space.

For the parameters that are costly to compute in the rendering process, specifically the position of the billboard that displays each object and the mask image used to make part of the texture transparent in accordance with the object's shape when the texture is pasted onto each billboard, only the identifiers that the viewing terminal 3 needs to acquire these parameters from the free-viewpoint video generation device 2 are recorded.

In the case of FIG. 2, four objects (ID1 to ID4) stand on the background model, but only two of them (ID3 and ID4) are visible from the selected virtual viewpoint P. Because the line-of-sight angle of the virtual viewpoint P is close to that of the real camera cam3, the billboards placed at the positions of objects ID3 and ID4 are assumed to carry textures obtained from the camera video of cam3.

In such a case, this embodiment reconstructs only these two objects in three-dimensional space and acquires only their billboard positions, textures, and mask images. An object, its billboard position, the texture pasted onto that billboard and the real camera it came from, and the mask image for masking the texture can all be linked to one another, so recording every one of them in the replay format would be redundant.

In this embodiment, therefore, as shown in FIG. 3 and described in detail later, only the "camera number", "texture number", and "billboard position" are recorded as spatial information, and the "texture number" also serves as the identifier of the mask image. The position of each billboard need not be recorded either, provided the free-viewpoint video generation device 2 stores it in association with the texture number.
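As a rough illustration of recording only the spatial information of visible objects, the sketch below builds one frame's record containing a camera number, texture numbers, and billboard positions. The patent does not say how visibility is decided, so the view-cone test used here (no occlusion handling, a fixed 30-degree half angle) is purely an assumption, as are the function names, field names, and coordinates.

```python
import math

def is_visible(eye, view_dir, point, half_fov_deg=30.0):
    """Assumed visibility test: the point lies inside a cone around view_dir."""
    to_point = [p - e for p, e in zip(point, eye)]
    dist = math.sqrt(sum(c * c for c in to_point))
    norm = math.sqrt(sum(c * c for c in view_dir))
    if dist == 0.0 or norm == 0.0:
        return False
    cos_angle = sum(a * b for a, b in zip(to_point, view_dir)) / (dist * norm)
    return cos_angle >= math.cos(math.radians(half_fov_deg))

def spatial_info_for_frame(eye, view_dir, camera_number, objects):
    """Build one frame's spatial information for the visible objects only.

    objects maps a texture number to the billboard position of its object.
    """
    visible = {tex: pos for tex, pos in objects.items()
               if is_visible(eye, view_dir, pos)}
    return {
        "camera_number": camera_number,
        "texture_numbers": sorted(visible),
        "billboard_positions": {tex: visible[tex] for tex in sorted(visible)},
    }

# Hypothetical scene echoing FIG. 2: four objects, two in front of the viewpoint.
objects = {
    1: (-5.0, 0.0, -5.0),
    2: (8.0, 0.0, -2.0),
    3: (1.0, 0.0, 10.0),
    4: (-1.0, 0.0, 14.0),
}
frame = spatial_info_for_frame(eye=(0.0, 1.5, 0.0), view_dir=(0.0, 0.0, 1.0),
                               camera_number=3, objects=objects)
print(frame["texture_numbers"])  # [3, 4]
```

Keying billboard positions by texture number mirrors the statement that the texture number doubles as the link to the mask image and the position.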

The viewing terminal 3 can be implemented as a television capable of video playback to which a dedicated controller is connected for selecting the viewpoint; as a smartphone or tablet on which the viewpoint is selected by touch or swipe operations on the touchscreen of its display; or as a VR terminal or similar device equipped with an acceleration sensor, on which the free-viewpoint video is viewed and the viewpoint is selected according to the user's movement.

In the viewing terminal 3, the storage 301 stores the replay format and the various spatial information sent from the free-viewpoint video generation device 2 over the network. A user operation detection unit 302 detects the operations performed by the terminal user U, namely the virtual viewpoint operation RQ1, the replay format recording start request RQ2, the recording end request RQ3, and the replay request RQ4, and sends them to the free-viewpoint video generation device 2 over the network.

In response to the replay request RQ4, the rendering information acquisition unit 303 refers to the replay format stored in the storage 301 and identifies which free-viewpoint video the format belongs to. It then acquires the background 3D model of the identified free-viewpoint video from the free-viewpoint video generation device 2 and, based on that background 3D model, acquires from the device the spatial information needed to reconstruct each frame, such as textures, mask images, and billboard positions. The acquired information is stored temporarily in the storage 301.

The replay video reproduction unit 304 performs rendering based on the replay format and the spatial information acquired from the free-viewpoint video generation device 2, and plays the replay video of the free-viewpoint video.

In this example the free-viewpoint video is rendered on the viewing terminal 3, so the viewing terminal 3 must render the video seen from the virtual viewpoint P given by the viewpoint information. That is, the 3D space is reconstructed by erecting a billboard at each billboard position on the acquired background 3D model so that it directly faces the virtual viewpoint P, and pasting onto it the acquired texture masked by the mask image.
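The step of erecting a billboard that faces the virtual viewpoint can be sketched as plain vector arithmetic. This is only an illustration under stated assumptions: the billboard position is taken to be the bottom centre of the rectangle, the rectangle's normal is taken to be the reverse of the line-of-sight direction, and the function name, argument names, and sizes are hypothetical. Texture and mask handling are left out.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def billboard_corners(position, view_dir, up, width, height):
    """Corners (bottom-left, bottom-right, top-right, top-left) of a rectangle
    standing at position whose normal points back along view_dir.

    Fails with a division by zero if up is parallel to view_dir.
    """
    normal = normalize(tuple(-c for c in view_dir))
    right = normalize(cross(up, normal))   # viewer's right-hand side
    true_up = cross(normal, right)         # up vector perpendicular to the normal
    half = tuple(c * width / 2 for c in right)
    top = tuple(c * height for c in true_up)
    bl = tuple(p - h for p, h in zip(position, half))
    br = tuple(p + h for p, h in zip(position, half))
    tr = tuple(b + t for b, t in zip(br, top))
    tl = tuple(b + t for b, t in zip(bl, top))
    return bl, br, tr, tl

corners = billboard_corners(position=(2.0, 0.0, 10.0), view_dir=(0.0, 0.0, 1.0),
                            up=(0.0, 1.0, 0.0), width=1.0, height=2.0)
for corner in corners:
    print(corner)
```

Recomputing the corners whenever the viewpoint information changes gives the rotation that keeps the billboard facing the virtual viewpoint.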

On the other hand, the terminal does not perform costly processing such as estimating billboard positions from multiple camera videos or generating mask images. Given the specifications of recent smartphones and similar devices, it is therefore entirely feasible to perform the above rendering on the viewing terminal.

According to this embodiment, compared with performing the rendering in the free-viewpoint video generation device 2, the background 3D model needs to be transmitted only once, and for every frame only the spatial information of the visible objects needs to be transmitted. In soccer video, for example, the region occupied by players is often very small relative to the whole picture, so transmitting only the textures and mask images often requires less data than sending rendered video every time.

Furthermore, if each texture is kept on the viewing terminal after it has been transmitted once, the textures remain on a viewing terminal 3 that has already reconstructed the replay. Texture numbers that have already been downloaded need not be downloaded again, and the replay video can be reconstructed even without a network connection.
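The idea of keeping downloaded textures on the terminal can be sketched as a small cache keyed by texture number. The class name, the `fetch` callable, and the stand-in download function are hypothetical; the patent does not describe a concrete storage or transfer interface.

```python
class TextureCache:
    """Keep downloaded textures on the terminal, keyed by texture number."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical callable: texture number -> bytes
        self._store = {}
        self.downloads = 0    # counts actual fetches, for illustration

    def get(self, texture_number):
        # Download only when this texture number has not been seen before.
        if texture_number not in self._store:
            self._store[texture_number] = self._fetch(texture_number)
            self.downloads += 1
        return self._store[texture_number]

def fake_fetch(texture_number):
    # Stand-in for a network request to the generation device.
    return f"texture-{texture_number}".encode()

cache = TextureCache(fake_fetch)
for frame_textures in ([3, 4], [3, 4], [4, 5]):
    for number in frame_textures:
        cache.get(number)
print(cache.downloads)  # 3 distinct textures were fetched
```

Once every texture number referenced by a replay format is in the cache, a second replay issues no fetches at all, which is what makes offline replay possible.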

FIG. 3 shows a first example of the replay format, which consists of header information and time-series information.

In the header information, the "free-viewpoint video ID" is used to uniquely identify each replay format. Because this ID needs to be recorded only once, it is written in the header of the format. The "total number of frames" is the number of frames of the replay video reconstructed from the replay format and corresponds to the playback time.

The time-series information is generated for each frame, and the "playback time identifier" records information that identifies the position (time) of each frame in the replay video. The illustrated example shows time-series information for only two frames (21, 22), but if the "total number of frames" is 200, time-series information for 200 frames is concatenated.

For example, for a one-minute free-viewpoint video at 30 frames per second, if the recording start request RQ2 is detected 10 seconds after the start and the recording end request RQ3 is detected at 20 seconds, the time-series information is formed by concatenating, in chronological order, the information whose "playback time identifier" runs from 300 to 600. The "playback time identifier" is not limited to a frame number and may be absolute time information or relative time information. Because the information for replaying each frame is thus managed in time series, video and audio can easily be played in synchronization on a time basis, provided the audio recorded on the content server is managed with time information.
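The arithmetic in this example can be checked with a short sketch. It assumes the playback time identifier is a frame number counted from the start of the video and that the range from 300 to 600 includes both ends; the function names are hypothetical.

```python
def playback_identifiers(fps, start_seconds, end_seconds):
    """Frame-number identifiers recorded between RQ2 and RQ3 (both ends included)."""
    first = fps * start_seconds
    last = fps * end_seconds
    return range(first, last + 1)

def identifier_to_seconds(identifier, fps):
    """Convert a frame-number identifier to a time offset, e.g. for audio sync."""
    return identifier / fps

ids = playback_identifiers(fps=30, start_seconds=10, end_seconds=20)
print(ids[0], ids[-1])                    # 300 600
print(identifier_to_seconds(ids[0], 30))  # 10.0
```

Converting identifiers back to seconds is one simple way to line the frames up with audio that is managed by time information.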

In this embodiment, as shown in FIG. 4, the information specifying the virtual viewpoint P consists of the "viewpoint position E(ex, ey, ez)" representing the three-dimensional position coordinates of the viewpoint, the "line-of-sight direction D(dx, dy, dz)" representing the direction of the viewpoint (line of sight), and the "posture direction U(ux, uy, uz)" representing the posture of the viewpoint. The viewpoint information is thus specified by three three-dimensional vectors, nine parameters in total.

The "posture direction" is information indicating which side of the display screen is up when looking in a certain direction from a certain viewpoint position. Even with the same viewpoint position and line-of-sight direction, the image is vertically inverted between viewing while standing upright and viewing while standing on one's head, so the replay video can be reconstructed only when posture information indicating which side is up is available.

The "camera number" is the identifier of the real camera cam whose direction is closest to that of the virtual viewpoint P. The "texture number" is the number of the texture of an object visible from the current virtual viewpoint P. The "billboard position" represents the relationship between the coordinate position of the billboard that models an object visible from the current virtual viewpoint and the identifier of the texture pasted onto that billboard. In this embodiment, such time-series information is constructed at a predetermined cycle, for example per frame, managed by the "playback time identifier", and concatenated sequentially.
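One way to picture the header and time-series fields described so far is the data structure below. This is only an illustrative sketch: all class and field names are hypothetical, and attaching the texture number and camera number to each billboard is an assumption about the layout, not the format defined in the figures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Billboard:
    position: Vec3        # "billboard position" in the 3D space
    texture_number: int   # texture pasted onto this billboard
    camera_number: int    # real camera closest in direction to the virtual viewpoint


@dataclass
class FrameRecord:
    playback_time_id: int  # frame number here; a timestamp would also work
    eye: Vec3              # viewpoint position E
    direction: Vec3        # line-of-sight direction D
    up: Vec3               # posture direction U
    billboards: List[Billboard] = field(default_factory=list)


@dataclass
class ReplayFormat:
    free_viewpoint_video_id: str  # written once, in the header
    frames: List[FrameRecord] = field(default_factory=list)

    @property
    def total_frames(self) -> int:
        # Header value "total number of frames", derived from the records.
        return len(self.frames)


fmt = ReplayFormat("video-001")
fmt.frames.append(
    FrameRecord(300, (0.0, 1.5, -10.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0),
                [Billboard((2.0, 0.0, 5.0), texture_number=7, camera_number=3)])
)
print(fmt.total_frames)  # 1
```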

FIG. 5 is a sequence flow showing the procedure from construction of the replay format to playback of the replay video based on that replay format.

At time t1, a video viewing request RQ1 is transmitted from the viewing terminal 3 to the free viewpoint video generation device 2. In response to the viewing request RQ1, the free viewpoint video generation device 2 starts distributing the video content at time t2. At time t3, the viewing terminal 3 that has acquired the video content plays back the video.

At time t4, the terminal user U performs a viewpoint operation on the viewing terminal 3 to view the free viewpoint video. When the user operation detection unit 302 detects this operation, viewpoint information specifying the virtual viewpoint P selected by the terminal user U is transferred from the viewing terminal 3 to the free viewpoint video generation device 2 at time t5. At time t6, the free viewpoint video generation unit 201 of the free viewpoint video generation device 2 performs rendering based on the viewpoint information and the camera videos to generate the free viewpoint video. At time t7, the free viewpoint video is distributed to the viewing terminal 3, and it is played back at time t8.

At time t9, the terminal user U operates the viewing terminal 3 to request the start of replay video recording. When the user operation detection unit 302 detects this, a recording start request RQ2 is transmitted from the viewing terminal 3 to the free viewpoint video generation device 2 at time t10. At time t11, the format recording unit 202 of the free viewpoint video generation device 2 responds to the recording start request RQ2 by starting to record the replay format for the free viewpoint video being played. Recording of the replay format is repeated frame by frame until a recording end request RQ3 from the viewing terminal 3 is detected.

In general, while the free viewpoint video generation unit 201 is playing a free viewpoint video, information on the virtual viewpoint P selected by the user can be obtained from the free viewpoint video generation unit 201. In this embodiment as well, when the replay video recording start request RQ2 is detected, the format recording unit 202 acquires the viewpoint information frame by frame from the free viewpoint video generation unit 201 and records its parameters in the replay format.

In this embodiment, the "viewpoint position E(ex, ey, ez)", the "line-of-sight direction D(dx, dy, dz)", and the "posture direction U(ux, uy, uz)" are recorded for each frame. In addition, the position at which the billboard of each object visible from the current viewpoint is erected is recorded, together with the texture number and camera number of the corresponding object to be pasted onto each billboard.

Thereafter, at time t12, the terminal user U operates the viewing terminal 3 to request the end of replay video recording. When the user operation detection unit 302 detects this, the recording end request RQ3 is transmitted from the viewing terminal 3 to the free viewpoint video generation device 2 at time t13. At time t14, the format recording unit 202 of the free viewpoint video generation device 2 ends recording of the replay format. At time t15, the generated replay format is transferred to the viewing terminal 3, and at time t16 it is stored in the storage 301 of the viewing terminal 3.

Thereafter, at time t17, a user who wishes to replay the free viewpoint video specifies a replay format in the storage and requests a replay. When the user operation detection unit 302 detects this, the rendering information acquisition unit 303 interprets the replay format and, based on the free viewpoint video ID described in it, determines which free viewpoint video the format is a replay of.

At time t18, the viewing terminal 3 requests (RQ4) from the free viewpoint video generation device 2 the information needed to replay the video, based on the replay format. In this embodiment, the background 3D model associated with the free viewpoint video ID of the replay format is requested, and at time t19 the free viewpoint video generation device 2 distributes the background 3D model in response.

At time t20, the replay video reproduction unit 304 performs rendering based on the replay format and the acquired background 3D model, and playback of the replay video of the free viewpoint video starts. During playback, the rendering information acquisition unit 303 requests and acquires, frame by frame and based on the replay format, spatial information such as billboard positions, mask images, and textures from the free viewpoint video generation device 2.

The replay video reproduction unit 304 then efficiently reconstructs the 3D space based on the free viewpoint spatial information described in the format, and the replay video is reconstructed by rendering, based on the acquired spatial information, the image of the reconstructed space as seen from the viewpoint position recorded in the format.

FIG. 6 is a diagram showing another example of the replay format. In the above embodiment, the viewpoint information was described as being expressed by three three-dimensional vectors, namely the viewpoint position E(ex, ey, ez), the line-of-sight direction D(dx, dy, dz), and the posture direction U(ux, uy, uz), nine parameters in total. However, a function may be provided that removes redundancy and reduces the data size when parameters do not change; this can be realized by not describing a parameter in a later frame when it does not change between frames.

For example, when the viewpoint translates, the line-of-sight direction D and the posture direction U may remain unchanged while only the viewpoint position E changes. In this embodiment, when such viewpoint movement is detected, recording of the line-of-sight direction D and the posture direction U is omitted from the time-series information of the next frame, as shown in FIG. 6, which reduces the data size and the processing load. To reduce the data size further, the recorded parameters may also be rounded to a fixed number of digits and recorded as approximate values.
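The omission of unchanged parameters, combined with rounding, might be sketched as follows. The dictionary keys, the function names, and the default of three decimal digits are all assumptions for illustration; the actual layout of FIG. 6 is not reproduced here.

```python
def encode(frames, digits=3):
    """Drop E, D, or U from a record when it equals the previous frame's value."""
    records, prev = [], {}
    for t, frame in enumerate(frames):
        # Round first so that the comparison is made on the values actually stored.
        rounded = {k: tuple(round(x, digits) for x in frame[k]) for k in ("E", "D", "U")}
        record = {"t": t}
        record.update({k: v for k, v in rounded.items() if prev.get(k) != v})
        records.append(record)
        prev = rounded
    return records


def decode(records):
    """Restore full per-frame viewpoints by carrying forward omitted parameters."""
    frames, state = [], {}
    for record in records:
        state.update({k: v for k, v in record.items() if k != "t"})
        frames.append(dict(state))
    return frames
```

For a pure translation, the second record then carries only `t` and `E`, and decoding restores D and U from the preceding frame.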

FIG. 7 is a diagram showing still another example of the replay format. This embodiment is characterized in that a gaze point position F(fx, fy, fz) is stored instead of the line-of-sight direction D(dx, dy, dz). The gaze point position F indicates the position of a specific point lying along the line-of-sight direction D. The line-of-sight direction D is obtained from the viewpoint position E and the gaze point position F by the following equation (1).
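Equation (1) is not reproduced in this text. A plausible form, assuming D is taken as the unit vector pointing from E toward F (the normalization is an assumption; the original may define D without it), is:

```latex
\mathbf{D} = \frac{\mathbf{F} - \mathbf{E}}{\lVert \mathbf{F} - \mathbf{E} \rVert},
\qquad
(d_x, d_y, d_z) =
\frac{(f_x - e_x,\; f_y - e_y,\; f_z - e_z)}
     {\sqrt{(f_x - e_x)^2 + (f_y - e_y)^2 + (f_z - e_z)^2}}
```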

Adopting the gaze point position F(fx, fy, fz) instead of the line-of-sight direction D(dx, dy, dz) in this way can eliminate redundancy in some cases. For example, if a movement that rotates around the gaze point F is assigned to a swipe operation on the viewing terminal 3, movements that rotate around the gaze point F are likely to appear frequently. In such cases, the gaze point position F(fx, fy, fz) does not change in this embodiment, so the redundancy can be eliminated.

As still another example of the viewpoint information, the amount of rotation (rotation angle) and the amount of translation may be adopted as the viewpoint information parameters.

To obtain a given viewpoint, the viewpoint is rotated about the origin of the world coordinate system by a rotation amount θx around the x-axis, θy around the y-axis, and θz around the z-axis, and is then translated by T(tx, ty, tz) to the viewpoint position; this specifies the position, line-of-sight direction, and posture of the viewpoint. The viewpoint can therefore be reconstructed from six parameters: the rotation amounts θx, θy, θz and the translation amount T(tx, ty, tz).

Although this example reconstructs the viewpoint from fewer parameters, the default position, direction, and posture of the viewpoint before any rotation or translation must be clearly defined. That is, initial values must be fixed, such as "when no rotation or translation is applied, the viewpoint is located at the origin of the world coordinate system, faces the positive direction of the z-axis, and has the positive direction of the y-axis as up", and the replay video reproduction unit 304 of the viewing terminal 3 must also know this initial viewpoint information. This information may be written into the format itself and exchanged, or the initial position used when the free viewpoint video generation unit 201 plays a free viewpoint video may be adopted as is.
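A minimal sketch of reconstructing E, D, and U from the six parameters follows, using the default pose quoted above (origin, facing +z, +y up). The rotation order (x first, then y, then z) and all function names are assumptions; the text does not fix an order, and a different order gives a different viewpoint for the same angles.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(R, v):
    return tuple(sum(R[i][k] * v[k] for k in range(3)) for i in range(3))


DEFAULT_D = (0.0, 0.0, 1.0)  # default line of sight: +z
DEFAULT_U = (0.0, 1.0, 0.0)  # default up: +y


def viewpoint_from_six(theta_x, theta_y, theta_z, T):
    # Assumed order: rotate about x, then y, then z (R = Rz * Ry * Rx).
    R = matmul(rot_z(theta_z), matmul(rot_y(theta_y), rot_x(theta_x)))
    E = tuple(T)  # the default position is the origin, so rotation leaves it there
    D = apply(R, DEFAULT_D)
    U = apply(R, DEFAULT_U)
    return E, D, U
```

With a quarter turn about the y-axis, for instance, the default +z line of sight turns toward +x while the up vector stays along +y.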

As still another example of the viewpoint information, a view transformation matrix may be recorded. The view transformation matrix is the matrix that transforms the world coordinate system into the coordinate system of the viewpoint (camera coordinate system), and it can be used to specify the position, direction, and posture of the viewpoint. Assuming that the view transformation matrix is expressed in homogeneous coordinates, the 4 × 4 transformation matrix M is given by the following equation (2).
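Equation (2) is not reproduced in this text. As a sketch under the common column-vector convention (an assumption, since the original layout is not visible here), a rigid view transformation in homogeneous coordinates has the form below, where the 3 × 3 block of r entries is a rotation and (t_x, t_y, t_z) is a translation; these symbol names are illustrative:

```latex
M =
\begin{pmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z \\
0      & 0      & 0      & 1
\end{pmatrix}
```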

Such matrices are used frequently in widely used 3D display libraries such as OpenGL and DirectX, where the view transformation matrix M is often computed from the viewpoint position E(ex, ey, ez), the line-of-sight direction D(dx, dy, dz), the posture direction U(ux, uy, uz), and the like. If the view transformation matrix is stored in advance, it can therefore be obtained most simply when used with such a library, which lowers the processing cost.
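As an illustration of computing M from E, D, and U, the sketch below follows the right-handed convention used by OpenGL's gluLookAt, in which the camera looks down its negative z-axis. Whether the patent's matrix uses this convention is not stated, so treat the choice, and the helper names, as assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def view_matrix(E, D, U):
    """4x4 world-to-camera matrix (row-major lists, column-vector convention)."""
    f = normalize(D)            # forward
    s = normalize(cross(f, U))  # camera right
    u = cross(s, f)             # camera up, re-orthogonalized
    return [
        [s[0], s[1], s[2], -dot(s, E)],
        [u[0], u[1], u[2], -dot(u, E)],
        [-f[0], -f[1], -f[2], dot(f, E)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(M, p):
    """Apply M to the 3D point p and return the resulting 3D point."""
    h = (p[0], p[1], p[2], 1.0)
    return tuple(sum(M[i][k] * h[k] for k in range(4)) for i in range(3))
```

Under this convention the viewpoint position itself maps to the camera-space origin, and points in front of the viewpoint receive negative z values.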

Furthermore, although the format examples above record the viewpoint information as is in principle, only the difference from the previous frame may be recorded instead, as shown in FIG. 7.

In a format of this type, the differences between frames tend to be small, so many small values are written. When values are small, runs of the same value, such as "0", are likely to occur. Applying entropy coding, typified by Huffman coding, to a code sequence in which runs of the same value are likely may reduce the data size further.
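To illustrate why a difference sequence dominated by zeros suits Huffman coding, the sketch below computes Huffman code lengths and the resulting bit count for a symbol sequence. It is a generic textbook construction, not the coder of the embodiment, and the sample data are made up for the illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(symbols):
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


deltas = [0] * 90 + [1] * 5 + [-1] * 5  # mostly unchanged frames
print(encoded_bits(deltas))  # 110
```

Here the value 0 receives a 1-bit code and the two rarer values 2-bit codes, so 100 differences fit in 110 bits before any table overhead.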

However, when playback is to start partway through, for example, the value of the intermediate frame must be computed by summing the differences from the first frame, so the computational cost tends to grow. The format may therefore record the full, non-difference parameters in one frame out of every several frames and record the difference from the previous frame in the other frames.

In this case, the format must make it possible to tell which frames hold all of the information and which hold difference information. In the example of FIG. 7, the frames are distinguished by attaching the identifier "D" to difference frames and the identifier "I" to the other frames.
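The mix of full frames ("I") and difference frames ("D") can be sketched as below. The tuple layout, the function names, and the use of integer (already quantized) parameter values are assumptions for illustration; seeking walks back only to the nearest "I" record instead of to the first frame.

```python
def encode(values, key_interval):
    """values: per-frame integer parameter tuples; every key_interval-th frame is stored in full."""
    records = []
    for i, v in enumerate(values):
        if i % key_interval == 0:
            records.append(("I", v))  # full parameters
        else:
            delta = tuple(a - b for a, b in zip(v, values[i - 1]))
            records.append(("D", delta))  # difference from the previous frame
    return records


def seek(records, index):
    """Reconstruct frame `index` starting from the nearest preceding "I" record."""
    k = index
    while records[k][0] != "I":
        k -= 1
    value = records[k][1]
    for _, delta in records[k + 1:index + 1]:
        value = tuple(a + d for a, d in zip(value, delta))
    return value
```

A smaller interval shortens the walk needed for mid-stream playback at the price of more full records, which is the trade-off the paragraph above describes.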

The format recording unit 202 records viewpoint information by describing the various parameters for each frame using the methods described above. The free viewpoint information recording unit 202a must have a function for converting and shaping the viewpoint information received from the free viewpoint video generation unit 201 into the form in which it is described in the format.

Conversion and shaping here refer to processing that converts the information into a form suited to the format. For example, if the free viewpoint video generation unit 201 generates the video from a specific viewpoint using a viewpoint transformation matrix (view transformation matrix) from the world coordinate system to the camera coordinate system in order to obtain the user's viewpoint, this includes the calculation that acquires this matrix and derives from it information such as the three-dimensional vector of the viewpoint position coordinates, as well as truncating or rounding the recorded values to a fixed number of digits.
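As one hedged example of such a conversion, the viewpoint position can be recovered from a rigid view transformation matrix as E = -Rᵀt, where R is the upper-left 3 × 3 rotation block and t the translation column. The sketch assumes a column-vector convention and a matrix with no scaling; the function name and the three-digit rounding are illustrative.

```python
def eye_from_view_matrix(M, digits=3):
    """Recover the viewpoint position from a rigid 4x4 view matrix (row-major lists)."""
    R = [row[:3] for row in M[:3]]
    t = [row[3] for row in M[:3]]
    # For a rigid transform x_cam = R x_world + t, the camera origin is at -R^T t.
    E = [-sum(R[r][c] * t[r] for r in range(3)) for c in range(3)]
    # Shaping step: round to a fixed number of digits before recording.
    return tuple(round(x, digits) for x in E)


M = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(eye_from_view_matrix(M))
```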

FIG. 8 is a diagram for explaining the free viewpoint technique adopted in the second embodiment of the present invention, and FIG. 9 is a diagram showing an example of the replay format in the second embodiment.

The first embodiment was described on the assumption that the free viewpoint video generation unit 201 generates the free viewpoint video using the billboard method. In contrast, this embodiment is characterized in that the free viewpoint video is generated using a method that accurately restores the shape of the 3D model of an object, as shown in Patent Document 1 (referred to here as the "full model method using back-projection planes").

When the free viewpoint video generation unit 201 adopts the full model method, a large number of back-projection planes P1, P2, ... are arranged facing the virtual viewpoint P in order to restore the 3D shape of the object. Next, the mask images of the target object obtained by background subtraction or the like are projected onto each back-projection plane P1, P2, ..., and the visual volume is computed, so that 3D modeling is performed for each back-projection plane; the back-projection planes are then colored by mapping the texture image of the target object onto them. The 3D model can thus be restored by carving out the back-projection planes appropriately.

With this technique, the back-projection planes P1, P2, ... are always arranged orthogonal to the line of sight of the virtual viewpoint P, so the position of each plane changes depending on the position of the virtual viewpoint P. The format recording unit 202 starts recording the format in response to the replay video recording start request RQ2 from the viewing terminal 3. As in the first embodiment, the free viewpoint video ID and the total number of frames are recorded in the header, and the viewpoint information is recorded for each frame by the same method as in the first embodiment.

Of the many back-projection planes P1, P2, ..., the spatial information recording unit 202b records as spatial information only the indices of the planes on which the model is generated. That is, in this embodiment the indices of planes on which no model is generated are not recorded. In the example shown in FIG. 8, a cylindrical object exists in the space, but its model is generated only on P2, P3, and P4. Therefore, as shown in FIG. 9, only "2 3 4" is recorded as the indices.

In a free viewpoint video such as soccer, where players are scattered over parts of a wide field, arranging back-projection planes over the whole field yields many useless planes on which no model is generated, and recomputing such planes when the replay video is played is wasteful.

In this embodiment, by contrast, back-projection planes on which a model is generated can be distinguished in advance from those on which none is generated, so memory can be allocated efficiently, and the calculation that back-projects mask images becomes unnecessary for planes without a model, which reduces the computational load.

In particular, Patent Document 1 itself mentions that the full model method adopted in this embodiment performs parallel computation on a GPU, and reducing the number of back-projection planes saves memory. As a result, the time required for memory access can also be reduced, so computational resources are saved and the calculation is accelerated.

By recording the indices of the projection planes that need to be computed when reproducing the 3D space, the spatial information recording unit 202b speeds up the calculation and saves computational resources. As for the indices assigned to the back-projection planes, if 1000 back-projection planes face the viewpoint, one conceivable scheme assigns indices 1 to 1000 in order starting from the plane nearest the viewpoint. In the format example shown in FIG. 9, "2 3 4" is recorded as the indices representing the three back-projection planes P2, P3, and P4 on which the model is generated. The replay format recorded in this way is transferred to and stored on the viewing terminal 3 as in the first embodiment and is referred to later at replay time.
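The recording step can be sketched as a filter over per-plane occupancy. The function name and the boolean occupancy list are hypothetical stand-ins for whatever the generation unit reports about each plane; indices are 1-based and ordered from the plane nearest the viewpoint, as in the scheme described above.

```python
def planes_with_model(plane_occupancy):
    """plane_occupancy[i] is True when plane i+1 (1-based, nearest first) carries part of a model."""
    return [index for index, occupied in enumerate(plane_occupancy, start=1) if occupied]


# Five planes P1..P5, with the object intersecting P2, P3, and P4.
indices = planes_with_model([False, True, True, True, False])
print(" ".join(str(i) for i in indices))  # 2 3 4
```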

In the viewing terminal 3, the replay video reproduction unit 304 reconstructs the replay video based on the stored replay format. The viewpoint is determined from the viewpoint information of the replay format, and the back-projection planes are arranged based on this viewpoint; by referring to the indices, however, the planes on which no model is generated are identified, and neither placement nor calculation is performed for them.
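On the playback side, the recorded indices let the reconstruction skip every other plane. In the sketch below, `build_plane` is a hypothetical callback standing in for plane placement and mask back-projection, which are outside the scope of this illustration.

```python
def reconstruct(recorded_indices, build_plane):
    """Place and compute only the back-projection planes listed in the replay format."""
    return {index: build_plane(index) for index in recorded_indices}


calls = []


def build_plane(index):
    # Stand-in for the costly placement and back-projection of one plane.
    calls.append(index)
    return f"plane-{index}"


scene = reconstruct([2, 3, 4], build_plane)
print(sorted(scene))  # [2, 3, 4]
```

Even if 1000 planes face the viewpoint, the callback runs three times here, once per recorded index.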

This makes it possible to reproduce the 3D space by computing only the back-projection planes on which a model is guaranteed to be generated. The video from the viewpoint is then rendered for this 3D space, and the rendered images are transmitted to the viewing terminal 3 to play back the replay video.

Although rendering is performed in the free viewpoint video generation device 2 in this example, all of the videos needed to compose the free viewpoint video may instead be distributed to the viewing terminal 3 in advance and rendered on the terminal side. Even in this case, recording the indices in advance, so that back-projection planes on which no model arises can be identified, makes it possible to speed up the calculation and save computational resources.

In such a configuration, the free viewpoint video generation device 2 is provided with a replay video playback function, and the 3D space can be fully reconstructed. The replay video playback function may therefore receive information on the specifications of the viewing terminal 3 and output rendered images such that, for the same viewpoint, the visible field of view and the aspect ratio of the image vary according to the resolution and screen size of the playback device.

In addition, when a plurality of viewing terminals 3 simultaneously request playback of replay videos from the free viewpoint video generation device 2 and rendering is requested for frames with the same viewpoint position and the same time, a mechanism may be provided that saves the rendering result and reuses it.
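Such a reuse mechanism could be as simple as a cache keyed by video, frame, and viewpoint. The class below is a hypothetical sketch: the key layout and the `render` callback are assumptions, and a real server would also need eviction and a policy for near-identical viewpoints.

```python
class RenderCache:
    """Reuse a rendered frame when the same video, time, and viewpoint are requested again."""

    def __init__(self, render):
        self._render = render  # callback that performs the actual rendering
        self._cache = {}

    def get(self, video_id, frame_id, viewpoint):
        key = (video_id, frame_id, viewpoint)  # viewpoint must be hashable, e.g. a tuple
        if key not in self._cache:
            self._cache[key] = self._render(video_id, frame_id, viewpoint)
        return self._cache[key]


render_calls = []


def fake_render(video_id, frame_id, viewpoint):
    # Stand-in renderer that only records that it was invoked.
    render_calls.append((video_id, frame_id, viewpoint))
    return f"{video_id}:{frame_id}"


cache = RenderCache(fake_render)
view = ((0.0, 1.5, -10.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0))
first = cache.get("video-001", 300, view)   # rendered
second = cache.get("video-001", 300, view)  # served from the cache
print(len(render_calls))  # 1
```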

1 ... content server, 2 ... free viewpoint video generation device, 3 ... viewing terminal, 201 ... free viewpoint video generation unit, 202 ... format recording unit, 202a ... free viewpoint information recording unit, 202b ... spatial information recording unit, 301 ... storage, 302 ... user operation detection unit, 303 ... rendering information acquisition unit, 304 ... replay video reproduction unit

Claims

A system that reproduces a replay image of a free-viewpoint video by connecting a viewing terminal and a free-viewpoint video generator via a network, wherein
the viewing terminal comprises:
a means for requesting reproduction of a free-viewpoint video, and
a means for requesting recording of a replay video of the free-viewpoint video being played;
the free-viewpoint video generator comprises:
a means for generating a free-viewpoint video based on a plurality of camera images and viewpoint information of a virtual viewpoint in response to the reproduction request,
a means for recording, in response to the recording request, a replay format in which a virtual viewpoint is described for each playback time of the replay video in the process of generating the free-viewpoint video, and
a means for transferring the recorded replay format to the viewing terminal;
the viewing terminal further comprises:
a means for acquiring information necessary for replay based on the replay format, and
a means for performing rendering based on the replay format and the acquired information and reproducing the replay video; and
the information necessary for the replay is a background 3D model and spatial information for rendering an object on the background 3D model.

The replay format contains header information and time series information.
The time-series information is configured by concatenating a plurality of time-series information recorded in a predetermined cycle.
The system for reproducing a replay image of a free-viewpoint image according to claim 1, wherein a reproduction time ID unique to a reproduction position of the replay image based on the time-series information is recorded in each time-series information.

The generation process includes the generation process of the background 3D model of the free viewpoint image.
The ID of the free viewpoint video is recorded in the replay format, and
The system for reproducing the replay image of the free viewpoint image according to claim 1 or 2, wherein the acquisition means acquires the background 3D model based on the ID of the free viewpoint image.

The generation process includes a viewpoint information generation process.
The system for reproducing a replay image of a free viewpoint image according to any one of claims 1 to 3, wherein the viewpoint information is recorded in the replay format.

The system for reproducing a replay image of a free-viewpoint image according to any one of claims 1 to 4, wherein the means for generating the free-viewpoint image adopts a billboard-type free-viewpoint technique.

The generation process includes a process of identifying an object that can be seen from a virtual viewpoint.
The general ID associated with the visible object is recorded in the replay format, and
The system for reproducing a replay image of a free-viewpoint image according to claim 5, wherein the acquisition means acquires rendering information of each object based on the general ID.

The system for reproducing a replay video of a free-viewpoint video according to claim 6, wherein:
the generation process includes a process of generating a mask image of each object;
the mask image is associated with the general ID; and
the acquisition means acquires the mask image of each object based on the general ID.

The system for reproducing a replay video of a free-viewpoint video according to claim 6, wherein:
the generation process includes a process of extracting the texture of each object from the camera video;
the texture is associated with the general ID; and
the acquisition means acquires the texture of each object based on the general ID.
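
The general-ID lookup recited in the preceding claims can be illustrated as follows. This is only a sketch under stated assumptions: mask images and textures are modeled as opaque byte strings held in an in-memory dictionary keyed by general ID, and the names `ObjectAssets` and `acquire_rendering_info` are hypothetical rather than taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ObjectAssets:
    mask: bytes     # mask image of the object (assumed already encoded)
    texture: bytes  # texture extracted from the camera video (assumed already encoded)


def acquire_rendering_info(
    visible_ids: Iterable[str],
    store: Dict[str, ObjectAssets],
) -> Dict[str, ObjectAssets]:
    """Fetch mask and texture for each general ID listed in the replay format."""
    ids = list(visible_ids)
    missing = [gid for gid in ids if gid not in store]
    if missing:
        # Fail loudly rather than render a frame with objects silently dropped.
        raise KeyError(f"no rendering information for general IDs: {missing}")
    return {gid: store[gid] for gid in ids}
```

Only the objects listed as visible for a given playback position are fetched, so the terminal never has to load assets for objects that cannot be seen from the recorded virtual viewpoint.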

The system for reproducing a replay video of a free-viewpoint video according to any one of claims 1 to 4, wherein the means for generating the free-viewpoint video adopts a full-model type free-viewpoint technique using back projection planes.

The system for reproducing a replay video of a free-viewpoint video according to claim 9, wherein:
the generation process includes a process of arranging, at the position of the object, a plurality of back projection planes facing the virtual viewpoint, projecting a mask image of the object onto them, performing 3D modeling for each back projection plane, and restoring the 3D model;
the index of each back projection plane on which the 3D model exists is recorded in the replay format; and
the acquisition means acquires the information required for the replay based on the index of the back projection plane.

The system for reproducing a replay video of a free-viewpoint video according to any one of claims 1 to 10, wherein:
the means for generating the free-viewpoint video and the means for recording the replay format are implemented on a cloud server;
the means for acquiring the information required for the replay and the means for reproducing the replay video are implemented on the viewing terminal of the replay video; and
the replay format is transferred from the server to the viewing terminal and stored on the viewing terminal.

The system for reproducing a replay video of a free-viewpoint video according to any one of claims 1 to 10, wherein:
the means for generating the free-viewpoint video, the means for recording the replay format, and the means for acquiring the information required for the replay are implemented on a cloud server;
the means for reproducing the replay video is implemented on the viewing terminal; and
the replay format is transferred from the server to the viewing terminal and stored on the viewing terminal.

The system for reproducing a replay video of a free-viewpoint video according to claim 11 or 12.

A method of reproducing a replay video of a free-viewpoint video by connecting a viewing terminal and a free-viewpoint video generator via a network, wherein:
the viewing terminal
requests playback of a free-viewpoint video, and
requests recording of a replay video for the free-viewpoint video being played;
the free-viewpoint video generator,
in response to the playback request, generates a free-viewpoint video based on a plurality of camera videos and viewpoint information of a virtual viewpoint,
in response to the recording request, records, in the process of generating the free-viewpoint video, a replay format in which the virtual viewpoint is described for each playback time of the replay video, and
transfers the recorded replay format to the viewing terminal;
the viewing terminal further
acquires the information required for the replay based on the replay format, and
executes rendering based on the replay format and the acquired information to play back the replay video; and
the information required for the replay is a background 3D model and spatial information for rendering an object on the background 3D model.
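
The exchange recited in the method claim can be traced with a small simulation. Everything in it is an assumption made for illustration: the `Generator` and `Terminal` classes, the dictionary layout of the replay format, and the use of strings in place of real frames and 3D models. The sketch covers only the virtual viewpoint and the background model lookup, not the spatial information for individual objects.

```python
from typing import Dict, List, Tuple

Viewpoint = Tuple[float, float, float]


class Generator:
    """Stand-in for the free-viewpoint video generator (server side)."""

    def __init__(self, video_id: str) -> None:
        self.recording = False
        self.replay_format = {"video_id": video_id, "records": []}

    def request_recording(self) -> None:
        self.recording = True

    def generate_frame(self, time: int, viewpoint: Viewpoint) -> str:
        # While recording, describe the virtual viewpoint for each playback time.
        if self.recording:
            self.replay_format["records"].append({"time": time, "viewpoint": viewpoint})
        return f"frame {time} from {viewpoint}"

    def transfer(self) -> dict:
        return self.replay_format


class Terminal:
    """Stand-in for the viewing terminal, holding background models by video ID."""

    def __init__(self, background_models: Dict[str, str]) -> None:
        self.background_models = background_models

    def play_replay(self, replay_format: dict) -> List[str]:
        # Acquire the information required for replay, then render each record.
        background = self.background_models[replay_format["video_id"]]
        return [
            f"{background} @ t={r['time']} viewpoint={r['viewpoint']}"
            for r in replay_format["records"]
        ]
```

In this sketch only the replay format crosses the network after recording; the terminal rebuilds each replay frame from the recorded viewpoints and the model it looks up itself.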