JP7320400B2

JP7320400B2 - VIDEO PRODUCTION PROCESSING DEVICE AND PROGRAM THEREOF

Info

Publication number: JP7320400B2
Application number: JP2019144745A
Authority: JP
Inventors: 秀樹三ツ峰; 俊枝三須; 敦志荒井
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-08-06
Filing date: 2019-08-06
Publication date: 2023-08-03
Anticipated expiration: 2039-08-06
Also published as: JP2021027487A

Description

The present invention relates to a video presentation processing device, and a program therefor, that adds a gaze-guidance effect to VR content.

Conventionally, two-dimensional displays have been the mainstream display devices for viewing video content. In recent years, display devices that can present virtual reality (VR) and augmented reality (AR) content, such as head-mounted displays and smart glasses, have also come into wider use (see, for example, Non-Patent Documents 1 and 2). Hereinafter, both VR and AR content are referred to as VR content.

Non-Patent Document 1: Masaru Yamaguchi, "Significance of VR Distribution of 360° Video by Public Broadcasting: Toward 2020 and Beyond", Hoso Kenkyu to Chosa (Broadcast Research and Survey), October 2017, pp. 90-97.
Non-Patent Document 2: Naosuke Kamiya, "NHK Program Technology Exhibition", FDI, July 2018, pp. 28-29.

However, 2D video content and VR content are often produced separately for each type of display device, even when the content itself is the same. As a result, the know-how of 2D video content creators (for example, broadcasters and production companies) cannot be fully exploited when VR content is produced. In particular, it is difficult to carry over the visual storytelling techniques of 2D production to VR. These techniques include constraining the viewer's line of sight and field of view through camera work during shooting, and guiding the viewer's gaze without telops (on-screen captions), narration, or dialogue.

Accordingly, an object of the present invention is to provide a video presentation processing device, and a program therefor, that can efficiently produce VR content with a gaze-guidance effect. A gaze-guidance effect exploits the characteristics of human eye movement to draw the viewer's gaze to a subject in the VR content.

In view of the above problem, the video presentation processing device according to the present invention uses volumetric capture, which digitizes the movement of a subject into three-dimensional data. Through volumetric capture, the device generates volumetric capture information consisting of the subject's three-dimensional shape and surface pattern. Using this information together with the camera parameters of a shooting camera that films the subject, the device adds a gaze-guidance effect to the VR content. This effect exploits the characteristics of human eye movement to draw the viewer's gaze to the subject. The device comprises volumetric capture means, visual volume calculation means, three-dimensional saliency map generation means, subject recognition means, importance setting means, importance multiplication means, and gaze guidance means.

With this configuration, the volumetric capture means generates the VR content and the volumetric capture information by volumetric capture.
The visual volume calculation means calculates the visual volume, which is the shooting range of the shooting camera. The visual volume is a quadrangular-pyramid-shaped region defined by two values included in the camera parameters: the lens principal point (the center coordinates of the shooting camera's lens) and the shooting angle of view. The camera parameters reflect the camera operator's camera work, which is part of the know-how of 2D video content creators. The visual volume calculated from them therefore contains the subject to which the creator of the 2D video content wants to guide the viewer's gaze.

Based on the volumetric capture information, the three-dimensional saliency map generation means generates a three-dimensional saliency map of the three-dimensional shape and surface pattern of the subject contained in the visual volume. The map contains two kinds of information: the subject's three-dimensional shape converted into a depth image on a plane, and numerical values quantifying the gradients and colors of the surface pattern.

The subject recognition means uses machine learning to recognize the type of each subject contained in the visual volume. The importance setting means associates each subject type recognized by the subject recognition means with an importance.

The importance multiplication means first divides the visual volume into a plurality of divided regions. It then assigns the importance set for each subject type to the divided regions occupied by that subject. Finally, it multiplies the importance of each assigned divided region by two preset coefficients:
- a first coefficient, which decreases as the distance from the shooting camera's focus position to the divided region increases;
- a second coefficient, which decreases as the divided region falls further out of focus, based on the depth of focus given by the camera parameters.

In other words, the importance multiplication means applies the first and second coefficients so that the subject to which the creator of the 2D video content wants to guide the gaze receives a higher importance.

The gaze guidance means generates rendering parameters that reflect, in the volumetric capture information, the importance values computed by the importance multiplication means. It then attaches these rendering parameters to the VR content. Because the rendering parameters make the viewer fixate on subjects of high importance, the VR content has a stronger gaze-guidance effect.

Here, the gaze-guidance effect is a visual presentation effect that exploits the characteristics of human eye movement to guide (attract) the viewer's gaze to the subject in the VR content that the content creator intended. Examples include emphasizing, saturating, or brightening a desired subject, and blurring (defocusing) subjects to which the viewer's gaze should not be drawn.

The present invention can also be implemented as a program that causes the hardware resources of a computer, such as a CPU, memory, and hard disk, to operate as the video presentation processing device described above.

According to the present invention, VR content with a strong gaze-guidance effect can be produced efficiently. Viewers are led to fixate on the subject to which the creator of the 2D video content wants to guide their gaze.

FIG. 1 is a schematic configuration diagram of a VR content production system according to an embodiment.
FIG. 2 is a block diagram showing the configuration of a video presentation processing device according to the embodiment.
FIG. 3 is an explanatory diagram illustrating the calculation of the visual volume in the embodiment.
FIG. 4 is an explanatory diagram illustrating the importance DB in the embodiment.
FIG. 5 is an explanatory diagram illustrating the estimation of gaze parameters in the embodiment.
FIG. 6 is a flowchart showing the operation of the video presentation processing device of FIG. 2.

(Embodiment)
[Configuration of VR content production system]
Embodiments of the present invention are described in detail below, with reference to the drawings as appropriate.
The configuration of a VR content production system 1 according to the embodiment is described with reference to FIG. 1.
As shown in FIG. 1, the VR content production system 1 produces VR content with a strong gaze-guidance effect. It comprises fixed cameras 2, a shooting camera 3, and a video presentation processing device 4.

The fixed cameras 2 are used for volumetric capture. In volumetric capture, the movement of a subject 9 filmed by the shooting camera 3 (described later) is accurately tracked and digitized as three-dimensional data. The fixed cameras 2 output video of the subject 9 to the video presentation processing device 4, which applies volumetric capture to that video. For example, the fixed cameras 2 are fixed at predetermined positions in a shooting studio (not shown). FIG. 1 shows five fixed cameras 2 for legibility, but the number of fixed cameras 2 is not particularly limited.

The shooting camera 3 is an ordinary live-action camera that films the subject 9 and outputs the captured video to the video presentation processing device 4. The video captured by the shooting camera 3 is used to produce 2D video content. For example, a camera operator (not shown) operates the shooting camera 3 to film the subject 9. The fixed cameras 2 and the shooting camera 3 may film at the same time.
The shooting camera 3 may also be a virtual camera. In this case, the camera operator films the subject 9 by operating the virtual camera with a manipulator through which camera parameters can be input.

The video presentation processing device 4 uses the volumetric capture information and the camera parameters of the shooting camera 3 to add a gaze-guidance effect to the VR content of the subject 9, which was generated in advance by volumetric capture.

[Configuration of video presentation processing device]
The configuration of the video presentation processing device 4 is described with reference to FIG. 2.
As shown in FIG. 2, the video presentation processing device 4 comprises:
- volumetric capture means 40;
- camera parameter estimation means 41;
- visual volume calculation means 42;
- three-dimensional saliency map generation means 43;
- subject recognition means 44;
- importance labeling means (importance setting means) 45;
- gaze parameter estimation means (importance multiplication means) 46;
- gaze guidance means 47.

The volumetric capture means 40 generates VR content and volumetric capture information by volumetric capture. In this embodiment, it applies volumetric capture (described below) to the video from each fixed camera 2, producing the VR content and volumetric capture information of the subject 9. It then outputs the volumetric capture information and VR content to the camera parameter estimation means 41 and the gaze guidance means 47.

Volumetric capture is a technique for acquiring, as a time series, the three-dimensional shape and surface pattern of the subject 9. The surface pattern covers surface properties such as texture. The volumetric capture information is the information obtained by volumetric capture that represents this three-dimensional shape and surface pattern. One example of volumetric capture is the technique described in Reference 1.
Reference 1: "4D Views", [online], Crescent Inc., [retrieved May 17, 2019], Internet <URL: https://www.crescentinc.co.jp/company/>

The camera parameter estimation means 41 estimates the camera parameters of the shooting camera 3 by camera calibration. In this embodiment, it applies general camera calibration to the video captured by the shooting camera 3. Camera calibration methods are described in, for example, JP-A-2011-118724 and JP-A-2014-127068, so a detailed description is omitted here. The camera parameter estimation means 41 then outputs the estimated camera parameters and the captured video to the visual volume calculation means 42.

The camera parameters describe the position, orientation, and angle of view of the shooting camera 3 as operated by the camera operator. They include, for example, pan, tilt, zoom, focus position, iris, and the position of the lens principal point. The camera parameters can therefore be regarded as reflecting the camera operator's camera work, which is part of the know-how of 2D video content creators.
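The following Python sketch shows how these camera parameters connect to the visual volume. It stores them in a hypothetical `CameraParams` container and derives the angles of view with the standard pinhole relation θ = 2·atan(sensor size / (2·focal length)). It also computes the optical-axis direction. The patent does not specify any of the following, so they are assumptions: zoom expressed as a focal length, iris as an f-number, a full-frame 36 × 24 mm sensor, and the pan/tilt axis convention.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraParams:
    """Hypothetical container for the camera parameters listed above."""
    pan_deg: float            # rotation about the vertical axis
    tilt_deg: float           # rotation about the horizontal axis
    focal_length_mm: float    # "zoom", expressed as a focal length (assumption)
    focus_distance_m: float   # distance to the focus position
    f_number: float           # "iris", expressed as an f-number (assumption)
    principal_point: tuple    # lens principal point in world coordinates
    sensor_w_mm: float = 36.0 # assumed full-frame sensor width
    sensor_h_mm: float = 24.0 # assumed full-frame sensor height

    def angles_of_view(self):
        """Full horizontal and vertical angles of view in radians (pinhole model)."""
        th = 2 * math.atan(self.sensor_w_mm / (2 * self.focal_length_mm))
        tv = 2 * math.atan(self.sensor_h_mm / (2 * self.focal_length_mm))
        return th, tv

    def forward(self):
        """Unit optical-axis vector; y is up and z is forward at pan = tilt = 0."""
        pan, tilt = math.radians(self.pan_deg), math.radians(self.tilt_deg)
        return (math.cos(tilt) * math.sin(pan),
                math.sin(tilt),
                math.cos(tilt) * math.cos(pan))
```

For example, an 18 mm focal length on the assumed 36 mm-wide sensor gives a horizontal angle of view of 90 degrees.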

When the shooting camera 3 is a virtual camera, the camera parameter estimation means 41 may estimate the camera parameters from the operation results of the manipulator used to operate the virtual camera. In this case, it may also use the camera parameters and the volumetric capture information to render the virtual camera's view, and present that rendering to the camera operator as the virtual camera's viewfinder image.

The visual volume calculation means 42 calculates the visual volume, which is the shooting range of the shooting camera 3. The visual volume is a quadrangular-pyramid-shaped region defined by the lens principal point and the shooting angle of view in the camera parameters estimated by the camera parameter estimation means 41. It is volume information indicating which region of three-dimensional space is being filmed. As shown in FIG. 3, the visual volume V has its apex at the lens principal point V_T, and its size is determined by the horizontal angle of view θ_H and the vertical angle of view θ_V. The base V_B is set near the focus position of the shooting camera 3, far enough in depth that the visual volume V contains the back of the subject 9. The visual volume V therefore contains the subject 9 (for example, a person's face) to which the creator of the 2D video content wants to guide the gaze.
The visual volume calculation means 42 then outputs the calculated visual volume V to the three-dimensional saliency map generation means 43 and the subject recognition means 44.
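The visual volume can be sketched as a point-containment test. This sketch assumes the camera pose has already been converted into orthonormal `forward`, `right`, and `up` axes, and that the distance from V_T to the base V_B is known. The function name and its arguments are illustrative; the patent does not specify how the pyramid is represented.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_visual_volume(p, apex, forward, right, up, theta_h, theta_v, base_dist):
    """Return True if world point p lies inside the pyramid-shaped visual volume.

    apex               -- lens principal point V_T
    forward/right/up   -- orthonormal camera axes (assumed given by the pose)
    theta_h, theta_v   -- full horizontal/vertical angles of view in radians
    base_dist          -- distance from V_T to the base V_B along the optical axis
    """
    d = [pi - ai for pi, ai in zip(p, apex)]
    z = _dot(d, forward)  # depth along the optical axis
    if z <= 0 or z > base_dist:
        return False
    x, y = _dot(d, right), _dot(d, up)
    # The cross-section at depth z is a rectangle of half-size z * tan(theta / 2).
    return (abs(x) <= z * math.tan(theta_h / 2)
            and abs(y) <= z * math.tan(theta_v / 2))
```

A point behind the principal point or beyond the base is rejected before the angular check.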

The three-dimensional saliency map generation means 43 generates a three-dimensional saliency map by referring to the volumetric capture information from the volumetric capture means 40. As shown in FIG. 3, the map covers the subject 9 contained in the visual volume V calculated by the visual volume calculation means 42. The generated map is output to the gaze parameter estimation means 46.
In FIG. 3, the part of the subject 9 inside the visual volume V is drawn with solid lines, and the part outside it is drawn with broken lines.

The three-dimensional saliency map combines two kinds of information about the subject 9: its three-dimensional shape converted into a planar depth image, and numerical values quantifying how conspicuous the gradients and colors of its surface pattern are. Specifically, three feature maps are generated (luminance, color space, and gradient orientation), and the saliency map represents the degree of attention computed from them. A separate three-dimensional saliency map is generated for each subject 9 contained in the visual volume V.

Examples of two-dimensional saliency maps are described in detail in References 2 and 3. A three-dimensional saliency map can be generated by the same procedure, so further description is omitted.
Reference 2: L. Itti and C. Koch, "A saliency-based search mechanism for overt and covert shifts of visual attention", Vision Research, 40 (2000), pp. 1489-1506.
Reference 3: Digital Image Processing [Revised New Edition], Association for the Promotion of Image Information Education, March 9, 2015, pp. 244-246.
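The sketch below is a heavily simplified, single-scale saliency computation in the spirit of Reference 2. Luminance and color-opponency feature maps are each turned into center-surround contrast maps, a gradient-magnitude map is added, and the three maps are normalized and averaged. It works on a 2D RGB image, such as a rendered view of the subject, or on a depth image stored as gray values.

The sketch simplifies Itti and Koch in three ways, each an assumption made for brevity:
- gradient magnitude replaces their oriented Gabor channels;
- a single 3×3 surround replaces Gaussian pyramids;
- the maps are weighted equally.

```python
import math


def _box_mean(m, y, x):
    """Mean of the 3x3 neighbourhood around (y, x), clipped at the borders."""
    h, w = len(m), len(m[0])
    vals = [m[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]
    return sum(vals) / len(vals)


def _center_surround(m):
    """|centre - surround| contrast: a one-scale stand-in for Itti-Koch pyramids."""
    return [[abs(m[y][x] - _box_mean(m, y, x)) for x in range(len(m[0]))]
            for y in range(len(m))]


def _normalize(m):
    """Scale a map so its maximum is 1 (all zeros stay zero)."""
    peak = max(max(row) for row in m)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in m]


def saliency_map(rgb):
    """Average of normalized luminance, colour-opponency and gradient maps.

    rgb is a list of rows of (r, g, b) tuples with components in [0, 1].
    """
    h, w = len(rgb), len(rgb[0])
    lum = [[(r + g + b) / 3 for r, g, b in row] for row in rgb]
    col = [[abs(r - g) + abs(b - (r + g) / 2) for r, g, b in row] for row in rgb]
    # Central-difference gradient magnitude of luminance (edges clamped).
    grad = [[math.hypot(lum[y][min(x + 1, w - 1)] - lum[y][max(x - 1, 0)],
                        lum[min(y + 1, h - 1)][x] - lum[max(y - 1, 0)][x])
             for x in range(w)] for y in range(h)]
    maps = [_normalize(_center_surround(lum)),
            _normalize(_center_surround(col)),
            _normalize(grad)]
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]
```

On a uniform gray image containing a single red pixel, the red pixel receives the highest saliency.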

The subject recognition means 44 uses machine learning to recognize the type of each subject 9 in the visual volume V received from the visual volume calculation means 42. The subject type indicates the role of the subject 9 in the script of the 2D video content, such as the lead, the lead's face, a supporting character, or an extra. For example, the subject recognition means 44 refers to the volumetric capture information and renders the three-dimensional shape and surface pattern of each subject 9 into 2D images from six directions: top, bottom, left, right, front, and back. It then uses machine learning to recognize the type of the subject 9 in those 2D images. One applicable machine learning method is described in Reference 4.
Reference 4: Joseph Redmon and Ali Farhadi, "YOLOv3: An Incremental Improvement", April 8, 2018.
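The six-direction recognition step might be organized as follows. `render_view` and `detect` are hypothetical callables. The first stands in for the renderer; the second stands in for a trained object detector such as YOLOv3. The patent does not say how the six views are combined, so fusing the per-view results by summing confidences per label is an assumption.

```python
from collections import Counter

# Six viewing directions (unit vectors) from which each subject is rendered.
SIX_VIEWS = {
    "front": (0, 0, 1), "back": (0, 0, -1),
    "left": (-1, 0, 0), "right": (1, 0, 0),
    "top": (0, 1, 0), "bottom": (0, -1, 0),
}


def recognize_subject(subject, render_view, detect):
    """Render a subject from six directions and fuse the per-view detections.

    render_view(subject, direction) -> image         (hypothetical renderer)
    detect(image) -> list of (label, confidence)     (hypothetical detector)
    Returns the label with the highest summed confidence, or None.
    """
    votes = Counter()
    for direction in SIX_VIEWS.values():
        for label, score in detect(render_view(subject, direction)):
            votes[label] += score
    return votes.most_common(1)[0][0] if votes else None
```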

When the subject 9 is a face, the subject recognition means 44 may also use machine learning to recognize the face direction, such as "forward" or "backward". For example, when the lead faces the front, it recognizes the face direction as "forward".
The subject recognition means 44 then outputs the recognized types of the subjects 9 to the importance labeling means 45.

The importance labeling means 45 labels (associates) each subject type received from the subject recognition means 44 with the importance preset for that type. Viewers' gaze tends to concentrate on faces, so when the subject 9 is a face, the importance labeling means 45 also labels the face direction recognized by the subject recognition means 44 with an importance. As shown in FIG. 4, the result is an importance DB that associates subject type, subject direction, and importance. The importance labeling means 45 outputs this importance DB to the gaze parameter estimation means 46.

"Subject type" is the type of the subject 9 recognized by the subject recognition means 44.
"Subject direction" applies when the subject 9 is a face and gives the direction of that face, for example "forward" or "backward". In the example of FIG. 4, the lead's face is oriented "forward".
"Importance" is the importance of the subject 9 in the 2D video content. It takes a large value if the subject 9 is important and a small value if it is not. The importance is set manually based on words in the script used to produce the 2D video content, such as lines of dialogue. For example, it can be set from the frequency of words in the script (e.g., TF-IDF).

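The sketch below illustrates how script word frequencies could feed the importance column. It computes TF-IDF for a term in one scene of a tokenized script, then looks up importances in a dictionary shaped like the importance DB of FIG. 4, where subject type and face direction map to an importance. The following are placeholder assumptions, not data from FIG. 4: the smoothed IDF variant, scene-level tokenization, and every entry and value in `IMPORTANCE_DB`.

```python
import math


def tfidf(term, scene, scenes):
    """TF-IDF of a term in one script scene (a list of tokens).

    Uses the smoothed IDF ln((1 + N) / (1 + df)) + 1, so a character who appears
    in every scene (e.g. the lead) still receives a non-zero score.
    """
    tf = scene.count(term) / max(len(scene), 1)
    df = sum(1 for s in scenes if term in s)
    return tf * (math.log((1 + len(scenes)) / (1 + df)) + 1)


# Hypothetical importance DB in the shape of FIG. 4:
# (subject type, face direction or None) -> importance. Values are placeholders.
IMPORTANCE_DB = {
    ("lead_face", "forward"): 1.0,
    ("lead_face", "backward"): 0.6,
    ("lead", None): 0.8,
    ("supporting", None): 0.5,
    ("extra", None): 0.1,
}


def lookup_importance(subject_type, direction=None, db=IMPORTANCE_DB):
    """Try the direction-specific entry first, then the type-only entry, else 0."""
    return db.get((subject_type, direction), db.get((subject_type, None), 0.0))
```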
The gaze parameter estimation means 46 estimates a gaze parameter (importance) for each voxel, as described below. It bases this estimate on the three-dimensional saliency map from the three-dimensional saliency map generation means 43 and the importance DB from the importance labeling means 45. The estimated gaze parameters are output to the gaze guidance means 47.

<Estimation of gaze parameters>
The estimation of gaze parameters is described with reference to FIG. 5.
As shown in FIG. 5, the gaze parameter estimation means 46 divides the visual volume V into a plurality of voxels (divided regions) B, so that the subject 9 contained in the visual volume V is also divided into voxels B. Each voxel B is a rectangular parallelepiped. The number and size of the voxels can be chosen freely, using the angle of view (aspect ratio) of the shooting camera 3 as the reference.

In the example of FIG. 5, the aspect ratio of the shooting camera 3 is 4:3. The visual volume V is divided into 4 voxels horizontally, 3 vertically, and 4 in depth, for a total of 48 voxels B. The number of divisions in depth equals the larger of the horizontal and vertical division counts.

The base V_B of the visual volume V is positioned so that, after division into voxels B, the focus position of the shooting camera 3 lies at the center of the voxel space in the depth direction. The voxel space is the space formed by the set of voxels B.
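The voxel division can be sketched as follows. The sketch assumes the voxel space is the axis-aligned box that bounds the pyramid in camera coordinates. This box runs from the lens principal point to a base placed at twice the focus distance, which puts the focus position at the depth center. `grid_shape` derives 4 × 3 × 4 = 48 voxels from a 4:3 aspect ratio by using the larger count for depth, matching the FIG. 5 example. Reducing the aspect ratio by its greatest common divisor is an assumption.

```python
import math


def grid_shape(aspect_w, aspect_h):
    """Voxel counts (nx, ny, nz) from an integer aspect ratio; depth uses the larger count."""
    g = math.gcd(aspect_w, aspect_h)
    nx, ny = aspect_w // g, aspect_h // g
    return nx, ny, max(nx, ny)


def voxel_index(p_cam, focus_dist, theta_h, theta_v, shape):
    """Map a camera-space point (x right, y up, z forward) to a voxel (i, j, k).

    The voxel space is assumed to be the box bounding the pyramid from the
    principal point (z = 0) to the base V_B at z = 2 * focus_dist, so the focus
    position lies at the centre of the voxel space in depth.
    Returns None for points outside the box.
    """
    nx, ny, nz = shape
    depth = 2.0 * focus_dist
    half_w = depth * math.tan(theta_h / 2)  # half-width of the base
    half_h = depth * math.tan(theta_v / 2)  # half-height of the base
    x, y, z = p_cam
    if not (0 <= z < depth and -half_w <= x < half_w and -half_h <= y < half_h):
        return None
    i = int((x + half_w) / (2 * half_w) * nx)
    j = int((y + half_h) / (2 * half_h) * ny)
    k = int(z / depth * nz)
    return i, j, k
```

Cells near the apex extend beyond the pyramid. If needed, a point can first be checked against the visual volume before it is binned.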

Next, the gaze parameter estimation means 46 assigns the importance stored in the importance DB to the voxels B into which the subject 9 has been divided. This gives each voxel B an estimated importance, which serves as that voxel's gaze parameter.

Next, the gaze parameter estimation means 46 multiplies the importance of each voxel B by a first coefficient and a second coefficient, described below. This raises the importance of the subject 9 to which the creator of the 2D video content wants to guide the gaze. The first coefficient is preset to decrease as the distance from the focus position of the shooting camera 3 (the center of the voxel space) to each voxel B increases. Specifically, the first coefficient is given by Equation (1). Here, L is the distance measured in units of the voxel edge length (one voxel edge = 1), and W is a preset weight of 1 or more. The coefficient falls off approximately with the inverse square of L, and the "+1" keeps it finite, equal to W, at L = 0.
$$\frac{W}{L^{2}+1} \qquad (1)$$

When the subject 9 corresponding to a voxel B is a face, the first coefficient may instead be set according to the face direction, as shown in Equations (1-2) and (1-3). This lets the gaze guidance means 47 (described later) reproduce "lead room", a presentation technique in 2D video content that leaves space in front of the face. W1 and W2 are preset weights of 1 or more; to achieve lead room, for example, they are set so that W1 > W2.
Forward: $$\frac{W_1}{L^{2}+1} \qquad (1\text{-}2)$$
Backward: $$\frac{W_2}{L^{2}+1} \qquad (1\text{-}3)$$
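Equations (1), (1-2), and (1-3) can be evaluated per voxel as in this sketch. Measuring L from the voxel's center to the center of the voxel space, in units of one voxel edge, follows the text. Two choices are the sketch's own: addressing voxels by grid indices (i, j, k), and using default weights of 2.0, the empirical value stated below for W, W1, and W2.

```python
def first_coefficient(voxel, shape, w=2.0, face_direction=None, w1=2.0, w2=2.0):
    """Equation (1), or (1-2)/(1-3) for face voxels: weight / (L**2 + 1).

    voxel -- grid indices (i, j, k); shape -- voxel counts (nx, ny, nz).
    L is the distance, in voxel-edge units, from the centre of the voxel to the
    centre of the voxel space (taken here as the focus position).
    """
    nx, ny, nz = shape
    centre = (nx / 2, ny / 2, nz / 2)
    l_sq = sum((idx + 0.5 - c) ** 2 for idx, c in zip(voxel, centre))
    if face_direction == "forward":
        weight = w1
    elif face_direction == "backward":
        weight = w2
    else:
        weight = w
    return weight / (l_sq + 1)
```

To obtain lead room, call the function with `w1 > w2`, for example `w1=2.0, w2=1.0`.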

The second coefficient depends on the depth of focus of the shooting camera 3. It is preset, based on the depth of focus in the camera parameters, to decrease as a voxel B moves away from the focus position. Taking the in-focus position on the camera's optical axis as the reference, the second coefficient decreases with distance from that position in both directions, toward and away from the shooting camera 3.

For example, the weights W, W1, and W2 of the first coefficient were set empirically to 2.0. The second coefficient, for example, was set as follows:
- 2.0 at the in-focus position;
- 0 at the limit position where the resolution falls to 1/4 of the in-focus resolution;
- linearly interpolated values between the in-focus position and the limit position.
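The piecewise-linear second coefficient described above can be written as follows. The near and far limit depths, where resolution drops to 1/4 of the in-focus value, are assumed to be supplied by the caller. The patent does not give the formula used to derive them from the camera parameters.

```python
def second_coefficient(z, focus, near_limit, far_limit, peak=2.0):
    """Depth-of-focus coefficient: `peak` at the in-focus depth, 0 at the limits.

    near_limit < focus < far_limit are depths along the optical axis. Values are
    linearly interpolated between the focus and each limit and clamped to 0
    outside the limits.
    """
    if z <= focus:
        t = (z - near_limit) / (focus - near_limit)
    else:
        t = (far_limit - z) / (far_limit - focus)
    return peak * min(max(t, 0.0), 1.0)
```

A voxel's gaze parameter is then its assigned importance multiplied by the first coefficient and by this second coefficient.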

Returning to FIG. 2, the description of the configuration of the video presentation processing device 4 continues.
The gaze guidance means 47 generates rendering parameters that reflect, in the volumetric capture information from the volumetric capture means 40, the importance (gaze parameters) estimated by the gaze parameter estimation means 46.

The gaze guidance means 47 performs three steps:
1. It normalizes the importance of each voxel B, for example to values from 0 to 1.
2. It multiplies the surface pattern (color saturation) of the volumetric capture information by the normalized importance. The higher a voxel's importance, the more vivid its color.
3. It calculates a coefficient relating the kernel size of the blur filter applied at rendering to the focal length, and reflects this coefficient in the rendering parameters. The calculation is based on a general lens model and uses the focus position in three-dimensional space together with the image sensor size and lens aperture of the shooting camera 3. Defocus is thereby expressed for each voxel B.
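The three rendering-parameter steps can be sketched roughly as follows, under stated assumptions. Importances are normalized by dividing by their maximum. Saturation is scaled in HSV space using Python's standard `colorsys` module. The blur kernel size comes from the thin-lens circle-of-confusion formula c = A·f·|d - s| / (d·(s - f)), one common choice of "general lens model". The function names are hypothetical.

```python
import colorsys


def normalize(importance):
    """Scale voxel importances to [0, 1] by dividing by the maximum (assumption)."""
    peak = max(importance.values())
    return {k: (v / peak if peak > 0 else 0.0) for k, v in importance.items()}


def modulate_saturation(rgb, weight):
    """Multiply the HSV saturation of an RGB colour (components in [0, 1]) by weight."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, s * weight, v)


def blur_kernel_px(depth, focus, focal_len, aperture, sensor_w, image_w):
    """Thin-lens circle-of-confusion diameter at `depth`, converted to pixels.

    Distances are in metres; `aperture` is the entrance-pupil diameter and
    `image_w` is the image width in pixels.
    """
    coc = aperture * focal_len * abs(depth - focus) / (depth * (focus - focal_len))
    return coc / sensor_w * image_w
```

With a normalized weight of 0, a saturated red becomes white (zero saturation at full value). Points at the focus distance get a zero-size kernel, and points nearer than the focus blur faster than points equally far beyond it.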

Then, the line-of-sight guidance means 47 adds the rendering parameters reflecting the importance and depth of focus to the VR content from the volumetric capture means 40, and outputs that VR content. In the resulting VR content, the subject 9 toward which the line of sight is to be guided is rendered in vivid color, and defocus is expressed.

[Processing of the video effect processing device]
The processing of the video effect processing device 4 will be described with reference to FIG. 6.
As shown in FIG. 6, in step S1, the volumetric capture means 40 generates the VR content and the volumetric capture information by volumetric capture.

In step S2, the camera parameter estimation means 41 estimates the camera parameters of the shooting camera 3 by camera calibration.
In step S3, the visual volume calculation means 42 calculates the quadrangular pyramid-shaped region defined by the lens principal point and the angle of view contained in the camera parameters estimated in step S2 as the visual volume V, which is the shooting range of the shooting camera 3.
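Membership in the visual volume V computed in step S3 can be tested per voxel with a point-in-pyramid check. This is a minimal sketch under stated assumptions: the lens principal point is treated as the camera's projection center, the world-to-camera rotation `R` comes from calibration, the camera looks along +z (OpenCV-style convention), and the angle of view is split into hypothetical horizontal and vertical parameters. The optional `far` limit is also an assumption; the source pyramid is unbounded.

```python
import numpy as np

def in_visual_volume(points, cam_center, R, hfov_deg, vfov_deg, far=None):
    """Boolean mask of world points lying inside the quadrangular pyramid
    whose apex is the lens principal point (camera center)."""
    # World -> camera coordinates: p_cam = R (p_world - C), row-vector form.
    pts_cam = (np.asarray(points, dtype=float) - np.asarray(cam_center, dtype=float)) @ R.T
    x, y, z = pts_cam[:, 0], pts_cam[:, 1], pts_cam[:, 2]
    tan_h = np.tan(np.radians(hfov_deg) / 2.0)
    tan_v = np.tan(np.radians(vfov_deg) / 2.0)
    # In front of the camera and within both half-angles of view.
    mask = (z > 0) & (np.abs(x) <= z * tan_h) & (np.abs(y) <= z * tan_v)
    if far is not None:
        mask &= z <= far
    return mask
```

Voxel centers passing this test are the ones handed on to the saliency and recognition steps.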

In step S4, the 3D saliency map generation means 43 generates a three-dimensional saliency map by referring to the volumetric capture information generated in step S1.
In step S5, the subject recognition means 44 recognizes, by machine learning, the type of the subject 9 contained in the visual volume V calculated in step S3.
In step S6, the importance labeling means 45 labels the type of the subject 9 recognized in step S5 with the importance preset for each type of subject 9.

In step S7, the gaze parameter estimation means 46 estimates the gaze parameter of each voxel B based on the 3D saliency map and the importance DB.
In step S8, the line-of-sight guidance means 47 generates rendering parameters in which the gaze parameters are reflected in the volumetric capture information, and adds the generated rendering parameters to the VR content.
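Steps S6 and S7 can be sketched as a per-voxel product: look up the preset importance for the voxel's subject type, then multiply by a per-type first coefficient and a distance-dependent second coefficient, as in claim 1. The source says step S7 uses both the 3D saliency map and the importance DB but does not say how they are combined, so multiplying by the saliency value here is an assumption, as are all names and the dictionary-based "importance DB".

```python
import numpy as np

def gaze_parameters(labels, saliency, distance_from_focus,
                    importance_db, first_coeff, limit_distance):
    """Hypothetical per-voxel gaze parameter.

    labels: recognized subject type per voxel (None for voxels with no subject)
    saliency: 3D saliency value per voxel
    distance_from_focus: distance of each voxel from the focus position
    importance_db: preset importance per subject type (step S6)
    first_coeff: preset first coefficient per subject type
    """
    out = np.zeros(len(labels))
    for i, label in enumerate(labels):
        if label is None:
            continue  # voxels without a recognized subject stay at 0
        d = abs(distance_from_focus[i])
        # Second coefficient: 2.0 at focus, falling linearly to 0 at the limit.
        second = 2.0 * max(0.0, 1.0 - d / limit_distance)
        out[i] = saliency[i] * importance_db[label] * first_coeff[label] * second
    return out
```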

[Action/effect]
As described above, the video effect processing device 4 generates VR content in which the subject 9 toward which the line of sight is to be guided is rendered in vivid color and defocus is expressed. In this way, the video effect processing device 4 draws the viewer's gaze to the subject 9 toward which the creator of two-dimensional video content wants to guide it, and can efficiently produce VR content with a strong line-of-sight guidance effect. That is, the video effect processing device 4 not only makes VR content production more efficient, but also makes it possible to convey the creator's intention in VR content at the same level as in two-dimensional video.

Furthermore, for faces, which tend to attract the viewer's gaze, the video effect processing device 4 also reflects the direction of the face in the importance. This allows the video effect processing device 4 to apply to VR content the "lead room" technique (leaving open space in the direction a subject faces), a presentation method used in producing two-dimensional video content.
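One way to reflect face direction in the importance, as the paragraph above describes, is to raise the weight of voxels lying in front of the face. This is a hypothetical sketch: the cone test, the `boost` factor and the 30° half-angle are illustrative assumptions, not values or a method given in the source.

```python
import numpy as np

def lead_room_weight(voxel_centers, face_pos, face_dir,
                     boost=1.5, half_angle_deg=30.0):
    """Hypothetical importance multiplier: voxels inside a cone in front of
    the face get `boost`; all other voxels get 1.0."""
    v = np.asarray(voxel_centers, dtype=float) - np.asarray(face_pos, dtype=float)
    d = np.asarray(face_dir, dtype=float)
    d = d / np.linalg.norm(d)
    norms = np.linalg.norm(v, axis=1)
    # Cosine of the angle between the face direction and each voxel offset;
    # a voxel exactly at the face position gets cosine 0 (outside the cone).
    cos = np.divide(v @ d, norms, out=np.zeros_like(norms), where=norms > 0)
    inside = cos >= np.cos(np.radians(half_angle_deg))
    return np.where(inside, boost, 1.0)
```

The returned multipliers would be applied to the per-voxel importance before normalization, so the space the face looks into is rendered more vividly.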

(Modifications)
Although the embodiments of the present invention have been described in detail above, the present invention is not limited to these embodiments and also includes design changes and the like that do not depart from the gist of the present invention.

In the embodiment described above, lens distortion, which is not included in the camera parameters, may also be taken into account. That is, a lens distortion model is selected in advance, and lens distortion coefficients are set from that model using the camera parameters related to lens distortion. The visual volume calculation means then calculates, as the visual volume, the quadrangular pyramid-shaped region defined by the lens principal point and the angle of view reflecting the set lens distortion coefficients. As a result, even when shooting with an ultra-wide-angle lens that is susceptible to lens distortion, the visual volume is obtained accurately, so the viewer's gaze can be drawn accurately to the subject toward which it is to be guided.
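As an illustration of an angle of view that reflects a distortion coefficient, the sketch below assumes a radial polynomial (Brown-style) model r_d = r_u(1 + k1·r_u² + k2·r_u⁴) on normalized image coordinates. The source does not name a model, so this choice, the fixed-point inversion and the function names are assumptions. Fixed-point iteration converges for moderate distortion; strongly distorted fisheye lenses would need a different model.

```python
import math

def undistort_radius(r_d: float, k1: float, k2: float = 0.0, iters: int = 50) -> float:
    """Invert r_d = r_u * (1 + k1*r_u**2 + k2*r_u**4) by fixed-point iteration."""
    r_u = r_d
    for _ in range(iters):
        r_u = r_d / (1.0 + k1 * r_u**2 + k2 * r_u**4)
    return r_u

def distorted_fov_deg(sensor_width: float, focal_length: float,
                      k1: float, k2: float = 0.0) -> float:
    """Full horizontal angle of view after accounting for radial distortion."""
    r_d = (sensor_width / 2.0) / focal_length  # normalized radius of the image edge
    r_u = undistort_radius(r_d, k1, k2)        # where that edge ray really points
    return math.degrees(2.0 * math.atan(r_u))
```

With barrel distortion (k1 < 0), the image edge corresponds to a wider ray than the pinhole formula predicts, so the visual volume widens accordingly.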

In the embodiment described above, the user may manually adjust the weights of the first coefficient via a volume dial or a GUI (Graphical User Interface). Alternatively, subject types and their first coefficients may be learned by a neural network, and the trained classifier may be used to estimate the first coefficient corresponding to the recognized subject type.

The line-of-sight guidance effect is not limited to that of the embodiment described above. For example, a bounding box enclosing the voxels whose importance exceeds a predetermined reference value may be set and included in the VR content. In this case, the region the viewer should gaze at may be indicated (emphasized) by drawing the bounding box as a solid rectangle as seen from the viewer's viewpoint.
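The bounding box described above can be sketched as an axis-aligned box around every voxel whose importance exceeds the reference value. Using an axis-aligned box in world coordinates and cubic voxels of a known edge length are assumptions; projecting the box into the viewer's view for drawing is not shown.

```python
import numpy as np

def gaze_bounding_box(voxel_centers, importance, threshold, voxel_size):
    """Axis-aligned box enclosing every voxel whose importance exceeds
    `threshold`. Returns (min_corner, max_corner), or None if no voxel qualifies."""
    centers = np.asarray(voxel_centers, dtype=float)
    selected = centers[np.asarray(importance) > threshold]
    if selected.size == 0:
        return None
    half = voxel_size / 2.0  # extend from voxel centers to voxel faces
    return selected.min(axis=0) - half, selected.max(axis=0) + half
```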

In each of the embodiments described above, the video effect processing device has been described as independent hardware, but the present invention is not limited to this. For example, the present invention can also be realized by a program that causes hardware resources of a computer, such as a CPU, memory, and hard disk, to operate as the video effect processing device described above. This program may be distributed via a communication line, or may be written to a recording medium such as a CD-ROM or flash memory and distributed.

1 VR content production system
2 Fixed camera
3 Shooting camera
4 Video effect processing device
40 Volumetric capture means
41 Camera parameter estimation means
42 Visual volume calculation means
43 3D saliency map generation means
44 Subject recognition means
45 Importance labeling means (importance setting means)
46 Gaze parameter estimation means (importance multiplication means)
47 Line-of-sight guidance means

Claims

1. A video effect processing device that generates, by volumetric capture that digitizes the movement of a subject into three-dimensional data, volumetric capture information consisting of the three-dimensional shape and surface pattern of the subject, and that uses the volumetric capture information and the camera parameters of a shooting camera that photographs the subject to add, to VR content of the subject, a line-of-sight guidance effect that guides the line of sight in accordance with the movement characteristics of the naked eye, the video effect processing device comprising:
volumetric capture means for generating the VR content and the volumetric capture information by the volumetric capture;
visual volume calculation means for calculating, as a visual volume that is the shooting range of the shooting camera, a quadrangular pyramid-shaped region defined by the lens principal point, which is the center coordinate of the lens of the shooting camera, and the angle of view of the shooting camera, both of which are included in the camera parameters;
3D saliency map generation means for generating, based on the volumetric capture information, a 3D saliency map of the three-dimensional shape and surface pattern of the subject contained in the visual volume;
subject recognition means for recognizing, by machine learning, the type of the subject contained in the visual volume;
importance setting means for setting the type of the subject recognized by the subject recognition means in association with an importance;
importance multiplication means for dividing the visual volume into a plurality of divided areas, assigning the importance set for each type of the subject to the divided areas occupied by the subject, and multiplying the assigned importance by a first coefficient preset for each type of the subject and by a second coefficient preset so as to become smaller as the distance from the focus position of the shooting camera to the divided area increases; and
line-of-sight guidance means for generating rendering parameters in which the importance multiplied by the importance multiplication means is reflected in the volumetric capture information, and adding the rendering parameters to the VR content.

2. The video effect processing device according to claim 1, further comprising camera parameter estimation means for estimating the camera parameters by camera calibration.

3. The video effect processing device according to claim 2, wherein the subject recognition means recognizes a face as the subject and also recognizes the direction of the face by the machine learning, and the importance setting means sets the direction of the face recognized by the subject recognition means in association with the importance.

4. The video effect processing device according to any one of claims 1 to 3, wherein the visual volume calculation means calculates, as the visual volume, the quadrangular pyramid-shaped region defined by the lens principal point and the angle of view reflecting a preset lens distortion coefficient.

5. A program for causing a computer to function as the video effect processing device according to any one of claims 1 to 4.