JP2019145894A

JP2019145894A - Image processing device, image processing method, and program

Info

Publication number: JP2019145894A
Application number: JP2018025822A
Authority: JP
Inventors: 洋佑高田; Yosuke Takada
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-02-16
Filing date: 2018-02-16
Publication date: 2019-08-29

Abstract

To enable a user to grasp in advance, when setting the path of a virtual camera, the predicted image quality of the virtual viewpoint video to be generated. SOLUTION: An image processing device for generating a virtual viewpoint image from a plurality of viewpoint images captured by a plurality of cameras includes acquisition means for acquiring the plurality of viewpoint images, a GUI for receiving user settings for the parameters of the virtual camera corresponding to the virtual viewpoint image, and generation means for generating the virtual viewpoint image from the plurality of viewpoint images on the basis of the set parameters. Information about the image quality of the virtual viewpoint image, for assisting the user settings, is displayed on the GUI. SELECTED DRAWING: Figure 3

Description

The present invention relates to a technique for setting the path of a virtual camera when generating a virtual viewpoint video.

Virtual viewpoint video generation is a technology that uses video captured by a plurality of real cameras to reproduce video from a camera that does not actually exist (a virtual camera) placed virtually in a three-dimensional space. The technology is expected to provide more immersive video expression in sports broadcasting and similar applications, but it cannot deliver high-quality images from every viewpoint; it is subject to constraints inherent in its principle. For example, blind spots among the multi-viewpoint videos captured by the real cameras leave regions where video from the virtual camera cannot be reproduced. In addition, when the virtual camera is set closer to an object than a certain distance, the images captured by the real cameras must be enlarged for rendering, which produces a blurred video in which the resolution of the object is reduced. If the path of the virtual camera (the movement of its position along the time axis) is set without taking these constraints into account, the resulting virtual viewpoint video will be of low quality. The user therefore has to check the finished virtual viewpoint video on a preview screen or the like and, if it is unsatisfactory, set the virtual camera path again.

In this regard, Patent Document 1 proposes a technique that visualizes, on a 2D map, information on the blind spots among multi-viewpoint videos captured by real cameras and presents it to the user, so that the user can check in advance where the blind spots are without actually generating a virtual viewpoint video.

Patent Document 1: JP 2011-172169 A

With the method of Patent Document 1, the only thing the user can check in advance is the blind spot information; the user cannot check in advance what the image quality of the generated virtual viewpoint video will be. An object of the present invention is therefore to allow the user to grasp, at the time of setting the virtual camera path, the expected image quality of the virtual viewpoint video.

An image processing apparatus according to the present invention generates a virtual viewpoint image from multi-viewpoint images captured by a plurality of cameras, and comprises: acquisition means for acquiring the multi-viewpoint images; a GUI with which a user sets parameters of a virtual camera corresponding to the virtual viewpoint image; and generation means for generating the virtual viewpoint image from the multi-viewpoint images on the basis of the set parameters, wherein information on the image quality of the virtual viewpoint image is displayed on the GUI.

According to the present invention, the user can grasp the image quality of the virtual viewpoint video to be generated while setting the path of the virtual camera.

A diagram showing an example of the configuration of the virtual viewpoint video system.
(a) is a diagram showing an example of the camera arrangement in an imaging system made up of two types of camera groups, (b) is a diagram showing the shooting area of a zoom camera, and (c) is a diagram showing the shooting area of a wide-angle camera.
A diagram showing an example of the logical configuration of the image processing apparatus.
A flowchart showing the overall flow up to generation of the virtual viewpoint video.
A flowchart showing details of the image quality setting process.
A diagram showing an example of camera installation positions.
A diagram showing the geometric conditions of a camera and its shooting range.
(a) and (b) are diagrams showing an example of a UI screen for image quality parameter setting.
(a) is an explanatory diagram of image samples, and (b) is a diagram showing an example of a UI screen for image quality parameter setting using image samples.
(a) is a flowchart showing the flow of the virtual camera setting process according to Embodiment 1, and (b) is a flowchart showing details of the display process of the GUI for virtual camera parameter setting.
A diagram showing an example of a UI screen for virtual camera parameter setting.
A diagram explaining the method of deriving the common shooting area for each camera group.
A diagram in which the common shooting area of each camera group is projected onto a field map.
A diagram showing an example of a UI screen for virtual camera parameter setting.
A graph explaining the spatio-velocity contrast sensitivity function.
A flowchart showing the flow of the virtual camera setting process according to Embodiment 2.
A diagram showing an example of a UI screen for virtual camera parameter setting.
A graph explaining the recommended moving speed of the virtual camera.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not every combination of the features described in the embodiments is essential to the solution provided by the present invention.

Embodiment 1

FIG. 1 is a diagram showing an example of the configuration of the virtual viewpoint video system in the present embodiment. A virtual viewpoint image is video generated when an end user and/or an appointed operator freely manipulates the position and orientation of a virtual camera; it is also called a free viewpoint image or an arbitrary viewpoint image. The virtual viewpoint image may be a moving image or a still image. The present embodiment mainly describes the case where the virtual viewpoint image is a moving image. The virtual viewpoint video system shown in FIG. 1 is made up of an image processing apparatus 100 and two types of camera groups 109 and 110. The image processing apparatus 100 includes a CPU 101, a main memory 102, a storage unit 103, an input unit 104, a display unit 105, and an external I/F unit 106, which are connected to one another via a bus 107. The CPU 101 is an arithmetic processing unit that performs overall control of the image processing apparatus 100 and carries out various kinds of processing by executing programs stored in the storage unit 103 or elsewhere.
The main memory 102 temporarily stores data and parameters used in the various kinds of processing and provides a work area for the CPU 101. The storage unit 103 is a large-capacity storage device that stores various programs and the data needed for GUI (graphical user interface) display; a nonvolatile memory such as a hard disk or a silicon disk is used, for example. The input unit 104 is a device such as a keyboard, mouse, electronic pen, or touch panel, and accepts various user inputs. The display unit 105 is made up of a liquid crystal panel or the like and displays, among other things, the UI screen for setting the path of the virtual camera when a free viewpoint video is generated. The external I/F unit 106 is connected via a network (here, a LAN 108) to each camera in the camera groups 109 and 110, and transmits and receives video data and control signal data. The bus 107 connects the units described above and transfers data between them.

The two types of camera groups are a zoom camera group 109 and a wide-angle camera group 110. The zoom camera group 109 is made up of a plurality of cameras fitted with lenses having a narrow angle of view (for example, 10 degrees). The wide-angle camera group 110 is made up of a plurality of cameras fitted with lenses having a wide angle of view (for example, 45 degrees). Each camera in both groups is connected to the image processing apparatus 100 via the LAN 108. On the basis of control signals from the image processing apparatus 100, the zoom camera group 109 and the wide-angle camera group 110 start and stop shooting, change camera settings (shutter speed, aperture, and so on), and transfer the captured video data.

The system includes various components other than those described above, but they are not the focus of the present invention and their description is omitted.

FIG. 2(a) is a diagram showing an example of the camera arrangement of the zoom camera group 109 and the wide-angle camera group 110. A field 201 is, for example, the ground surface in a stadium where soccer or a similar sport is played, and a player is present on it as an object (subject) 202. Twelve zoom cameras 203 making up the zoom camera group 109 and twelve wide-angle cameras 204 making up the wide-angle camera group 110 are arranged so as to surround the field 201. The area 213 enclosed by a dotted line in FIG. 2(b) indicates the shooting area of the zoom cameras 203. Because the zoom cameras 203 have a narrow angle of view, their shooting area is narrow, but they capture the object 202 at a high degree of resolution. The area 214 enclosed by a dotted line in FIG. 2(c) indicates the shooting area of the wide-angle cameras 204. Because the wide-angle cameras 204 have a wide angle of view, their shooting area is wide, but they capture the object 202 at a low degree of resolution. In FIG. 2(b) the shooting area is drawn as an ellipse for convenience; as described later, the actual shooting area of each camera is generally rectangular.

Next, the configuration and other aspects of the image processing apparatus 100 are described in detail, taking as an example the processing for generating a moving-image virtual viewpoint image (hereinafter, virtual viewpoint video) from moving-image multi-viewpoint images (hereinafter, multi-viewpoint video). FIG. 3 is a diagram showing an example of the logical configuration of the image processing apparatus 100. FIG. 4 is a flowchart showing the overall flow in the image processing apparatus 100 up to generation of the virtual viewpoint video. The series of processes shown in FIG. 4 is realized by the CPU 101 reading a predetermined program from the storage unit 103, loading it into the main memory 102, and executing it. The symbol "S" at the beginning of each process denotes a step.

In S401, the multi-viewpoint video data obtained by synchronized shooting with the zoom camera group 109 and the wide-angle camera group 110 is input to the image processing apparatus 100. The input multi-viewpoint video data (here, 12 viewpoints for each group) is loaded into the main memory 102.

In S402, the shape estimation unit 301 uses the multi-viewpoint video data acquired in S401 to estimate the three-dimensional shape of each object present in the captured scene. A known estimation method may be applied, such as the visual hull method, which uses the contour information of the object, or the multi-view stereo method, which uses triangulation. This generates data representing the three-dimensional shape of the object (for example, polygon data or voxel data).

In S403, the image quality setting unit 303 sets the image quality level for the virtual viewpoint video to be generated. The image quality level here includes a resolution level, a color difference level, a noise level, and the like. The image quality level is set on the basis of, for example, user input via the GUI. Setting in advance a level that the user can accept or is satisfied with makes it possible to generate a virtual viewpoint video whose image quality meets a fixed standard. Details of the image quality setting process based on user input are described later. Instead of relying on user input, a predetermined image quality level, for example a recommended one, may be set automatically.

In S404, the virtual camera setting unit 304 sets, on the basis of user input via the GUI, parameters such as the position and orientation of the virtual camera (virtual camera path) and the point it gazes at (virtual gazing point path) in the target time frames of the virtual viewpoint video. At this time, information indicating the range over which the image quality level set in S403 can be obtained is displayed on the UI screen for setting the virtual camera parameters. Details of the virtual camera parameter setting process are described later.

In S405, the virtual viewpoint image generation unit 305 generates the virtual viewpoint video in accordance with the virtual camera parameters set in S404. The virtual viewpoint video can be generated by applying computer graphics techniques to the 3D shape data of the objects obtained by the shape estimation in S402, so as to render the video as seen from the set virtual camera. Known techniques may be applied to this generation process as appropriate; since it is not the focus of the present invention, its description is omitted.

The above is the general flow up to generation of the virtual viewpoint video according to the present embodiment. The multi-viewpoint video input in S401 may first be subjected to processing that improves image quality (for example, super-resolution or edge sharpening), and the virtual viewpoint video may then be generated from the resulting multi-viewpoint video. In that case, the upper limit on the image quality of the virtual viewpoint video is the image quality level of the multi-viewpoint video after the improvement processing. The image quality setting process based on user input in S403 and the virtual camera setting process in S404 are described in detail below, in that order.

(Image quality setting process)
FIG. 5 is an example of a flowchart showing details of the image quality setting process. Here, the process is described taking as an example the case where the user specifies a resolution level. The description below follows the flow of FIG. 5.

In S501, camera information is acquired for each camera belonging to every camera group (here, the two groups: the zoom camera group 109 and the wide-angle camera group 110). The camera information includes the camera installation position, the gazing point position, the angle of view, the focal length, the pixel pitch of the image sensor, and the total number of pixels in the vertical and horizontal directions. This camera information may be read from the storage unit 103, where it has been stored in advance, or may be acquired by accessing each camera (or one camera representing each camera group).

In S502, the shooting distance from each camera to its gazing point is calculated using the camera installation position and the gazing point position contained in the camera information acquired in S501. The shooting distance d [mm] can be obtained using, for example, formula (1).

In formula (1), (Xc, Yc, Zc) are the three-dimensional coordinates of the camera installation position and (Xo, Yo, Zo) are the three-dimensional coordinates of the gazing point position. When the cameras are installed in a stadium where a game such as soccer is played, the origin of the three-dimensional coordinates is, for example, the center mark of the soccer pitch. FIG. 6 shows an example of the installation positions (X and Y coordinates) of the cameras belonging to one camera group when the coordinate origin is the center mark and the gazing point of each camera (black circle ● in FIG. 6; the cameras are diamonds ◆) is one of the penalty marks. In FIG. 6, the horizontal axis corresponds to the X coordinate and the vertical axis to the Y coordinate, both in [mm]. Because FIG. 6 is a two-dimensional graph it does not include the Z coordinate, which represents height; the Z coordinate takes a value such as 15000 [mm].
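The image of formula (1) does not survive in this text. Given the definitions above, it is presumably the Euclidean distance between the installation position and the gazing point, d = sqrt((Xc - Xo)^2 + (Yc - Yo)^2 + (Zc - Zo)^2). The following is a minimal sketch under that assumption; the function name and the coordinate values are hypothetical and chosen only for illustration.

```python
import math


def shooting_distance_mm(camera_pos, gaze_pos):
    """Distance d [mm] from a camera to its gazing point.

    Assumes formula (1) is the plain Euclidean distance between
    (Xc, Yc, Zc) and (Xo, Yo, Zo), both given in millimetres in a
    coordinate system whose origin is the center mark.
    """
    return math.dist(camera_pos, gaze_pos)


# Illustrative values only: a camera 12 m above the field, gazing at the origin.
camera = (3000.0, 4000.0, 12000.0)
gaze = (0.0, 0.0, 0.0)
print(shooting_distance_mm(camera, gaze))
```

`math.dist` requires Python 3.8 or later; on older versions the same value can be computed with `math.sqrt` over the squared coordinate differences.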

In S503, the resolution is calculated for each camera group on the basis of the shooting distance calculated in S502. FIG. 7 shows the geometric conditions of a camera 701 and its shooting range 702, where f [mm] is the focal length of the camera 701, SW [mm] is the horizontal width of the image sensor, and x [pixel] is the total number of pixels in the horizontal direction of the sensor. Under the geometric conditions shown in FIG. 7, the horizontal width H [mm] of the shooting range 702 in real space is given by formula (2).

In formula (2), the angle of view θ of the camera 701 is given by formula (3).
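The images of formulas (2) and (3) do not survive in this text. Under the pinhole geometry described for FIG. 7, with shooting distance d, focal length f, and sensor width SW, they would presumably take the following form. This is a reconstruction from the surrounding definitions, not the patent's own typesetting.

```latex
\begin{align}
H &= 2\,d\,\tan\!\left(\frac{\theta}{2}\right) \tag{2} \\
\theta &= 2\,\arctan\!\left(\frac{SW}{2f}\right) \tag{3}
\end{align}
```

Substituting (3) into (2) gives H = d · SW / f, that is, the sensor width scaled by the ratio of shooting distance to focal length.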

From the horizontal width H and the total number of pixels x, a resolution Rp is obtained that represents the number of pixels in the captured image per unit length [mm] in real space. This resolution Rp [pixel/mm] is given by formula (4).

In formula (4), p is the pixel pitch [mm]. The resolution Rp obtained in this way takes a value in the range 0 to 1.0; the larger the value, the more finely the real space is sampled, that is, the higher the resolution at which it is captured. From the resolution Rp calculated for each camera, the resolution of each camera group is determined, for example by taking the average. For the purpose of generating the virtual viewpoint video, the focal lengths of the cameras in each camera group are adjusted so that the same object is captured at the same size. Cameras belonging to the same camera group therefore all yield roughly the same resolution Rp. In other words, one representative resolution is calculated for each of the zoom camera group 109 and the wide-angle camera group 110. In the present embodiment, for example, the representative resolution of the wide-angle camera group 110 is about 0.5 and that of the zoom camera group 109 is about 1.0, so the zoom camera group 109 has the higher resolution.
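The resolution calculation can be sketched as follows. The formula images are missing from this text, so the sketch assumes formula (2) is H = 2·d·tan(θ/2), formula (3) is θ = 2·arctan(SW / 2f), and formula (4) is Rp = x / H, which with SW = p·x reduces to Rp = f / (d·p). All function names and numeric values are hypothetical and only illustrate how representative resolutions of about 1.0 and 0.5 could arise.

```python
import math


def angle_of_view(sensor_width_mm, focal_length_mm):
    """Assumed formula (3): horizontal angle of view theta in radians."""
    return 2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm))


def shooting_width_mm(distance_mm, theta):
    """Assumed formula (2): real-space width H [mm] of the shooting range."""
    return 2.0 * distance_mm * math.tan(theta / 2.0)


def resolution_rp(focal_length_mm, distance_mm, pixel_pitch_mm, pixels_x):
    """Assumed formula (4): Rp = x / H in pixels per millimetre."""
    sensor_width_mm = pixel_pitch_mm * pixels_x  # SW = p * x
    theta = angle_of_view(sensor_width_mm, focal_length_mm)
    return pixels_x / shooting_width_mm(distance_mm, theta)


def representative_resolution(per_camera_rp):
    """Representative resolution of a camera group, taken here as the average."""
    return sum(per_camera_rp) / len(per_camera_rp)


# Illustrative values: 100 m shooting distance, 5 um pixel pitch, 4096 pixels wide.
zoom = [resolution_rp(500.0, 100_000.0, 0.005, 4096) for _ in range(12)]
wide = [resolution_rp(250.0, 100_000.0, 0.005, 4096) for _ in range(12)]
print(representative_resolution(zoom), representative_resolution(wide))
```

Under these assumptions Rp depends only on f, d, and p, which is consistent with the statement that cameras in the same group, having matched focal lengths, yield roughly the same Rp.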

In S504, a specific resolution for the virtual viewpoint video (for example, the lowest acceptable resolution) is set on the basis of user input via the image quality level setting UI screen. FIGS. 8(a) and 8(b) show an example of the image quality level setting UI screen displayed on the display unit 105 by the GUI control unit 302. A user who wishes to specify a resolution level presses the resolution designation button 803 among the buttons 801 to 803 in the state of FIG. 8(a). The screen then changes to the state of FIG. 8(b): on its left side, a range 810 over which the representative resolution of the zoom camera group calculated in S503 can be obtained and a range 811 over which the representative resolution of the wide-angle camera group can be obtained are displayed, and a sub-window 806 pops up.

The user enters the desired resolution Rp in the input field 807 of the sub-window 806 and presses the OK button 804. Because Rp = 0 means that nothing can be resolved and is not valid in principle, the range of the resolution Rp is 0 < Rp ≤ 1.0. In this way the user can specify the particular resolution that is acceptable for the virtual viewpoint video. For example, when "0.7" is set as the specific resolution, the multi-viewpoint images captured by the wide-angle camera group 110 cannot satisfy it, so only the multi-viewpoint images captured by the zoom camera group 109 are used to generate the virtual viewpoint image. When "0.4" is set as the specific resolution, the multi-viewpoint images captured by both the wide-angle camera group 110 and the zoom camera group 109 are used to generate the virtual viewpoint image.

The way in which the user specifies the resolution is not limited to the example above. For example, to make the choice more intuitive, image samples associated with at least two different resolutions may be presented so that the user can specify the desired resolution from them. The image samples can be generated from the highest-resolution image, captured by a camera 203 belonging to the zoom camera group 109. Specifically, the image captured by the camera 203 is reduced and then enlarged in stages by a technique such as nearest neighbor interpolation, yielding a plurality of image samples of different resolutions. In this way, a plurality of image samples by resolution (four in this example) are prepared as shown in FIG. 9(a) and displayed as in the sub-window 906 (see FIG. 9(b)). In this case, as shown in FIG. 9(b), lines or the like indicating the ranges corresponding to the resolution levels of image samples A to D may be displayed, for example on the field map. This allows the user to specify the desired resolution level while actually checking images at different resolution levels.
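The two ideas in S504 can be sketched in a few lines: choosing which camera groups may contribute for a requested resolution, and producing a degraded image sample by reducing and then enlarging with nearest neighbor. This is a minimal illustration, not the apparatus's actual implementation; the function names, the dictionary of representative resolutions, and the decimation used for the reduction step are assumptions.

```python
def select_camera_groups(representative_rp, requested_rp):
    """Return the camera groups whose representative resolution meets the request."""
    if not 0.0 < requested_rp <= 1.0:
        raise ValueError("requested resolution must satisfy 0 < Rp <= 1.0")
    return [name for name, rp in representative_rp.items() if rp >= requested_rp]


def make_image_sample(image, factor):
    """Reduce a 2D image by `factor`, then enlarge it back by nearest neighbor.

    `image` is a list of rows of pixel values. Reduction here simply keeps
    every `factor`-th pixel (an assumption; the text does not say how the
    image is reduced). The result has the original size but less detail.
    """
    small = [row[::factor] for row in image[::factor]]
    height, width = len(image), len(image[0])
    return [[small[y // factor][x // factor] for x in range(width)]
            for y in range(height)]


groups = {"zoom": 1.0, "wide": 0.5}  # representative resolutions from the text
print(select_camera_groups(groups, 0.7))  # zoom group only
print(select_camera_groups(groups, 0.4))  # both groups

image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(make_image_sample(image, 2))
```

Calling `make_image_sample` with several factors (for example 1, 2, 4, 8) would give a set of samples comparable in spirit to image samples A to D.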

The above is the content of the image quality setting process based on user input. The processing up to S503 may be completed before shooting of the multi-viewpoint video starts, for example by performing calibration in the preparation stage after the venue has been set up. Although resolution has been used as the example, the same approach applies to color difference and noise.

<Color difference>
To specify a color difference level, the color difference ΔE is first obtained, by prior calibration or the like, for every combination of cameras drawn from all cameras in all camera groups. ΔE is computed from the differences in hue, lightness, and saturation of flat portions of the captured images within the common shooting area of the camera group to which the cameras belong. The user is then allowed to specify, for example, an acceptable color difference ΔE_u (for example, in the range 0 to 10.0) via the image quality level setting UI screen described above, in the same way as for resolution. A ΔE of 0 means that there is no color difference at all between the cameras, and the larger the value, the larger the color difference between them. According to examples commonly used in the JIS standards and in various industries, a ΔE in the range 1.6 to 3.2 is a level at which people hardly notice the difference when the colors are compared apart from each other. Only the combinations whose color difference ΔE is smaller than the user-specified ΔE_u are extracted, the common shooting area of the cameras included in those combinations is derived, and this area is displayed on the virtual camera setting UI screen as information indicating the range over which a virtual viewpoint video at the specified color difference level can be obtained. The user sets the virtual camera with reference to this range information, and only the multi-viewpoint video data captured by cameras that satisfy the specified color difference level is used to generate the virtual viewpoint video, which makes it possible to obtain a virtual viewpoint video with a guaranteed color difference level. The method of deriving the common shooting area is described later.
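The extraction step for color difference can be sketched as a simple filter. The sketch assumes that each combination is a pair of cameras and that the ΔE values have already been measured during calibration; the function name, the camera names, and the ΔE values are hypothetical.

```python
def filter_by_color_difference(pair_delta_e, delta_e_u):
    """Keep camera pairs whose color difference is below the user's limit.

    `pair_delta_e` maps a (camera_a, camera_b) tuple to its calibrated
    color difference. Returns the kept pairs and the sorted list of
    cameras that appear in them.
    """
    kept = {pair: de for pair, de in pair_delta_e.items() if de < delta_e_u}
    cameras = sorted({cam for pair in kept for cam in pair})
    return kept, cameras


# Illustrative calibration results (not measured data).
calibrated = {
    ("zoom01", "zoom02"): 1.2,
    ("zoom01", "wide01"): 4.5,
    ("zoom02", "wide01"): 2.8,
}
kept, cameras = filter_by_color_difference(calibrated, 2.0)
print(kept)     # only the zoom01/zoom02 pair remains
print(cameras)  # cameras whose footage may be used
```

The common shooting area of the returned cameras would then be derived and shown on the virtual camera setting UI screen; that derivation is outside this sketch.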

<In the case of noise>
As for noise, the noise level in the virtual viewpoint video can be controlled by the SN ratio of flat areas, in which no texture exists, in the multi-viewpoint video data. Specifically, for all cameras belonging to all camera groups, the SN ratio (= μ / σ), which is the ratio of the average luminance μ to the standard deviation σ in a flat area of the captured image, is first obtained by prior calibration or the like. The user is then allowed to specify an allowable SN ratio (for example, in the range of 0 to 100 [dB]) via the image quality setting UI screen described above, in the same way as for resolution. An SN ratio of 0 means that the amount of noise is at its maximum, and a larger value means a smaller amount of noise. Accordingly, only the cameras whose SN ratio is larger than the user-specified SN ratio are extracted, the common shooting area of the extracted cameras is derived, and this area is displayed on the virtual camera setting UI screen as information indicating the range in which a virtual viewpoint video of the specific noise level can be obtained. The user sets the virtual camera with reference to this range information, and only the multi-viewpoint video data captured by cameras that satisfy the specified SN ratio is used to generate the virtual viewpoint video, so that a virtual viewpoint video with a guaranteed noise level can be obtained.
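The camera selection by SN ratio can be sketched as follows. The description defines the SN ratio as μ / σ and gives the user-specified range in dB without stating the conversion, so the 20·log10 conversion used here is an assumption, as are the function names and the sample pixel values.

```python
import math
import statistics

def snr_db(flat_pixels):
    """SN ratio mu/sigma of a flat patch, converted to dB (20*log10 is assumed)."""
    mu = statistics.fmean(flat_pixels)
    sigma = statistics.pstdev(flat_pixels)
    if sigma == 0:
        return math.inf  # a perfectly uniform patch contains no measurable noise
    return 20.0 * math.log10(mu / sigma)

def cameras_meeting_snr(flat_patches, snr_min_db):
    """Return the cameras whose flat-area SN ratio exceeds the specified value."""
    return [cam for cam, px in flat_patches.items() if snr_db(px) > snr_min_db]

# Hypothetical luminance samples taken from a textureless area per camera.
flat_patches = {
    "cam1": [100, 102, 98, 100],  # low noise
    "cam2": [100, 120, 80, 100],  # high noise
}

usable = cameras_meeting_snr(flat_patches, snr_min_db=30.0)
print(usable)
```

Only the cameras in the returned list would contribute to the common shooting area and to the generation of the virtual viewpoint video.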

The above description covers an example in which the user freely specifies an arbitrary numerical value according to the type of image quality to be set, but the present invention is not limited to this. For example, graded levels such as "high" and "low", or "large", "medium", and "small", may be prepared so that the user selects one of them and the range information of the specific image quality corresponding to the selected level is displayed. Furthermore, range information of a specific image quality that combines a plurality of image quality elements, such as resolution, color difference, and noise, may be displayed.

(Virtual camera setting process)
FIG. 10(a) is a flowchart showing the flow of the process for setting the parameters of the virtual camera. The process is described below along the flow of FIG. 10(a).

In S1001, the GUI control unit 302 displays the virtual camera setting UI screen on the display unit 105. FIG. 11 shows an example of the virtual camera setting UI screen. In the virtual camera setting UI screen 1100 shown in FIG. 11, a field map 1110 showing the field 201 as viewed from directly above is displayed on the left side of the screen, and the 3D shapes of the objects (players and ball) estimated in S402 are mapped onto it. The GUI display control for setting the parameters of the virtual camera is described below along the detailed flow shown in FIG. 10(b).

First, in S1011, the camera information of each camera belonging to all camera groups (here, two groups: the zoom camera group 109 and the wide-angle camera group 110) is acquired. The camera information includes the installation position of the camera, the position of its gazing point, the angle of view, the focal length, the pixel pitch of the image sensor, and the total number of pixels in the vertical and horizontal directions. This camera information may be read from the storage unit 103, where it has been stored in advance, or may be acquired by accessing each camera (or one camera representing each camera group).

In S1012, based on the acquired camera information of each camera group, the area in which the shooting areas of a plurality of cameras having substantially the same angle of view overlap (the common shooting area) is derived for each camera group. The expression "substantially the same" is used because differences in the distance from each camera to the gazing point cause slight variations in the angle of view between cameras. That is, the angles of view of the cameras belonging to the same camera group need not be exactly the same, as long as they are of the same degree. FIG. 12 is a diagram illustrating the method of deriving the common shooting area for each camera group. Here, two cameras 1201 and 1202 belonging to the same camera group are shooting the field 201, which contains a soccer court, while facing the same gazing point. The common shooting area of this camera group is obtained as the area 1213 in which two surfaces overlap: the intersection surface 1211 between the field 201 and the quadrangular pyramid extending from the camera 1201 toward the gazing point, and the intersection surface 1212 between the field 201 and the quadrangular pyramid extending from the camera 1202 toward the gazing point. For convenience of explanation, two cameras 1201 and 1202 are used here, but the same approach applies to three or more cameras. In this way, the common shooting area of each camera group is derived.
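The overlap of the per-camera footprints on the field plane can be sketched as a polygon intersection. The description does not specify an algorithm, so the following assumes that each footprint (the intersection of a camera's viewing pyramid with the field) has already been computed as a convex polygon with counter-clockwise vertices, and uses Sutherland-Hodgman clipping as one possible technique. All names and coordinates are hypothetical.

```python
from functools import reduce

def clip_convex(subject, clip):
    """Intersect two convex polygons (Sutherland-Hodgman); vertices counter-clockwise."""
    def inside(p, a, b):
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Point where segment p-q crosses the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        input_pts, output = output, []
        for j in range(len(input_pts)):
            p, q = input_pts[j - 1], input_pts[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
        if not output:
            break  # the footprints do not overlap at all
    return output

def common_area(footprints):
    """Common shooting area of two or more cameras as a polygon."""
    return reduce(clip_convex, footprints)

def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1]
                   - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))) / 2.0

# Hypothetical footprints of two cameras on the field plane (metres).
footprint_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
footprint_b = [(2, 2), (6, 2), (6, 6), (2, 6)]
overlap = common_area([footprint_a, footprint_b])
print(overlap, polygon_area(overlap))
```

Because `common_area` folds the clipping over the list, passing three or more footprints works the same way as two, in line with the remark in the text.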

In S1013, the common shooting area of each camera group is displayed in a recognizable manner on the left side of the virtual camera setting UI screen. FIG. 13 shows an example in which the common shooting area of each camera group is projected onto the field map in a recognizable manner. The field map is shown here in two dimensions, but it may be three-dimensional. In FIG. 13, the rectangular frame 1300 indicates the field map. The broken-line ellipse 1301 indicates the common shooting area of the zoom camera group 109 (hereinafter, the zoom camera group shooting area). Within the zoom camera group shooting area 1301, a virtual viewpoint video can be generated using the multi-viewpoint video shot by the zoom camera group 109. Because the resolution of objects is relatively high within the zoom camera group shooting area 1301, the image quality of the virtual viewpoint video can be maintained even when the virtual camera is brought close to an object. In the present embodiment, the zoom camera group shooting area 1301 is the area in which the shooting areas of all twelve zoom cameras 203 belonging to the zoom camera group 109 overlap. The dash-dot ellipse 1302 indicates the common shooting area of the wide-angle camera group 110 (hereinafter, the wide-angle camera group shooting area). Within the wide-angle camera group shooting area 1302, a virtual viewpoint video can be generated using the multi-viewpoint video shot by the wide-angle camera group 110. Because the resolution of objects is relatively low within the wide-angle camera group shooting area 1302, the image quality of the virtual viewpoint video deteriorates when the virtual camera is brought closer to an object than a certain distance. In the present embodiment, the wide-angle camera group shooting area 1302 is the area in which the shooting areas of all twelve wide-angle cameras 204 belonging to the wide-angle camera group 110 overlap. The hatched area 1303 in the field map 1300 is an area in which the shooting areas of the wide-angle cameras 204 belonging to the wide-angle camera group 110 do not overlap and a virtual viewpoint video cannot be generated (hereinafter, the virtual viewpoint video generation disabled area). The common shooting area of each camera group is displayed so as to be identifiable, for example by color coding. In FIG. 13, the common shooting area of each camera group is drawn as an ellipse for convenience of explanation, but in practice it is a polygon.

In S1014, a user designation of the area around which the virtual viewpoint video is to be generated is accepted via the virtual camera setting UI screen, and the area designated by the user is set as the region of interest. The user designates the desired area in the field map 1110 using a mouse, a touch pen, or the like. In FIG. 11, the broken-line rectangle 1111 indicates the region of interest set based on the user input. When the region of interest is set, the left side of the virtual camera setting UI screen 1100 switches to a field map enlarged around the region of interest 1111. FIG. 14 shows the virtual camera setting UI screen on which the field map 1400, enlarged around the region of interest 1111, is displayed.

In S1015, the range in which a virtual viewpoint image of the specific image quality level set in the image quality setting process described above can be generated is shown on the field map of the virtual camera setting UI screen after the region of interest has been set. In FIG. 14, the two-dot chain line 1401 in the enlarged field map 1400, displayed after the region of interest 1111 is set, indicates the limit beyond which the user-specified resolution can no longer be obtained. If the virtual camera is set within the range on the side indicated by the white arrows near the two-dot chain line 1401, a virtual viewpoint video that satisfies the specific resolution is obtained. The description now returns to the flow of FIG. 10(a).

In S1002, the virtual camera path is set based on user input via the virtual camera setting UI screen. A user who wishes to set a virtual camera path presses the camera path designation button 1101. The user then specifies the virtual camera path by drawing, on the field map 1400 with a mouse or the like, the trajectory along which the virtual camera moves during a separately set time frame (600 frames for 10 seconds of video shot at 60 fps). From the line or the like superimposed on the field map 1400 (the two-dot chain line 1401 in the example of FIG. 14), the user can grasp in advance how the virtual camera should be set in order to obtain a virtual viewpoint image that satisfies the specific resolution. The user can thus specify the virtual camera path while predicting the resolution of the resulting virtual viewpoint video. In FIG. 14, the thick arrow 1402 on the field map 1400 indicates the virtual camera path set in this way. The path can be corrected by moving an arbitrary point P on the set path 1402 with a drag operation of the mouse or the like. At this stage, the height of the virtual camera path above the field surface is a default value (for example, 15 m).
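The relation between the time frame and the drawn trajectory can be illustrated as follows. The description only states that the trajectory covers the set time frame (60 fps × 10 s = 600 frames); distributing one camera position per frame at equal arc-length steps along the drawn polyline, that is, at constant speed, is an assumption made for this sketch, and the function name and coordinates are hypothetical.

```python
import math

def resample_path(points, n_frames):
    """Place n_frames positions at equal arc-length steps along a drawn polyline."""
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    seg = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    total = sum(seg)
    out = []
    for k in range(n_frames):
        target = total * k / (n_frames - 1)  # distance travelled at frame k
        i = 0
        while i < len(seg) - 1 and target > seg[i]:
            target -= seg[i]
            i += 1
        t = min(1.0, target / seg[i]) if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

fps, seconds = 60, 10
n_frames = fps * seconds  # 600 frames, as in the example in the text

# Hypothetical trajectory drawn on the field map (metres).
drawn = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
path = resample_path(drawn, n_frames)
print(len(path), path[0], path[-1])
```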

In S1003, the process branches depending on whether the height of the virtual camera is to be adjusted. If the user presses the camera path height edit button 1102, the process proceeds to S1004; otherwise, the virtual camera setting process ends.

In S1004, the process of adjusting the height of the virtual camera is executed. The user moves the cursor to an arbitrary position (coordinates) on the virtual camera path 1402 and clicks with the mouse or the like, thereby designating the position of the virtual camera whose height is to be changed (the height edit point). When a height edit point is designated, a height setting window 1410 is displayed in the virtual camera setting UI screen (see FIG. 14). The height setting window 1410 displays an image 1411 of the field 201 viewed from the side (in the direction horizontal to the ground surface), together with the line 1401 indicating the limit beyond which the specific resolution can no longer be obtained. The user can change the height of the virtual camera at that position by entering an arbitrary value (unit: m; for example, 0 to 25 m) in the input field 1412 of the height setting window 1410. When the height of the virtual camera changes, the distance to each object also changes. For example, if the virtual camera is moved to a lower position, the distance to each object becomes shorter and the object appears larger, while the resolution drops and the video tends to become blurred. In this case, the resolution corresponding to the shooting distance d at the changed height is obtained in the manner described for S502 above, and the line 1401 is updated accordingly. The user can thus edit the height of the virtual camera while checking at any time how the image quality changes with the height. The height at positions other than those changed by a height edit point is interpolated from the nearby height edit points or the default value, so that the height does not change abruptly.
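The interpolation of the camera height between height edit points can be sketched as follows. The description does not state the interpolation method, so piecewise-linear interpolation is assumed here, with the two ends of the path held at the default height unless they are edited. The function name and the frame-index representation of edit points are hypothetical.

```python
from bisect import bisect_right

def path_heights(n_frames, edit_points, default=15.0):
    """Height per frame, linearly interpolated between height edit points.

    edit_points maps a frame index (0 .. n_frames - 1) to a height in metres.
    """
    knots = dict(edit_points)
    knots.setdefault(0, default)             # path start, unless edited
    knots.setdefault(n_frames - 1, default)  # path end, unless edited
    xs = sorted(knots)
    heights = []
    for f in range(n_frames):
        j = bisect_right(xs, f)  # xs[j - 1] <= f < xs[j]
        if j == len(xs):
            heights.append(knots[xs[-1]])
            continue
        x0, x1 = xs[j - 1], xs[j]
        t = (f - x0) / (x1 - x0)
        heights.append(knots[x0] + t * (knots[x1] - knots[x0]))
    return heights

# Hypothetical edit: lower the camera to 5 m at the middle of an 11-frame path.
heights = path_heights(11, {5: 5.0})
print(heights)
```

A smoother scheme (for example, a spline) would satisfy the same requirement that the height not change abruptly; linear interpolation is used only to keep the sketch short.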

The above are the details of the virtual camera setting process in S406.

<Modification>
In the example described above, information on the image quality level designated in advance is displayed in the GUI used for setting the virtual camera, so that the user can set the virtual camera while predicting the image quality of the finished virtual viewpoint image. Instead of displaying information on the image quality level in the GUI in advance, a warning message may be displayed when the virtual camera path set by the user does not satisfy the image quality level designated in advance. This lets the user know that a certain image quality level is not guaranteed for the virtual camera path being set, and gives the user an opportunity to redo the virtual camera setting before the virtual viewpoint image is generated. In addition, information indicating the zoom camera group shooting area and the wide-angle camera group shooting area may also be displayed on the field map that the user refers to on the virtual camera setting UI screen.

As described above, according to the present embodiment, when setting the movement path and height of the virtual camera, the user can grasp in advance how the virtual camera should be set in order to obtain a virtual viewpoint image with a guaranteed image quality. As a result, the user no longer needs to redo the parameter setting of the virtual camera repeatedly after viewing the finished virtual viewpoint image, and the virtual viewpoint image can be generated efficiently.

Embodiment 2

In the first embodiment, a mode that lets the user grasp in advance the setting range of the virtual camera in which a certain image quality level is guaranteed was described, taking resolution as an example. The resolution that a person can perceive in a virtual viewpoint video also varies with the relative speed between the virtual camera and the object, but the first embodiment did not take this into account. A mode in which the user adjusts the moving speed of the virtual camera, according to the relative speed between the set virtual camera and the object, so that the specified resolution condition is satisfied is therefore described as the second embodiment. Descriptions of matters common to the first embodiment are omitted, and the differences are described below.

Before describing the control performed when setting the virtual camera in the present embodiment, the point that the resolution a person can perceive changes with the speed of the object is reviewed. FIG. 15 is a graph illustrating the spatio-velocity contrast sensitivity function (SV-CSF), which is one of the spatio-temporal frequency characteristics of the human visual system with respect to motion stimuli. The horizontal axis represents spatial frequency, and the vertical axis represents human visual sensitivity (response characteristic). The curve 1501 shows the SV-CSF when the moving speed of the object is relatively fast (about 4.5 m per second), and the curve 1502 shows the SV-CSF when the moving speed is relatively slow (about 1.5 m per second). Human visual sensitivity is low when the spatial frequency is 0 and rises sharply as the spatial frequency increases. It then falls gradually: the higher the frequency (that is, the higher the resolution), the lower the sensitivity, which approaches 0 asymptotically. Furthermore, the faster the object moves, the further the peak of the SV-CSF curve shifts toward the low-frequency side (to the left). For example, on the curve 1502, for which the moving speed of the object is relatively slow, the response differs between a spatial frequency of s (for example, s = 8) and a spatial frequency of t (for example, t = 15), so the difference in resolution can be perceived. On the curve 1501, for which the moving speed of the object is relatively fast, the response has converged to 0 at spatial frequencies of s and above, so the difference in resolution cannot be perceived. By making use of this characteristic of human vision, the relative speed between the virtual camera and the object can be adjusted so that the specific resolution is satisfied.

FIG. 16 is a flowchart showing the flow of the virtual camera setting process according to the present embodiment. S1601 to S1604 are the same as S1001 to S1004 in the flow of FIG. 10(a) of the first embodiment. Once the virtual camera has been set by the steps up to this point, in the subsequent S1605 the object of interest is determined from among the objects present in the region of interest, based on user input or the like. FIG. 17 shows an example of the virtual camera setting UI screen according to the present embodiment. In the virtual camera setting UI screen 1100 shown in FIG. 17, the user presses the object selection button 1701 and then selects an object on the field map 1400 by operating the mouse or the like. The selected object is determined as the object of interest. Here, it is assumed that the object 1702, the player closest to the ball among the four players, has been selected. Instead of direct selection by the user, a specific player or the ball registered in advance may be determined automatically as the object of interest.

In S1606, the moving speeds of the object of interest and of the virtual camera in the target time frame are calculated, and the relative speed between the two is adjusted based on user input. FIG. 18 is a graph illustrating the recommended moving speed of the virtual camera. The vertical axis represents the actual relative speed between the object and the camera, and the horizontal axis represents spatial frequency. The curve 1801 is obtained by computing a plurality of SV-CSFs for relative speeds between the object and the virtual camera ranging from low to high, and plotting, for each, the spatial frequency at which the sensitivity approaches 0. Suppose, for example, that the specific resolution designated by the user corresponds to the circle 1802 on the curve 1801. In this case, if the relative speed between the determined object of interest and the currently set virtual camera exceeds u [m/sec] (for example, u = 2), a display prompting the user to adjust the moving speed of the virtual camera is presented, and the user adjusts the speed so that it falls within the allowable range. The present embodiment assumes that the virtual viewpoint video is viewed on a display such as a television monitor. The speed at which a person can perceive an object on such a display (that is, follow it by smooth pursuit) is about 30 [degree/sec], so the speed is adjusted so as not to exceed the speed indicated by the corresponding broken line 1803. When the user presses the speed edit button 1703, a sub-window 1710 for speed setting pops up. The range of the slide bar 1711 in the sub-window 1710 corresponds to the specific resolution designated by the user. For example, in the case of the circle 1802 described above, the range is 0 to u [m/sec]. In this way, the user adjusts the speed of the virtual camera with the slide bar 1711.
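The check that triggers the speed adjustment prompt can be sketched as follows. The description does not say how the moving speeds are computed, so this sketch assumes that per-frame positions of the virtual camera and of the object of interest are available and estimates velocities by frame-to-frame differences. The function names, the sample trajectories, and the use of the worst-case frame are assumptions.

```python
import math

def max_relative_speed(cam_pos, obj_pos, fps):
    """Largest frame-to-frame relative speed [m/sec] between camera and object."""
    worst = 0.0
    for k in range(1, len(cam_pos)):
        # Velocity = displacement per frame multiplied by frames per second.
        v_cam = [(c1 - c0) * fps for c0, c1 in zip(cam_pos[k - 1], cam_pos[k])]
        v_obj = [(o1 - o0) * fps for o0, o1 in zip(obj_pos[k - 1], obj_pos[k])]
        rel = math.hypot(*(vc - vo for vc, vo in zip(v_cam, v_obj)))
        worst = max(worst, rel)
    return worst

def needs_speed_prompt(cam_pos, obj_pos, fps, u):
    """True if the relative speed exceeds the allowable value u [m/sec]."""
    return max_relative_speed(cam_pos, obj_pos, fps) > u

fps = 60
# Hypothetical trajectories on the field plane (metres), one position per frame.
camera = [(0.05 * k, 0.0) for k in range(5)]           # about 3.0 m/sec
runner = [(0.025 * k, 0.0) for k in range(5)]          # about 1.5 m/sec, same direction
standing = [(0.0, 0.0)] * 5                            # stationary object

print(needs_speed_prompt(camera, runner, fps, u=2.0))    # relative speed about 1.5 m/sec
print(needs_speed_prompt(camera, standing, fps, u=2.0))  # relative speed about 3.0 m/sec
```

When the prompt is shown, the upper end of the slide bar range would correspond to `u`, matching the 0 to u [m/sec] range in the text.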

The above is the content of the virtual camera setting process according to the present embodiment. By prompting the user to adjust the moving speed of the virtual camera according to the relative speed between the virtual camera and the object, a virtual viewpoint video of higher image quality can be obtained more reliably.

(Other embodiments)
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100 Image processing device
105 Display unit
302 Viewpoint control unit
305 Virtual viewpoint image generation unit

Claims

An image processing device that generates a virtual viewpoint image from multi-viewpoint images captured by a plurality of cameras, comprising:
acquisition means for acquiring the multi-viewpoint images;
a GUI for a user to set parameters of a virtual camera corresponding to the virtual viewpoint image; and
generation means for generating the virtual viewpoint image using the multi-viewpoint images based on the set parameters,
wherein the GUI displays information related to the image quality of the virtual viewpoint image.

The image processing apparatus according to claim 1, wherein the information on the image quality is generated according to a predetermined image quality level for the generated virtual viewpoint image.

The image processing apparatus according to claim 1, wherein the GUI further accepts designation of an image quality level for the virtual viewpoint image to be generated, and
the information related to the image quality is generated according to the designated image quality level.

The image processing apparatus according to claim 3, wherein the designation of the image quality level is designation of a resolution level, and
the information related to the image quality is information indicating a range of positions of the virtual camera from which a virtual viewpoint image satisfying the designated resolution level is obtained.

The image processing apparatus according to claim 4, wherein, when receiving the designation of the resolution level, the GUI displays information on a representative resolution derived for each camera group, each camera group consisting of cameras, among the plurality of cameras, adjusted so that the same object is imaged at the same size.

The image processing apparatus according to claim 4, wherein, when receiving the designation of the resolution level, the GUI displays a plurality of image samples respectively associated with at least two different resolutions.

The image processing apparatus according to claim 1, wherein the GUI displays a prompt for the user to adjust the moving speed of the virtual camera when the relative speed between the virtual camera, moving at the set moving speed, and an object of interest in the multi-viewpoint images exceeds a predetermined speed.

The image processing apparatus according to claim 3, wherein the designation of the image quality level is designation of a color difference level, and
the information related to the image quality is information indicating a range of positions of the virtual camera from which a virtual viewpoint image satisfying the designated color difference level is obtained.

The image processing apparatus according to claim 8, wherein the generation means generates the virtual viewpoint image using multi-viewpoint images captured by those cameras, among the plurality of cameras, that satisfy the designated color difference level.

The image processing apparatus according to claim 3, wherein the designation of the image quality level is designation of an SN ratio representing a noise level, and
the information related to the image quality is information indicating a range of positions of the virtual camera from which a virtual viewpoint image satisfying the designated SN ratio is obtained.

The image processing apparatus according to claim 10, wherein the generation means generates the virtual viewpoint image using multi-viewpoint images captured by those cameras, among the plurality of cameras, that satisfy the designated SN ratio.

The image processing apparatus according to claim 3, wherein the GUI displays a warning when the image quality of the virtual viewpoint image generated based on the parameter is inferior to the specified image quality level.

An image processing method for generating a virtual viewpoint image from multi-viewpoint images captured by a plurality of cameras, the method comprising:
acquiring the multi-viewpoint images;
receiving a setting of a parameter of a virtual camera corresponding to the virtual viewpoint image via a GUI on which information related to the image quality of the virtual viewpoint image is displayed; and
generating the virtual viewpoint image using the multi-viewpoint images based on the set parameter.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 12.