JP7034666B2 - Virtual viewpoint image generator, generation method and program

Info

Publication number: JP7034666B2
Application number: JP2017204420A
Authority: JP
Inventors: 貴志花本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-12-27
Filing date: 2017-10-23
Publication date: 2022-03-14
Anticipated expiration: 2037-10-23
Also published as: JP2018107793A

Description

The present invention relates to image processing for generating a virtual viewpoint image from multi-viewpoint video.

Virtual viewpoint image technology uses video captured by a plurality of real cameras to reproduce video from a camera that does not actually exist (a virtual camera) placed virtually in a three-dimensional space. The technology is expected to provide a more immersive form of video expression in sports broadcasting and similar applications. To generate a virtual viewpoint image, video captured by the real cameras is first loaded into an image processing apparatus, and the shapes of objects are estimated. Next, the user determines the movement path of the virtual camera based on the result of the shape estimation, and the video that would be captured from the virtual camera is reproduced. If the captured scene is a soccer match, for example, the shapes of the objects (the players and the ball) must have been estimated over the entire field on which the match is played before the movement path of the virtual camera can be determined. However, performing object shape estimation over the whole of a large field increases both the time needed to transfer the multi-viewpoint video data captured by the real cameras and the time needed for the shape estimation itself. For a more powerful and immersive match broadcast, it is important to be able to air a virtual viewpoint image of, for example, a shot on goal as a replay in a timely manner during the match. Increased video transfer time and shape estimation time therefore hinder the generation of virtual viewpoint images with high real-time performance.

In this regard, a technique has been proposed that shortens the processing time by holding the video data captured by the real cameras at several resolutions, first performing shape estimation on the low-resolution video, then using that result as the initial value for shape estimation on the higher-resolution video, and repeating this process (Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. H05-126546

However, although the method of Patent Document 1 can shorten the shape estimation processing time, it cannot shorten the time needed to transfer the multi-viewpoint video data captured by the real cameras to the image processing apparatus.

A virtual viewpoint image generation apparatus according to the present invention comprises: first acquisition means for acquiring a plurality of captured images obtained by a plurality of first imaging devices that capture a field from mutually different directions, the plurality of captured images being used to generate a first virtual viewpoint image corresponding to the position of a virtual viewpoint and the line-of-sight direction from the virtual viewpoint; determination means for determining, according to an evaluation result of the first virtual viewpoint image generated based on the plurality of captured images acquired by the first acquisition means, whether or not to generate a second virtual viewpoint image of higher image quality than the first virtual viewpoint image based on one or more captured images obtained by one or more second imaging devices that capture at least a part of the field from mutually different directions; second acquisition means for acquiring, in accordance with the determination by the determination means, the one or more captured images obtained by the second imaging devices; and generation means for generating the first virtual viewpoint image and the second virtual viewpoint image based on the plurality of captured images acquired by the first acquisition means and on the one or more captured images acquired by the second acquisition means in accordance with the determination by the determination means.

According to the present invention, both the time needed to transfer video from the real cameras and the shape estimation processing time in the image processing apparatus can be reduced. This makes it possible to generate virtual viewpoint video with high real-time performance.

A diagram showing an example of the configuration of a virtual viewpoint image system.
(a) is a diagram showing an example of the camera arrangement, and (b) is a diagram showing the heights of the cameras belonging to each camera group.
A diagram showing the shooting area of the wide-angle camera group.
A diagram showing the shooting areas of the standard camera group.
A diagram showing the shooting areas of the zoom camera group.
A flowchart showing the overall flow until a virtual viewpoint image is generated.
A diagram showing an example of a GUI screen for setting parameters related to the virtual camera.
A flowchart showing the details of the virtual viewpoint image generation processing according to Example 1.
A diagram explaining the method of deriving the shooting area of the virtual camera.
An explanatory diagram of the determination of the degree of resolution of the nearest object in a provisional virtual viewpoint image.
A flowchart showing the flow of shape estimation processing by the billboard method according to a modification.
A diagram explaining the method of identifying the object position according to the modification.
A diagram showing a state in which a partial image of an object is projected onto a flat plate according to the modification.
A flowchart showing the flow of processing for optimizing the shooting areas of the standard camera group and the zoom camera group according to Example 2.
A diagram explaining how the shooting area of each camera group changes.
A flowchart showing the details of processing for automatically setting various items of the virtual camera according to Example 3.
A conceptual diagram of scene analysis processing.
A flowchart showing the overall flow until virtual viewpoint video is generated within a time limit according to Example 4.
A flowchart showing the details of the virtual viewpoint image generation processing according to Example 4.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

FIG. 1 is a diagram showing an example of the configuration of a virtual viewpoint image system in this example. A virtual viewpoint image is an image generated when an end user and/or a dedicated operator freely manipulates the position and orientation of a virtual camera; it is also called a free-viewpoint image or an arbitrary-viewpoint image. A virtual viewpoint image may be a moving image or a still image. This embodiment mainly describes the case in which the virtual viewpoint image is a moving image. The virtual viewpoint image system shown in FIG. 1 consists of an image processing apparatus 100 and three types of camera groups 109 to 111. The image processing apparatus 100 includes a CPU 101, a main memory 102, a storage unit 103, an input unit 104, a display unit 105, and an external I/F unit 106, which are connected to one another via a bus 107. The CPU 101 is an arithmetic processing unit that controls the image processing apparatus 100 as a whole, and performs various kinds of processing by executing programs stored in the storage unit 103 or elsewhere. The main memory 102 temporarily stores data and parameters used in the various kinds of processing and also provides a work area for the CPU 101. The storage unit 103 is a large-capacity storage device that stores the programs and the data needed for GUI (graphical user interface) display; a non-volatile memory such as a hard disk or a silicon disk is used, for example. The input unit 104 is a device such as a keyboard, a mouse, an electronic pen, or a touch panel, and accepts operation input from the user. The display unit 105 is composed of a liquid crystal panel or the like, and displays, among other things, the GUI for setting the path of the virtual camera when a virtual viewpoint image is generated. The external I/F unit 106 is connected via a LAN 108 to each camera in the camera groups 109 to 111, and transmits and receives video data and control signal data. The bus 107 connects the above units and transfers data between them.

The three types of camera groups are a zoom camera group 109, a standard camera group 110, and a wide-angle camera group 111. The zoom camera group 109 consists of a plurality of cameras equipped with lenses having a narrow angle of view (for example, 10 degrees). The standard camera group 110 consists of a plurality of cameras equipped with lenses having a standard angle of view (for example, 30 degrees). The wide-angle camera group 111 consists of a plurality of cameras equipped with lenses having a wide angle of view (for example, 45 degrees). Each camera in the camera groups 109 to 111 is connected to the image processing apparatus 100 via the LAN 108. Based on control signals from the image processing apparatus 100, each of the camera groups 109 to 111 starts and stops shooting, changes camera settings (shutter speed, aperture, and so on), and transfers the captured video data.
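To make the difference between the three example angles of view concrete, the following minimal sketch computes the width each lens covers at a given distance using the standard relation width = 2 · d · tan(θ/2). The 50 m distance and the names `ANGLE_OF_VIEW_DEG` and `footprint_width` are assumptions chosen for illustration; only the three angles come from the description.

```python
import math

# Example angles of view from the description (degrees).
ANGLE_OF_VIEW_DEG = {"zoom": 10.0, "standard": 30.0, "wide-angle": 45.0}


def footprint_width(distance_m, angle_of_view_deg):
    """Width covered at distance_m by a lens with the given full angle of view."""
    return 2.0 * distance_m * math.tan(math.radians(angle_of_view_deg) / 2.0)


# The 50 m camera-to-subject distance is an assumed value, not from the patent.
for name, angle in ANGLE_OF_VIEW_DEG.items():
    print(f"{name}: {footprint_width(50.0, angle):.1f} m wide at 50 m")
```

Under these assumptions the narrow lens covers a much smaller strip of the field than the wide lens, which is why a narrower lens images the same object with more pixels but needs more cameras to cover the same area.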

The system configuration includes various components other than those described above, but since they are not the focus of the present invention, their description is omitted.

FIG. 2(a) is a diagram showing an example of the camera arrangement in an imaging system consisting of the three types of camera groups (the zoom camera group 109, the standard camera group 110, and the wide-angle camera group 111) in a stadium where, for example, soccer is played. A player is present as an object 202 on the field 201 on which the game is played. The 12 zoom cameras 203 constituting the zoom camera group 109, the 8 standard cameras 204 constituting the standard camera group 110, and the 4 wide-angle cameras 205 constituting the wide-angle camera group 111 are arranged so as to surround the field 201. The numbers of cameras in the camera groups satisfy the relationship zoom cameras 203 > standard cameras 204 > wide-angle cameras 205. In addition, the diameter rz of the ring in which the zoom cameras 203 surround the object 202, the diameter rs of the ring in which the standard cameras 204 surround the object 202, and the diameter rw of the ring in which the wide-angle cameras 205 surround the object 202 satisfy the relationship rw > rs > rz. This allows the standard cameras 204 and the wide-angle cameras 205 to capture a wider area. FIG. 2(b) is a diagram showing the heights of the zoom cameras 203, the standard cameras 204, and the wide-angle cameras 205 above the field 201. The height hz of the zoom cameras 203, the height hs of the standard cameras 204, and the height hw of the wide-angle cameras 205 satisfy the relationship hw > hs > hz. This, too, allows the standard cameras 204 and the wide-angle cameras 205 to capture a wider area.

FIGS. 3 to 5 are diagrams showing the shooting areas of the camera groups 109 to 111. First, the shooting area of the wide-angle camera group 111 will be described. As shown in FIG. 3, the four wide-angle cameras 205 constituting the wide-angle camera group 111 face a wide-angle gaze point 310 at the center of the field 201 and are arranged at equal intervals so that the entire field 201 falls within their angles of view. The region in which the shooting areas of the four wide-angle cameras 205 overlap is referred to as the wide-angle camera group shooting area 301; within the area 301, the shape of the object 202 can be estimated using the multi-viewpoint video data captured by the four wide-angle cameras 205. This embodiment mainly describes the case in which the cameras are arranged at equal intervals, but the arrangement is not limited to this. In particular, the camera arrangement may be decided in consideration of various circumstances such as the shape of the stadium.
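The idea of a group shooting area as the overlap of the individual shooting areas can be sketched in a top view as follows. This is a simplified 2D illustration, not the method of the patent: each camera is assumed to see every point whose direction lies within half its horizontal angle of view of the line toward the gaze point, and occlusion, range limits, and the vertical angle of view are ignored. The function names and the camera coordinates in the usage lines are hypothetical.

```python
import math


def sees(cam_xy, gaze_xy, angle_of_view_deg, point_xy):
    """True if point_xy lies within the camera's horizontal angle of view (top view)."""
    ax, ay = gaze_xy[0] - cam_xy[0], gaze_xy[1] - cam_xy[1]
    px, py = point_xy[0] - cam_xy[0], point_xy[1] - cam_xy[1]
    norm = math.hypot(ax, ay) * math.hypot(px, py)
    if norm == 0.0:
        return False  # degenerate: point or gaze point coincides with the camera
    cos_angle = max(-1.0, min(1.0, (ax * px + ay * py) / norm))
    return math.degrees(math.acos(cos_angle)) <= angle_of_view_deg / 2.0


def in_group_shooting_area(cameras, gaze_xy, angle_of_view_deg, point_xy):
    """A point belongs to the group area only if every camera in the group sees it."""
    return all(sees(c, gaze_xy, angle_of_view_deg, point_xy) for c in cameras)


# Hypothetical layout: four cameras 100 m from a gaze point at the origin.
wide_cameras = [(100.0, 0.0), (-100.0, 0.0), (0.0, 100.0), (0.0, -100.0)]
print(in_group_shooting_area(wide_cameras, (0.0, 0.0), 45.0, (10.0, 10.0)))
print(in_group_shooting_area(wide_cameras, (0.0, 0.0), 45.0, (60.0, 60.0)))
```

Requiring all cameras to see the point mirrors the statement that shape estimation from multi-viewpoint data is possible inside the overlap region.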

Next, the shooting areas of the standard camera group 110 will be described. As shown in FIG. 4, the eight standard cameras 204 constituting the standard camera group 110 are further divided into two groups, A and B: group A consists of four standard cameras 204A, and group B consists of four standard cameras 204B. The standard cameras 204A of group A face a standard gaze point 311A and are set up so that a specific part of the field 201 (the left half) falls within their angles of view. The standard cameras 204B of group B face a standard gaze point 311B and are set up so that a specific part of the field 201 (the right half) falls within their angles of view. As shown in FIG. 4, the standard cameras 204A or 204B belonging to each group are arranged densely in directions from which, for example, the front of a player is likely to be captured, and sparsely in other directions (for example, directions from which the back or side of a player is likely to be captured). By setting the density of the camera arrangement according to the characteristics of the field, the game (event), and so on in this way, user satisfaction with the virtual viewpoint image can be improved even with a small number of cameras, for example. The standard cameras may nevertheless be arranged at equal intervals. The region in which the shooting areas of the four standard cameras 204A of group A overlap is referred to as the standard camera group shooting area 311A, and the region in which the shooting areas of the four standard cameras 204B of group B overlap is referred to as the standard camera group shooting area 311B. Within the standard camera group shooting area 311A, the shape of the object 202 can be estimated using the multi-viewpoint video data captured by the four standard cameras 204A. Similarly, within the standard camera group shooting area 311B, the shape of the object 202 can be estimated using the multi-viewpoint video data captured by the four standard cameras 204B.

Next, the shooting areas of the zoom camera group 109 will be described. As shown in FIG. 5, the 16 zoom cameras 203 constituting the zoom camera group 109 are further divided into four groups, C, D, E, and F. Specifically, group C consists of four zoom cameras 203C, group D of four zoom cameras 203D, group E of four zoom cameras 203E, and group F of four zoom cameras 203F. The zoom cameras 203C of group C face a zoom gaze point 510C and are set up so that a specific part of the field 201 (the upper-left quarter) falls within their angles of view. The zoom cameras 203D of group D face a zoom gaze point 510D and are set up so that a specific part of the field 201 (the lower-left quarter) falls within their angles of view. The zoom cameras 203E of group E face a zoom gaze point 510E and are set up so that a specific part of the field 201 (the upper-right quarter) falls within their angles of view. The zoom cameras 203F of group F face a zoom gaze point 510F and are set up so that a specific part of the field 201 (the lower-right quarter) falls within their angles of view. As shown in FIG. 5, the zoom cameras 203C to 203F belonging to each group are arranged densely in directions from which the front of a player is likely to be captured, and sparsely in directions from which the back or side of a player is likely to be captured. The regions in which the shooting areas of the four zoom cameras 203C to 203F of each group overlap are referred to as the zoom camera group shooting area 501C, the zoom camera group shooting area 501D, the zoom camera group shooting area 501E, and the zoom camera group shooting area 501F, respectively. Within each of the zoom camera group shooting areas 501C to 501F, the shape of an object can be estimated using the multi-viewpoint video data captured by the corresponding four zoom cameras 203C to 203F.
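The nested coverage described for FIGS. 3 to 5 (whole field, halves, quarters) can be summarized in a small lookup. The sketch below is a rough approximation that treats each group's shooting area as exactly its half or quarter of the field, whereas the actual areas are overlap regions of camera views. The coordinate convention (origin at the field center, +x to the right and +y upward in the top view), the handling of points on a dividing line, and the function name `covering_groups` are all assumptions.

```python
def covering_groups(x, y):
    """Return the camera groups whose nominal area contains field point (x, y).

    Assumed coordinates: origin at the field center, +x to the right and +y
    upward in the top view. Points on a dividing line are assigned to the
    right / upper side (an arbitrary choice for this sketch).
    """
    standard = "A" if x < 0 else "B"       # left half / right half
    if x < 0:
        zoom = "C" if y >= 0 else "D"      # upper-left / lower-left quarter
    else:
        zoom = "E" if y >= 0 else "F"      # upper-right / lower-right quarter
    return ("wide-angle", "standard " + standard, "zoom " + zoom)


print(covering_groups(-20.0, 10.0))
print(covering_groups(20.0, -10.0))
```

A lookup of this kind shows how a position of interest on the field narrows down which camera groups would have to supply higher-resolution video, instead of all cameras transferring data.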

The number and positions of the cameras, the number of groups, the gaze point positions, and so on are given only as examples and are changed according to the scene being captured. For example, in this example the gaze point is the same within each group, but the cameras belonging to the same group may instead face different gaze points spaced at fixed intervals. The adjustment of those intervals is described in Example 2. In addition, although this embodiment describes a camera system having three types of camera groups (the zoom camera group 109, the standard camera group 110, and the wide-angle camera group 111), the system is not limited to this. For example, it may have only two types of camera groups, the standard camera group 110 and the wide-angle camera group 111, or it may have four or more types of camera groups. The above description shows an example in which the number of cameras, the shooting range, and the installation height differ from one camera group to another, but this is not a limitation: all camera groups may have the same number of cameras, the shooting ranges of the cameras may be the same size, and the installation heights of the cameras may be the same. Factors other than the number of cameras, the shooting range, and the installation height may also differ between camera groups. For example, the system may be built so that the number of effective pixels of the cameras belonging to a first camera group is higher than that of the cameras belonging to a second camera group. It is also possible for at least one camera group to contain only one camera. As described above, the system configuration described in this embodiment is only an example, and various modifications are possible according to constraints such as the size of the stadium, the facilities, the number of cameras, and the budget.

FIG. 6 is a flowchart showing the overall flow until a virtual viewpoint image is generated in the image processing apparatus 100. This series of processes is realized by the CPU 101 reading a predetermined program from the storage unit 103, loading it into the main memory 102, and executing it.

In step 601, shooting parameters, such as the exposure conditions for shooting, and a shooting start signal are transmitted to each of the camera groups 109 to 111. Each camera belonging to each camera group starts shooting in accordance with the received shooting parameters and holds the resulting video data in its internal memory.

In step 602, the multi-viewpoint video data captured by all the wide-angle cameras 205 belonging to the wide-angle camera group 111 is acquired. The acquired wide-angle video data for the multiple viewpoints (four viewpoints here) is loaded into the main memory 102. As described above, the wide-angle camera group 111 contains fewer cameras than the other camera groups, so transferring the video data from the wide-angle cameras 205 takes only a short time.

In step 603, the three-dimensional shapes of the objects are estimated using the multi-viewpoint video data acquired from the wide-angle camera group 111. A known method may be applied for the estimation, such as the visual hull method, which uses the contour information of the objects, or the multi-view stereo method, which uses triangulation. The object regions in the video data captured by the wide-angle cameras 205 have a relatively low resolution. The three-dimensional shape data obtained by the shape estimation in this step is therefore low-definition and coarse, but the shapes of the objects present anywhere on the field can be estimated quickly. The resulting object shape data is held in the main memory 102 together with its position information.
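As a rough illustration of the visual hull idea named in step 603, the sketch below keeps a voxel only if its projection falls inside the object silhouette in every view. It is a toy version under stated assumptions, not the estimation used by the apparatus: the three views are axis-aligned orthographic projections on a small integer grid, whereas a real system would project through calibrated perspective cameras. The names `carve_visual_hull`, `views`, and the 4 × 4 × 4 grid are hypothetical.

```python
from itertools import product


def carve_visual_hull(voxels, views):
    """Keep the voxels whose projection lies inside the silhouette in every view.

    views is a list of (project, silhouette) pairs, where project maps a voxel
    (x, y, z) to a pixel and silhouette is the set of foreground pixels.
    """
    return {v for v in voxels
            if all(project(v) in silhouette for project, silhouette in views)}


# Toy scene: a 2x2x2 block inside a 4x4x4 grid, seen by three orthographic views.
grid = set(product(range(4), repeat=3))
block = set(product((1, 2), repeat=3))

projections = [
    lambda v: (v[0], v[1]),  # view along the z axis
    lambda v: (v[0], v[2]),  # view along the y axis
    lambda v: (v[1], v[2]),  # view along the x axis
]
# Silhouettes are obtained here by projecting the known block; in practice they
# would come from foreground extraction on the captured video.
views = [(p, {p(v) for v in block}) for p in projections]

hull = carve_visual_hull(grid, views)
print(len(hull), hull == block)
```

With low-resolution silhouettes the pixel grid is coarse, so the carved shape is coarse as well, which matches the description of the wide-angle result as fast but low-definition.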

ステップ６０４では、推定した低精細のオブジェクト形状データに基づき、自由始点映像の生成に必要な仮想カメラの移動経路といった各種パラメータが設定される。本実施例では、GUI（グラフィカル・ユーザ・インタフェース）を介したユーザ入力に基づいて、各種項目の値等が設定される。図７（ａ）及び（ｂ）は、仮想カメラに関するパラメータ設定用GUI画面の一例を示す図である。図７（ａ）に示すGUI画面７００内の左側には、フィールド２０１を含む撮影空間全体の俯瞰図（フィールドマップ７０１）上に、広角カメラ群撮影領域３０１が表示されている。広角カメラ群撮影領域３０１上には、ステップ６０３で取得したオブジェクトの3次元形状７０２がマッピングされる。ユーザは、オブジェクト２０２の位置やオブジェクト２０２の向いている方向などを、マッピングされたオブジェクト3次元形状７０２によって確認可能である。ユーザは、仮想カメラパス設定ボタン（不図示）を押下した上で、広角カメラ群撮影領域３０１上でマウス等を操作してカーソル７０３を移動することで、その移動軌跡を仮想カメラパス７０４として指定することができる。このとき指定される仮想カメラパスのフィールド２０１からの高さはデフォルト値（例えば、15ｍ）となる。そして、仮想カメラパス７０４を指定した後、ユーザは、高さ編集ボタン（不図示）を押下して、指定した仮想カメラパスの高さを変更することができる。具体的には、広角カメラ群撮影領域３０１上に表示されている仮想カメラパス上の任意の位置（座標）にカーソル７０３を移動し、マウス等のクリック操作を行うことによって、高度を変更したい仮想カメラの位置（高さ編集点）を指定する。いま、広角カメラ群撮影領域３０１内に×印で示される箇所が、ユーザによって指定された高さ編集点を示している。この高さ編集点は複数個設定することが可能である。図７（ａ）の例では、P1とP2の2つの高さ編集点が設定されている。高さ編集点が設定されると、GUI画面７００内の右側に高さ設定ウィンドウ７０５が表示される。ユーザは高さ設定ウィンドウ７０５内の各編集点に対応する入力欄７０６に任意の値（単位：m）を入力することによって当該位置における仮想カメラの高さを変更することができる。この場合、高さ編集点によって高度が変更された箇所以外の高さは、近接する位置の高さ編集点又はデフォルト値から補間して、高さが急激に変化しないように調整される。仮想カメラパスを指定したユーザは、次に、タイムフレーム設定ボタン（不図示）を押下して、当該仮想カメラパスを仮想カメラが通過するのに要する時間（移動速度）を設定する。具体的には、タイムフレーム設定ボタンの押下に応答してGUI画面７００内の右側にタイムフレーム設定ウィンドウ７０７が表示され、入力欄７０８（項目：t）に移動にかかる時間、入力欄７０９（項目：fps）にフレームレートの各値を入力する。時間とフレームレートが入力されると、生成する仮想視点画像のフレーム数が計算され、表示欄７１０（項目：frame）に表示される。図７（ａ）の例では、入力欄７０８に入力された時間が2[s]で、入力欄７０９に入力されたフレームレートが60[fps]であるので、120フレーム分の仮想視点からみた画像（以下、仮想視点画像）を生成することになる。このとき算出されたフレーム数は“F_Max”としてメインメモリ102上に保持される。さらに、ユーザは、指定した仮想カメラパス上での仮想カメラの向く方向を決めるため、注視点設定ボタン（不図示）を押下して、仮想カメラの注視点位置を設定する。具体的には、広角カメラ群撮影領域３０１内に表示されている仮想カメラパス上の任意の位置（座標）にカーソル７０３を移動し、マウス等のクリック操作を行うことによって、注視点を設定する対象の仮想カメラ位置（注視点設定点）を指定する。高さ編集点と同様、注視点設定点も複数個設定することが可能である。注視点設定点が設定されると、それらと対になった現時点の注視点の位置が自動的に表示される。このときの注視点位置は、例えばボールを持った選手など予め決められた注目オブジェクトの位置となる。図７（ｂ）において、△印で示される箇所がユーザによって指定された注視点設定点（仮想カメラ位置）、☆印で示される箇所が対応する注視点位置である。図７（ｂ）の例では、C1とC2の2つの注視点設定点が設定され、C1に対応する注視点としてT1、C2に対応する注視点としてT2が表示されている。注視点設定点が設定されると、GUI画面７００内の右側に注視点設定ウィンドウ７１１が表示される。ユーザは注視点設定ウィンドウ７１１内の各設定点に対応する入力欄７１２に任意の座標（x,y,z）を入力することによって
In step 604, the various parameters needed to generate the free-viewpoint image, such as the movement path of the virtual camera, are set based on the estimated low-definition object shape data. In this embodiment, the value of each item is set based on user input via a GUI (graphical user interface). FIGS. 7(a) and 7(b) show an example of a GUI screen for setting parameters related to the virtual camera. On the left side of the GUI screen 700 shown in FIG. 7(a), the wide-angle camera group shooting area 301 is displayed on a bird's-eye view (field map 701) of the entire shooting space including the field 201. The three-dimensional shape 702 of each object acquired in step 603 is mapped onto the wide-angle camera group shooting area 301. From the mapped three-dimensional shape 702, the user can confirm the position of the object 202, the direction it is facing, and so on. After pressing the virtual camera path setting button (not shown), the user moves the cursor 703 over the wide-angle camera group shooting area 301 with a mouse or the like, and the resulting trajectory is designated as the virtual camera path 704. The height of the virtual camera path above the field 201 designated at this point is a default value (for example, 15 m). After designating the virtual camera path 704, the user can press the height edit button (not shown) to change the height of the designated path. Specifically, the user moves the cursor 703 to an arbitrary position (coordinates) on the virtual camera path displayed on the wide-angle camera group shooting area 301 and clicks with the mouse or the like, thereby designating the virtual camera position whose height is to be changed (height edit point). The points marked with a cross (×) in the wide-angle camera group shooting area 301 are the height edit points designated by the user. A plurality of height edit points can be set. In the example of FIG. 7(a), two height edit points P1 and P2 are set. When a height edit point is set, the height setting window 705 is displayed on the right side of the GUI screen 700. The user can change the height of the virtual camera at each position by entering an arbitrary value (unit: m) in the input field 706 corresponding to each edit point in the height setting window 705. The height at positions other than the height edit points is interpolated from the adjacent height edit points or the default value, so that the height does not change abruptly.

Having designated the virtual camera path, the user then presses the time frame setting button (not shown) to set the time the virtual camera takes to traverse the path (its moving speed). Specifically, when the time frame setting button is pressed, the time frame setting window 707 is displayed on the right side of the GUI screen 700, and the user enters the travel time in the input field 708 (item: t) and the frame rate in the input field 709 (item: fps). When the time and frame rate are entered, the number of frames of the virtual viewpoint image to be generated is calculated and displayed in the display field 710 (item: frame). In the example of FIG. 7(a), the time entered in the input field 708 is 2 [s] and the frame rate entered in the input field 709 is 60 [fps], so images viewed from the virtual viewpoint (hereinafter, virtual viewpoint images) are generated for 120 frames. The number of frames calculated here is held in the main memory 102 as "F_Max".

Further, to determine the direction the virtual camera faces along the designated virtual camera path, the user presses the gaze point setting button (not shown) and sets the gaze point position of the virtual camera. Specifically, the user moves the cursor 703 to an arbitrary position (coordinates) on the virtual camera path displayed in the wide-angle camera group shooting area 301 and clicks with the mouse or the like, thereby designating the virtual camera position for which a gaze point is to be set (gaze point setting point). As with the height edit points, a plurality of gaze point setting points can be set. When a gaze point setting point is set, the current gaze point position paired with it is displayed automatically. The gaze point position at this time is a predetermined position of the object of interest, such as a player holding the ball. In FIG. 7(b), the points marked Δ are the gaze point setting points (virtual camera positions) designated by the user, and the points marked ☆ are the corresponding gaze points. In the example of FIG. 7(b), two gaze point setting points C1 and C2 are set, with T1 displayed as the gaze point corresponding to C1 and T2 as the gaze point corresponding to C2. When a gaze point setting point is set, the gaze point setting window 711 is displayed on the right side of the GUI screen 700. By entering arbitrary coordinates (x, y, z) in the input field 712 corresponding to each setting point in the gaze point setting window 711, the user can change the position at which the virtual camera at that gaze point setting point gazes. Gaze points at positions other than those that were changed are interpolated from the adjacent gaze point setting points or the default gaze point, so that the gaze point does not change abruptly. The parameters related to the virtual camera are set as described above.
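
The frame count and the height interpolation described for step 604 can be sketched as follows. This is a minimal illustration, not the patented implementation: the text only says that heights are interpolated so that they do not change abruptly, so linear interpolation is an assumption, as is holding both ends of the path at the default height. The function names and the idea of indexing the path by sample number are hypothetical.

```python
from bisect import bisect_right

DEFAULT_HEIGHT_M = 15.0  # default path height quoted in the text


def frame_count(duration_s, fps):
    """Number of frames shown in display field 710 (held as F_Max)."""
    return round(duration_s * fps)


def interpolate_heights(num_points, edit_points, default=DEFAULT_HEIGHT_M):
    """Return one height per path sample.

    edit_points maps a path sample index to the height typed by the user.
    Assumption: linear interpolation between neighbouring anchors, with
    both path ends anchored at the default unless the user edited them.
    """
    anchors = {0: default, num_points - 1: default}
    anchors.update(edit_points)
    keys = sorted(anchors)
    heights = []
    for i in range(num_points):
        # Find the pair of anchors that brackets sample i.
        hi = min(bisect_right(keys, i), len(keys) - 1)
        lo = max(hi - 1, 0)
        k0, k1 = keys[lo], keys[hi]
        if k0 == k1:
            heights.append(anchors[k0])
            continue
        t = (i - k0) / (k1 - k0)
        heights.append(anchors[k0] + t * (anchors[k1] - anchors[k0]))
    return heights
```

With the values from FIG. 7(a), `frame_count(2, 60)` gives 120. A smoother scheme (for example a spline) would satisfy the same requirement; the linear version is used here only because it is the shortest to state.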

In step 605, in order to generate virtual viewpoint images for the number of frames set in step 604, a storage area for a variable F is allocated in the main memory 102 and "0" is set as its initial value. Then, in step 606, the virtual viewpoint image of the F-th frame is generated according to the set virtual camera parameters. The virtual viewpoint image generation process is described in detail later.

In step 607, the value of the variable F is incremented (+1). Then, in step 608, it is determined whether the value of F is larger than F_Max described above. If F is larger than F_Max, virtual viewpoint images for the set number of frames have been generated (that is, the virtual viewpoint images corresponding to the set time frame are complete), so the process proceeds to step 609. If F is less than or equal to F_Max, the process returns to step 606 and the virtual viewpoint image generation process is executed for the next frame.
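
The loop formed by steps 605 to 608 can be sketched as below. The function name and the `render_frame` callback are hypothetical stand-ins for step 606. The sketch follows the wording literally, so the body runs for F = 0 through F_Max, because the loop exits only once F exceeds F_Max.

```python
def generate_all_frames(f_max, render_frame):
    """Run steps 605 to 608 and return the rendered frames in order."""
    frames = []
    f = 0                                # step 605: initialise F to 0
    while True:
        frames.append(render_frame(f))   # step 606: render frame F
        f += 1                           # step 607: increment F
        if f > f_max:                    # step 608: done when F > F_Max
            break
    return frames
```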

In step 609, it is determined whether to change the virtual camera parameter settings and generate a new virtual viewpoint image. This determination is based on an instruction from the user, who checks the image quality and so on by viewing the virtual viewpoint image shown in the preview window 713, which is displayed when the preview button (not shown) is pressed. If the user wants to regenerate the virtual viewpoint image, the user presses the virtual camera path setting button or the like again and sets the virtual camera parameters anew (the process returns to step 604). A virtual viewpoint image is then generated according to the newly set virtual camera parameters. If there is no problem with the generated virtual viewpoint image, this process ends. The above is the general flow up to the generation of a virtual viewpoint image according to this embodiment.

Next, the virtual viewpoint image generation process in step 606 described above is explained in detail. FIG. 8 is a flowchart showing the details of the virtual viewpoint image generation process according to this embodiment. The description below follows the flow of FIG. 8.

In step 801, the virtual camera position and the gaze point position in the frame of interest Fi to be processed are acquired based on the virtual camera path set in step 605 described above. In step 802, the shooting area Vr of the virtual camera for the frame of interest Fi is derived from the acquired virtual camera position and gaze point position. FIG. 9 illustrates how the shooting area of the virtual camera is derived. In FIG. 9, a quadrangular pyramid is formed from the virtual camera 901 toward the gaze point 902, and the rectangular region 903 where the pyramid intersects the field 201 is the virtual camera shooting area Vr. Then, in step 803, the object closest to the gaze point position acquired in step 801 is detected and set as the nearest object. In FIG. 9, reference numeral 904 denotes the nearest object.
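
The derivation of Vr in step 802 amounts to intersecting the edges of the view pyramid with the field plane. The sketch below assumes the field is the plane z = 0, that world +z is up, and that the virtual camera has known horizontal and vertical angles of view; none of these are stated in the text, and the function name is hypothetical. It returns the four corners where they fall on the field and does not force them into a rectangle.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def shooting_area(cam, gaze, h_fov_deg, v_fov_deg):
    """Return the four field-plane (z = 0) corners of the view pyramid.

    cam and gaze are (x, y, z) tuples. The camera must be above the field
    and must not look straight down (the up reference is world +z).
    Raises ValueError if a pyramid edge never reaches the field.
    """
    forward = _normalize(tuple(g - c for g, c in zip(gaze, cam)))
    right = _normalize(_cross(forward, (0.0, 0.0, 1.0)))
    up = _cross(right, forward)
    tan_h = math.tan(math.radians(h_fov_deg) / 2)
    tan_v = math.tan(math.radians(v_fov_deg) / 2)
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        # Direction of the pyramid edge through this image corner.
        d = tuple(f + sx * tan_h * r + sy * tan_v * u
                  for f, r, u in zip(forward, right, up))
        if d[2] >= 0:
            raise ValueError("pyramid edge does not hit the field plane")
        t = -cam[2] / d[2]  # solve cam.z + t * d.z = 0
        corners.append((cam[0] + t * d[0], cam[1] + t * d[1]))
    return corners
```

For a camera at (0, -10, 10) looking at the origin with 20 degree angles of view, the returned corners enclose the gaze point, with the far edge wider than the near edge.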

In step 804, the degree of resolution of the nearest object as seen by the set virtual camera is calculated. Specifically, the ratio R of the area occupied by the nearest object in a provisional virtual viewpoint image (a virtual viewpoint image based only on the multi-viewpoint video data of the wide-angle cameras 205) viewed from the virtual camera of the frame of interest Fi is obtained. The ratio R is the number of pixels in the nearest object region of the provisional virtual viewpoint image divided by the total number of pixels in that image, and takes a value in the range 0 to 1, for example 0.3. This embodiment mainly describes an example in which the provisional virtual viewpoint image is evaluated based on the degree of resolution of the nearest object, but the degree of resolution of another object may be evaluated in addition to, or instead of, the nearest object. Examples of other objects include an object selected by the viewer (for example, a particular player), the object closest to the center of the provisional virtual viewpoint image, and an object facing the front (if there is more than one, the one closest to the virtual camera). The number of objects referred to when evaluating the provisional virtual viewpoint image is not limited to one and may be more than one.
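
The ratio R from step 804 is a plain pixel count. A minimal sketch, assuming the nearest object region is already available as a binary mask (how the mask is obtained is outside this example, and the function name is hypothetical):

```python
def occupancy_ratio(mask):
    """Fraction of pixels in a binary mask (rows of 0/1 values) that are set."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask has no pixels")
    object_pixels = sum(1 for row in mask for p in row if p)
    return object_pixels / total
```

The same function can be applied to a real camera image to obtain the thresholds Rs and Rz used later, since they are defined by the same division.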

In step 805, it is determined, based on the respective position coordinates, whether the nearest object exists within a standard camera group shooting area. Here, the position information of the nearest object is the information derived in step 603 described above and held in the main memory 102, and the position information of the standard camera group shooting areas is held in advance in the storage unit 103. If the nearest object exists within a standard camera group shooting area, the process proceeds to step 806. If it does not, the process proceeds to step 813, where rendering is performed using the low-definition object shape data based on the multi-viewpoint video data of the wide-angle camera group. In this embodiment, the process proceeds to step 806 if the nearest object is contained in either standard camera group shooting area A or B.

In step 806, it is determined whether the ratio R, which represents the degree of resolution of the nearest object in the provisional virtual viewpoint image, is larger than a first threshold Rs. The first threshold Rs is obtained by acquiring the image captured by any one of the standard cameras 204 belonging to the standard camera group whose shooting area was determined to contain the nearest object, and dividing the number of pixels in the nearest object region of that captured image by its total number of pixels. This makes it possible to compare the degree of resolution of the nearest object between the virtual camera of the frame of interest Fi and a standard camera. FIG. 10(a) visually represents the determination in this step; in this case, the degree of resolution of the nearest object in the provisional virtual viewpoint image is determined to be higher (the value of R is larger). If the calculated ratio R is larger than the threshold Rs, the process proceeds to step 807. If R is less than or equal to Rs, the process proceeds to step 813, where rendering is performed using the low-definition object shape data generated from the multi-viewpoint video data of the wide-angle camera group. Various modifications of the determination in step 806 are possible. For example, the process may proceed to step 807 when R exceeds Rs by a predetermined threshold or more, and to step 813 otherwise.

In step 807, as in step 805 described above, it is determined, based on the respective position coordinates, whether the nearest object exists within a zoom camera group shooting area. The position information of the zoom camera group shooting areas is also held in advance in the storage unit 103. If the nearest object exists within a zoom camera group shooting area, the process proceeds to step 808; if not, the process proceeds to step 810. In this embodiment, the process proceeds to step 808 if the nearest object is contained in any of the zoom camera group shooting areas C to F.

In step 808, it is determined whether the ratio R, which represents the degree of resolution of the nearest object in the provisional virtual viewpoint image, is larger than a second threshold Rz. The second threshold Rz is obtained by acquiring the image captured by any one of the zoom cameras 203 belonging to the zoom camera group whose shooting area was determined to contain the nearest object, and dividing the number of pixels in the nearest object region of that captured image by its total number of pixels. This makes it possible to compare the degree of resolution of the nearest object between the virtual camera of the frame of interest Fi and a zoom camera. FIG. 10(b) visually represents the determination in this step; here too, the degree of resolution of the nearest object in the provisional virtual viewpoint image is determined to be higher (the value of R is larger). If the calculated ratio R is larger than the threshold Rz, the process proceeds to step 809. If R is less than or equal to Rz, the process proceeds to step 810.

In step 809, the multi-viewpoint video data used for high-definition estimation (re-estimation) of the object shape in the virtual camera shooting area Vr of the frame of interest Fi is acquired from the zoom camera group corresponding to the zoom camera group shooting area in which the nearest object was determined to exist. The acquired multi-viewpoint video data is loaded into the main memory 102. In step 810, the multi-viewpoint video data used for high-definition re-estimation of the object shape in the virtual camera shooting area Vr of the frame of interest Fi is acquired from the standard camera group corresponding to the standard camera group shooting area in which the nearest object was determined to exist. The acquired multi-viewpoint video data is likewise loaded into the main memory 102.

In step 811, the object shape re-estimation process is executed using the multi-viewpoint video data loaded into the main memory 102. This yields object shape data of higher definition than the object shape data obtained in step 603 described above. Then, in step 812, the low-definition object shape data obtained by the shape estimation in step 603 is replaced with the high-definition object shape data obtained by the shape estimation in step 811.

In step 813, a virtual viewpoint image, that is, an image viewed from the virtual camera of the frame of interest Fi, is generated using the object shape data determined by the processing up to step 812 and a computer graphics rendering technique.
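
The branching in steps 805 to 810 can be summarised as a small decision function. This is a sketch of the control flow only; the enum, the function name, and the idea of passing the area tests and thresholds in as arguments are illustrative choices, not part of the described apparatus.

```python
from enum import Enum


class Source(Enum):
    WIDE = "wide-angle"    # keep the low-definition shape, go to step 813
    STANDARD = "standard"  # re-estimate from the standard group (step 810)
    ZOOM = "zoom"          # re-estimate from the zoom group (step 809)


def select_source(r, in_standard_area, rs, in_zoom_area, rz):
    """Choose which camera group supplies the data for the frame of interest.

    r  : ratio R of the nearest object in the provisional image
    rs : threshold Rs (ignored when in_standard_area is False)
    rz : threshold Rz (ignored when in_zoom_area is False)
    """
    if not in_standard_area:   # step 805
        return Source.WIDE
    if r <= rs:                # step 806
        return Source.WIDE
    if not in_zoom_area:       # step 807
        return Source.STANDARD
    if r <= rz:                # step 808
        return Source.STANDARD
    return Source.ZOOM         # step 809
```

Only the `STANDARD` and `ZOOM` results lead to the re-estimation in steps 811 and 812; `WIDE` goes straight to rendering.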

The above is the content of the virtual viewpoint image generation process according to this embodiment. In deciding whether to re-estimate the object shape and acquire higher-definition object shape data, this embodiment uses the degree of resolution of the nearest object in the provisional virtual viewpoint image as the index, but the index is not limited to this. For example, the distance between the nearest object and the virtual camera may be used as the index, and re-estimation may be performed when the distance between the nearest object and the virtual camera position is farther than the distance between the nearest object and the zoom camera position or the standard camera position.

The embodiment described above mainly covers an example in which whether a higher-quality virtual viewpoint image should be generated is determined by comparing the degree of resolution of the provisional virtual viewpoint image (specifically, the ratio R obtained by dividing the number of pixels of the nearest object by the number of pixels of the provisional virtual viewpoint image) with a threshold (specifically, the threshold Rs obtained by dividing the number of pixels of the nearest object in an image captured by a camera belonging to the standard camera group by the number of pixels of that captured image). However, the determination is not limited to this method, and various modifications are possible. For example, if the ratio R in the provisional virtual viewpoint image is larger than a predetermined threshold (that is, if the size of the object in the virtual viewpoint image is larger than the threshold), it may be determined that a high-quality virtual viewpoint image should be generated regardless of the threshold Rs. As another method, the image quality of the nearest object in the provisional virtual viewpoint image may be evaluated, and whether a high-quality virtual viewpoint image should be generated may be determined according to the evaluation result. As the method of evaluating the image quality of the nearest object, for example, if the object is a person facing the front, an evaluation based on a face recognition result may be used, or an evaluation based on the sharpness of the object's edges may be used. These determination methods allow a simpler determination than the method that uses images captured by the standard camera group. Other modifications are described below.

<Modification>
In the above embodiment, a low-definition three-dimensional object shape is first acquired using the multi-viewpoint video data of the wide-angle camera group, and a high-definition three-dimensional object shape is then reacquired, according to the virtual camera path, using the multi-viewpoint video data of the standard or zoom camera group to generate the virtual viewpoint image. However, the process is not limited to this. For example, instead of the low-definition three-dimensional shape estimation using the multi-viewpoint video data of the wide-angle camera group, a two-dimensional shape estimation that regards each object as a plane may be performed (billboard method). With the billboard method, the flow shown in FIG. 11 is executed in step 603 described above. The details are as follows.

In step 1101, the object position on the field 201 is identified. FIG. 12 illustrates how the object position is identified. In FIG. 12, wide-angle camera image_1 in (a) and wide-angle camera image_2 in (b) are images taken by different wide-angle cameras 205, and each shows one line 1201 and an object 1202. FIG. 12(c) is a post-transformation composite image obtained by applying a projective transformation to wide-angle camera image_1 and wide-angle camera image_2 with the field plane as the reference and then combining them. In the composite image, the line 1201 remains a single line, whereas the object 1202 is split into two. Using this property, the position marked with a cross (×), which is the base point of the split, is identified as the object position 1203.
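
The property used in step 1101 can be checked with a small geometric example. A projective transformation referenced to the field plane sends each pixel to the point where its viewing ray meets the field, so the sketch below works directly with rays rather than images. It assumes the field is the plane z = 0 and uses made-up camera and object coordinates; the function name is hypothetical, and this is an illustration of the principle rather than the image-based procedure of FIG. 12.

```python
def project_to_field(cam, point):
    """Project a 3-D point onto the field plane z = 0 along the ray from cam."""
    cx, cy, cz = cam
    px, py, pz = point
    if cz <= pz:
        raise ValueError("camera must be higher than the point")
    t = cz / (cz - pz)  # ray parameter at which z reaches 0
    return (cx + t * (px - cx), cy + t * (py - cy))


# Two hypothetical wide-angle cameras and a 2 m tall object standing at (0, 5).
cam_a = (-20.0, 0.0, 10.0)
cam_b = (20.0, 0.0, 10.0)
foot = (0.0, 5.0, 0.0)
head = (0.0, 5.0, 2.0)

foot_a, foot_b = project_to_field(cam_a, foot), project_to_field(cam_b, foot)
head_a, head_b = project_to_field(cam_a, head), project_to_field(cam_b, head)
```

Here `foot_a` and `foot_b` coincide at (0, 5), while `head_a` and `head_b` land at (5, 6.25) and (-5, 6.25). Points on the field, such as a line or the object's foot, stay single after the transformation, whereas anything above the field splits apart, and the place where the two copies meet is the object position.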

In step 1102, a flat plate is placed at the identified object position. Then, in step 1103, a partial image of the object cut out from the image captured by a wide-angle camera 205 is projected onto the placed plate. FIG. 13 shows the state in which partial images of objects are projected onto flat plates. As can be seen, flat plates 1300 are placed in a number equal to the number of objects on the field 201, and the partial image of each object is projected onto its plate.

The above is the content of the processing according to this modification. Because the object shape is handled in two dimensions, high-speed processing is possible.

Instead of placing a flat plate and projecting the cut-out image, an object shape prepared in advance (for example, one scanned with a 3D range scanner or modeled manually) may be placed at the identified object position.

In this modification, part of the virtual viewpoint image generation process (step 606) changes. That is, even when the determination in step 805 or step 806 is "No", the process proceeds to step 811 instead of step 813, and the three-dimensional shape of the object is estimated. This estimation uses the multi-viewpoint video data of the wide-angle camera group already acquired in step 602. The broken-line arrow 800 in the flow of FIG. 8 indicates this path. In this case as well, processing can be sped up by estimating the shape of only those objects contained in the shooting area Vr of the virtual camera of the frame of interest Fi.

As described above, according to this embodiment, the multi-viewpoint video data captured by a camera group with a narrower angle of view is acquired, and high-definition object shape estimation and virtual viewpoint image generation are performed, only when the virtual camera can approach the object of interest more closely while maintaining image quality. The data transfer volume and processing load are therefore kept to the necessary minimum. This enables virtual viewpoint image generation with better real-time performance.

Next, a second embodiment is described in which the shooting areas of the camera groups other than the wide-angle camera group (the zoom camera group and the standard camera group in the first embodiment) are optimized according to the shooting scene, allowing a further reduction in video transfer time and shape estimation processing time. The system configuration and the general flow of virtual viewpoint image generation are the same as in the first embodiment and are not described again; the description below focuses briefly on the differences.

FIG. 14 is a flowchart showing the flow of the process for optimizing the shooting areas of the standard camera group and the zoom camera group according to this embodiment. As a premise of this embodiment, in the default state before this process is executed, the cameras belonging to the same group within each camera group are spaced at regular intervals and face different directions (their gaze points differ).

In step 1401, the shooting scene is set (for example, whether a ball game or a track and field event is to be shot) based on user input via a UI screen (not shown). In step 1402, it is determined whether the set shooting scene requires shooting a high-altitude region at or above a predetermined altitude. Shooting scenes that require shooting a high-altitude region are, for example, ball games such as soccer and rugby in which the ball reaches heights of several tens of meters. Shooting scenes that stay below the predetermined altitude and do not require shooting a high-altitude region (scenes for which shooting a low-altitude region is sufficient) are, for example, track sprints. If the shooting scene involves shooting a high-altitude region, the process proceeds to step 1403. If it does not, the process proceeds to step 1404.

In step 1403, the positions of the cameras belonging to the zoom camera group 109 and the standard camera group 110 are kept fixed, and the distance between gaze points is reduced group by group (or the gaze points of the cameras within a group are made identical). In step 1404, the positions of the cameras belonging to the zoom camera group 109 and the standard camera group 110 are likewise kept fixed, and the distance between gaze points is increased (or maintained) group by group. FIG. 15 illustrates how the standard camera group shooting area and the zoom camera group shooting area change when the distance between gaze points is adjusted for each group in each camera group. FIG. 15(a) shows the case where the distance between gaze points is reduced to the minimum (the gaze points are changed to a single common gaze point). In FIG. 15(a), the two white circles 1501 and 1502 indicate the gaze points of the two cameras 1511 and 1512 before the change. The single black circle 1503 indicates the gaze point after the change, and both cameras 1511 and 1512 face the same gaze point. In this case, the camera group shooting area X along the field surface becomes narrower, but the camera group shooting area Z in the height direction becomes wider than before the change. The resulting shooting area is therefore suitable for shooting ball games and the like in which the ball reaches a high altitude. FIG. 15(b), in contrast, shows the case where the distance between gaze points is increased. In FIG. 15(b), the black circle 1504 indicates the gaze point of the camera 1511 after the change and the black circle 1505 indicates the gaze point of the camera 1512 after the change, and the spacing between the gaze points has widened. In this case, the camera group shooting area X along the field surface becomes wider, but the camera group shooting area Z in the height direction becomes narrower. For track sprints and the like, a wide region parallel to the field surface can therefore be shot.
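
The adjustment in steps 1402 to 1404 can be sketched as scaling a group's gaze points about a common center. The text does not say where the merged gaze point lies or by how much the spacing changes, so using the centroid as the center and the particular scale factors below are assumptions, as are the function names. Gaze points are treated as 2-D positions on the field.

```python
def adjust_gaze_points(gaze_points, scale):
    """Scale the spread of a group's gaze points about their centroid.

    scale = 0 merges all gaze points into one (as in FIG. 15(a)),
    scale > 1 spreads them apart (as in FIG. 15(b)), scale = 1 keeps them.
    """
    n = len(gaze_points)
    cx = sum(x for x, _ in gaze_points) / n
    cy = sum(y for _, y in gaze_points) / n
    return [(cx + scale * (x - cx), cy + scale * (y - cy))
            for x, y in gaze_points]


def optimise_group(gaze_points, needs_high_altitude, shrink=0.0, expand=1.5):
    """Steps 1402 to 1404: shrink for high-altitude scenes, else expand."""
    scale = shrink if needs_high_altitude else expand
    return adjust_gaze_points(gaze_points, scale)
```

Camera positions are left untouched, matching the description; only the points the cameras aim at move.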

The above is the content of the process for optimizing the shooting areas of the standard camera group and the zoom camera group according to this embodiment. In this embodiment, a single predetermined altitude is used as the reference (threshold): the distance between gaze points is reduced when the altitude is at or above the predetermined altitude and increased (or maintained) when it is below. However, the process is not limited to this. For example, separate thresholds may be provided for reducing and for increasing the distance between gaze points. This process can reduce the number of cameras needed to shoot one event. Convenience can also be expected to improve, for example by using the cameras thus freed to shoot another event at the same time.

本実施例によれば、広角カメラ群以外のカメラ群の撮影領域を撮影シーンに応じて最適化することができる。これにより、映像転送時間や処理時間の更なる削減が可能となる。 According to this embodiment, the shooting area of the camera group other than the wide-angle camera group can be optimized according to the shooting scene. This makes it possible to further reduce the video transfer time and processing time.

続いて、データベースを用いて、仮想カメラに関する設定を自動で行なう態様について、実施例３として説明する。実施例１及び２と共通する内容は説明を省略し、以下では異なる点を中心に説明する。 Subsequently, a mode in which the setting related to the virtual camera is automatically performed using the database will be described as Example 3. The contents common to the first and second embodiments will be omitted, and the differences will be mainly described below.

図１６は、本実施例に係る、前述の図６のフローにおけるステップ６０４に代えて実行する、仮想カメラの各種項目を自動設定する処理の詳細を示すフローチャートである。 FIG. 16 is a flowchart showing the details of the process for automatically setting various items of the virtual camera, which is executed in place of step 604 in the flow of FIG. 6 described above, according to this embodiment.

ステップ１６０１では、不図示の外部ネットワークを介して接続される、撮影シーン解析データベース（以下、「シーンDB」）に対して、撮影シーンの解析を依頼する。画像処理装置１００は、シーンDBとLAN１０８経由で接続され、シーンDBはさらに、外部から接続可能なネットワーク上に設置されている。シーンDBは、過去の撮影シーンに関する様々な情報を蓄積しており、画像処理装置１００から解析に必要な情報を受け取って、撮影シーンの解析処理を実行する。図１７は、シーン解析処理の概念図である。シーンDB１７００には、撮影シーンの種類毎に、オブジェクト変遷情報と撮影環境情報が記録されている。ここで、オブジェクト変遷情報には、例えば撮影シーンがスポーツ競技の試合の場合、選手の移動位置の軌跡を記録したデータ、選手形状の変化の軌跡を記録したデータ、さらに球技であればボールの位置移動の軌跡を記録したデータなどが含まれる。撮影環境情報は、撮影時の周辺環境、例えば観客席の音声を記録したデータである。スポーツ競技における決定的シーンにおいては歓声によって観客席における音量が増加するため、視聴者の関心が高い決定的シーンか否かの判別に利用可能である。また、撮影シーンがスポーツ競技の試合の場合、シーンDB１７００には、上述したオブジェクト変遷情報及び撮影環境情報と、各競技における決定的シーンとの対応関係を示す情報（以下、決定的シーン情報）も記録されている。決定的シーン情報は、決定的シーンの種類、および決定的シーンに適した代表的なカメラワーク（仮想カメラの移動経路）から構成される。決定的シーンの種類とは、例えばサッカーで言えば、シュートシーン、ロングパスシーン、コーナーキックシーン等である。決定的シーン情報は学習データとして保持し、深層学習（ディープラーニング）技術等を用いることで、撮影シーンの解析が可能である。学習データの素材は、世界中の競技場からインターネット等を介して取得可能なため、膨大なデータを収集可能である。画像処理装置１００は、シーンDB１７００に対し、撮影シーン（競技）の種類、選手やボールの移動ログ（移動軌跡データ）、選手の形状ログ（形状変化データ）、観客席音声データを送信して解析を依頼する。なお、シーンDB１７００に送信する上記データは、ステップ６０２で取得した広角カメラ群１１１の複数視点映像データに基づき生成される。シーンDB１７００では、解析依頼を受けて上述の解析処理を行う。解析結果は画像処理装置１００に送られる。 In step 1601, a shooting scene analysis database (hereinafter referred to as “scene DB”) connected via an external network (not shown) is requested to analyze the shooting scene. The image processing device 100 is connected to the scene DB via LAN 108, and the scene DB is further installed on a network that can be connected from the outside. The scene DB stores various information about past shooting scenes, receives information necessary for analysis from the image processing device 100, and executes analysis processing of the shooting scene. FIG. 17 is a conceptual diagram of the scene analysis process. In the scene DB 1700, object transition information and shooting environment information are recorded for each type of shooting scene. 
Here, when the shooting scene is a sports match, for example, the object transition information includes data recording the trajectories of the players' positions, data recording the changes in the players' shapes, and, in the case of a ball game, data recording the trajectory of the ball's position. The shooting environment information is data recording the surroundings at the time of shooting, for example the sound from the spectator seats. In a decisive scene of a sports competition, cheering raises the volume in the spectator seats, so this data can be used to determine whether or not a scene is a decisive one of high interest to viewers. Further, when the shooting scene is a sports match, the scene DB 1700 also records information indicating the correspondence between the above-mentioned object transition information and shooting environment information and the decisive scenes of each competition (hereinafter, decisive scene information). The decisive scene information consists of the type of decisive scene and representative camera work (a movement path of the virtual camera) suited to that decisive scene. In soccer, for example, the types of decisive scene are a shot-on-goal scene, a long-pass scene, a corner-kick scene, and the like. The decisive scene information is held as learning data, and the shooting scene can be analyzed by using deep learning technology or the like. Because material for the learning data can be obtained from stadiums around the world via the Internet or the like, a huge amount of data can be collected. The image processing device 100 sends the scene DB 1700 the type of shooting scene (competition), the movement logs of the players and the ball (movement trajectory data), the shape logs of the players (shape change data), and the spectator-seat audio data, and requests analysis. 
The data to be transmitted to the scene DB 1700 is generated based on the multi-viewpoint video data of the wide-angle camera group 111 acquired in step 602. In the scene DB 1700, the above-mentioned analysis process is performed in response to the analysis request. The analysis result is sent to the image processing apparatus 100.
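The four items sent with the analysis request in step 1601 can be pictured as a single request record. The sketch below is an assumption made for illustration only: the source does not specify a wire format, so the class name, the field names, the tuple layouts, and the choice of JSON are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass
class SceneAnalysisRequest:
    """Hypothetical container for the data sent to the scene DB in step 1601."""
    scene_type: str                                            # type of competition, e.g. "soccer"
    movement_log: List[Tuple[int, str, float, float, float]]   # (frame, object id, x, y, z)
    shape_log: List[Tuple[int, str, str]]                      # (frame, player id, shape descriptor)
    crowd_audio_level: List[float]                             # one loudness value per frame


def to_payload(request: SceneAnalysisRequest) -> str:
    # Serialise the request; JSON is an assumed encoding, not one named in the source.
    return json.dumps(asdict(request), ensure_ascii=False)
```

A caller would build one such record from the wide-angle camera group's data acquired in step 602 and send the resulting string to the scene DB.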

ステップ１６０２では、画像処理装置１００が、シーンDB１７００から解析結果を受け取る。解析結果には、決定的シーンの発生した位置、決定的シーンの種類、決定的シーンに適した代表的なカメラワークの情報が含まれる。 In step 1602, the image processing apparatus 100 receives the analysis result from the scene DB 1700. The analysis result includes information on the position where the decisive scene occurred, the type of the decisive scene, and the representative camera work suitable for the decisive scene.

ステップ１６０３では、受け取った解析結果に基づき、仮想カメラの各種項目が自動で設定される。具体的には、決定的シーンの発生位置が仮想カメラの注視点として設定される。また、代表的なカメラワークに基づき、仮想カメラの移動経路や対応するタイムフレームが設定される。決定的シーンの種類を示す情報は、生成後の仮想視点画像にメタデータとして付与される。このメタデータは放送事業者による2次利用（文字エフェクト入れ、データベース化など）の際に参照される。 In step 1603, various items of the virtual camera are automatically set based on the received analysis result. Specifically, the position where the decisive scene is generated is set as the gazing point of the virtual camera. In addition, the movement path of the virtual camera and the corresponding time frame are set based on the typical camera work. Information indicating the type of the decisive scene is added as metadata to the generated virtual viewpoint image. This metadata is referred to for secondary use by broadcasters (character effect insertion, database creation, etc.).
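The mapping from analysis result to virtual camera settings in step 1603 can be sketched as below. This is a hedged illustration: the source does not say how the representative camera work is encoded, so the sketch assumes it is a list of time offsets and positions relative to the decisive scene position. The class names, field names, `start_frame`, and `fps` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AnalysisResult:
    position: Vec3                          # where the decisive scene occurred
    scene_kind: str                         # e.g. "corner_kick"
    camera_work: List[Tuple[float, Vec3]]   # assumed form: (seconds from start, offset from position)


@dataclass
class VirtualCameraSettings:
    gaze_point: Vec3
    path: List[Tuple[int, Vec3]]            # (frame number, camera position)
    metadata: Dict[str, str]


def configure_virtual_camera(result: AnalysisResult, start_frame: int, fps: float) -> VirtualCameraSettings:
    path = []
    for seconds, offset in result.camera_work:
        frame = start_frame + round(seconds * fps)
        # Place the camera relative to the decisive scene position.
        position = tuple(p + o for p, o in zip(result.position, offset))
        path.append((frame, position))
    return VirtualCameraSettings(
        gaze_point=result.position,                   # the scene position becomes the gaze point
        path=path,                                    # movement path with its time frames
        metadata={"scene_kind": result.scene_kind},   # attached to the generated image
    )
```

The returned settings could then be shown on the GUI screen for the user to edit, as the description allows.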

以上が、仮想カメラの各種項目を自動設定する処理の内容である。なお、上述のようにして自動設定された仮想カメラの注視点や移動経路を前述のGUI画面７００上に表示し、さらにユーザがその内容を編集できるように構成してもよい。また、本実施例では、シーンDB１７００を画像処理装置１００とは別個の装置として構成したが、両者を一体化した１つの装置としてもよい。或いは、本実施例のシーンDB１７００が有するシーン解析機能とデータ保存機能とを分離し、それぞれを別個の装置で構成してもよい。 The above is the content of the process for automatically setting various items of the virtual camera. The gaze point and the movement route of the virtual camera automatically set as described above may be displayed on the GUI screen 700 described above, and the contents may be edited by the user. Further, in this embodiment, the scene DB 1700 is configured as a device separate from the image processing device 100, but both may be integrated into one device. Alternatively, the scene analysis function and the data storage function of the scene DB1700 of this embodiment may be separated and each may be configured by a separate device.

本実施例によれば、データベースを用いて、仮想カメラについての移動経路等の各種項目を自動で設定することができる。これにより、更なる処理時間の短縮が可能である。 According to this embodiment, various items such as a movement route for a virtual camera can be automatically set by using a database. This makes it possible to further reduce the processing time.

本実施例では、仮想視点画像の生成時間に制限が設けられている場合に特に好適な映像生成手法に関して説明を行う。生成時間に制限が設けられているケースとして、例えば、プレイ直後にリプレイとして仮想視点画像を生成するケースや、あるいはスポーツ放送中にリアルタイムに仮想視点画像を生成するケースなどがある。なお、実施例１と重複する処理に関しては、説明を省略する。 In this embodiment, a video generation method that is particularly suitable when the generation time of the virtual viewpoint image is limited will be described. As a case where the generation time is limited, for example, there is a case where a virtual viewpoint image is generated as a replay immediately after play, or a case where a virtual viewpoint image is generated in real time during sports broadcasting. The description of the process overlapping with the first embodiment will be omitted.

図１８は、画像処理装置１００において、制限時間内に仮想視点映像が生成されるまでの全体の流れを示したフローチャートである。この一連の処理は、CPU１０１が、所定のプログラムを記憶部１０３から読み込んでメインメモリ１０２に展開し、これをCPU１０１が実行することで実現される。 FIG. 18 is a flowchart showing the overall flow in the image processing apparatus 100 until a virtual viewpoint image is generated within the time limit. This series of processes is realized by the CPU 101 reading a predetermined program from the storage unit 103, loading it into the main memory 102, and executing it.

ステップ１８０１～１８０９は、ステップ６０１～６０９とほぼ同処理である。図６との差異は、ステップ１８０６とステップ１８１０である。ステップ１８０２にて広角カメラ群から複数視点映像データが取得された後、ステップ１８０３～１８０５に並列して、ステップ１８１０が実行される。ステップ１８１０では、通信が完了したＬＡＮ１０８の通信帯域を有効活用すべく、標準カメラ群の複数視点映像データが順次取得される。取得された複数視点映像データは、ステップ１８０６にて用いられる。ステップ１８０６の仮想視点画像生成処理に関しては、図１９にて詳述する。 Steps 1801 to 1809 are substantially the same as steps 601 to 609. The differences from FIG. 6 are step 1806 and step 1810. After the multi-viewpoint video data is acquired from the wide-angle camera group in step 1802, step 1810 is executed in parallel with steps 1803 to 1805. In step 1810, the multi-viewpoint video data of the standard camera group is acquired sequentially, so as to make effective use of the communication bandwidth of the LAN 108 that has become free once the preceding communication is complete. The acquired multi-viewpoint video data is used in step 1806. The virtual viewpoint image generation process in step 1806 is described in detail with reference to FIG. 19.

以上が、本実施例に係る、仮想視点画像が生成されるまでの大まかな流れである。処理時間の長いデータ通信を、形状推定処理や仮想カメラのパラメータ設定処理と並列化することで全体の処理時間が大幅に削減されるという効果がある。なお、本実施例の構成を、仮想視点画像の生成時間に制限が設けられていないケースや、生成時間に十分な余裕があるケースに用いてもよい。 The above is a rough flow until the virtual viewpoint image is generated according to this embodiment. By parallelizing data communication, which has a long processing time, with shape estimation processing and virtual camera parameter setting processing, there is an effect that the overall processing time is significantly reduced. The configuration of this embodiment may be used in a case where the generation time of the virtual viewpoint image is not limited or a case where the generation time has a sufficient margin.
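The parallelization described above, in which the standard camera group's data is transferred (step 1810) while shape estimation and virtual camera setup (steps 1803 to 1805) proceed, can be sketched with a background worker. The two functions below are stand-ins that merely sleep; their names, timings, and return values are assumptions for illustration, and a real system would perform LAN transfers and shape estimation instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_standard_group_frames(n_frames):
    """Stand-in for step 1810: sequentially acquire standard-camera data."""
    frames = []
    for i in range(n_frames):
        time.sleep(0.01)                  # placeholder for one transfer over the LAN
        frames.append(f"standard-frame-{i}")
    return frames


def estimate_shape_and_set_camera():
    """Stand-in for steps 1803 to 1805: coarse shape estimation and camera setup."""
    time.sleep(0.05)                      # placeholder for the actual processing
    return {"shape": "coarse", "camera_path": "set"}


def run_pipeline(n_frames):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the transfer in the background ...
        transfer = pool.submit(fetch_standard_group_frames, n_frames)
        # ... and carry out the setup steps on the main thread meanwhile.
        setup = estimate_shape_and_set_camera()
        # Wait for the transfer before moving on to image generation (step 1806).
        frames = transfer.result()
    return setup, frames
```

Because the transfer mostly waits on I/O, a thread is sufficient here; the total time approaches the longer of the two activities rather than their sum.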

図１８のステップ１８０６の仮想視点画像生成処理を説明する。ここでは、標準カメラ群の複数視点映像データを用いてオブジェクト形状が生成された後、処理時間を鑑みながらズームカメラ群の複数視点映像データが適用される。図1９は、本実施例に係る、仮想視点画像生成処理の詳細を示すフローチャートである。以下、図1９のフローに沿って詳しく説明する。 The virtual viewpoint image generation process of step 1806 in FIG. 18 will now be described. Here, after the object shape is generated using the multi-viewpoint video data of the standard camera group, the multi-viewpoint video data of the zoom camera group is applied with the processing time taken into account. FIG. 19 is a flowchart showing the details of the virtual viewpoint image generation process according to this embodiment. A detailed description follows along the flow of FIG. 19.

ステップ１９０１から、ステップ１９０６はステップ８０１～８０６と同処理であるため、説明を省略する。ステップ１９０７では、ステップ１８１０で取得された標準カメラ群の複数視点映像データを用いてオブジェクト形状の再推定処理が実行される。ステップ１９０８では、前述のステップ６０３の形状推定で得られた低精細のオブジェクト形状データが、ステップ１９０７の形状推定で得られた高精細のオブジェクト形状データに置換される。ステップ１９０９、１９１０はステップ８０７、８０８と同処理である。ステップ１９１１では、本ステップまでの処理時間が形状推定実施の限界値以内であるかが判断される。限界値は1画像フレームの形状推定を実施しなければならない時間に基づいて予め決定される。例えば、30秒以内に600フレームのリプレイ用の映像を生成する場合、1画像フレーム当たり50ミリ秒（30,000/600）を限界値とすることができる。ただし、処理時間に余裕を持たせる場合や、その他の事情によっては限界値が異なる値となることはありえる。Yesの場合はステップ１９１２に進む。Noの場合はステップ１９１７に進むことにより、標準カメラ群を用いた仮想視点画像が生成される。つまり、ステップ１９１１にてＮｏの判定となった場合は、ステップ１９０８で置換されたオブジェクト形状データの評価結果にかかわらず、ズームカメラ群の複数視点映像データに基づく形状推定を行わない。ステップ１９１２では、ステップ８０９と同様にズームカメラ群から複数視点映像データが取得される。このとき、ＬＡＮ１０８の通信帯域を確保するため、ステップ１８１０での標準カメラ群の複数視点映像データの取得は、一時停止され、本ステップ終了後に再開される。ステップ１９１３では、本ステップまでの処理時間が形状推定実施の限界値以内であるかが再度判断される。Yesの場合はステップ１９１４に進み、Noの場合はステップ１９１６に進む。ステップ１９１４では、ズームカメラ群の複数視点映像データを用いてオブジェクト形状の再推定処理が実行される。ステップ１９１５では、前述のステップ１９０７の形状推定で得られたオブジェクト形状データが、ステップ１９１４の形状推定で得られた高精細のオブジェクト形状データに置換される。ステップ１９１６では、形状推定の再実行時間が不足しているため、オブジェクト形状はステップ１９０７で得られたデータが用いられるが、オブジェクト形状に投影するテクスチャはズームカメラの複数視点映像データを用いるようにレンダリング設定が行われる。ステップ１９１７では、ステップ１９１６までの処理で決まったオブジェクト形状とテクスチャを用いて、注目フレームFiの仮想カメラから見た画像である仮想視点画像が生成される。 Steps 1901 to 1906 are the same as steps 801 to 806, so their description is omitted. In step 1907, the object shape re-estimation process is executed using the multi-viewpoint video data of the standard camera group acquired in step 1810. In step 1908, the low-definition object shape data obtained by the shape estimation in step 603 described above is replaced with the high-definition object shape data obtained by the shape estimation in step 1907. Steps 1909 and 1910 are the same as steps 807 and 808. In step 1911, it is determined whether the processing time up to this step is within the limit value for performing shape estimation. The limit value is determined in advance based on the time within which the shape estimation for one image frame must be completed. 
For example, when 600 frames of replay video are to be generated within 30 seconds, the limit value can be set to 50 milliseconds per image frame (30,000/600). However, the limit value may take a different value when a margin is to be left in the processing time, or depending on other circumstances. If Yes, the process proceeds to step 1912. If No, the process proceeds to step 1917, where a virtual viewpoint image is generated using the standard camera group. That is, if the determination in step 1911 is No, shape estimation based on the multi-viewpoint video data of the zoom camera group is not performed, regardless of the evaluation result for the object shape data replaced in step 1908. In step 1912, multi-viewpoint video data is acquired from the zoom camera group, as in step 809. At this time, to secure the communication bandwidth of the LAN 108, the acquisition of the standard camera group's multi-viewpoint video data in step 1810 is suspended, and it is resumed after this step ends. In step 1913, it is determined again whether the processing time up to this step is within the limit value for performing shape estimation. If Yes, the process proceeds to step 1914; if No, it proceeds to step 1916. In step 1914, the object shape re-estimation process is executed using the multi-viewpoint video data of the zoom camera group. In step 1915, the object shape data obtained by the shape estimation in step 1907 is replaced with the high-definition object shape data obtained by the shape estimation in step 1914. In step 1916, because there is not enough time to re-run the shape estimation, the data obtained in step 1907 is used for the object shape, but the rendering settings are made so that the texture projected onto the object shape uses the multi-viewpoint video data of the zoom cameras. In step 1917, a virtual viewpoint image, which is the image seen from the virtual camera for the frame of interest Fi, is generated using the object shape and texture determined by the processing up to step 1916.
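The limit value arithmetic and the two time checks (steps 1911 and 1913) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the plan labels are hypothetical, "within the limit" is taken to mean less than or equal to it, and the quality evaluation of steps 1909 and 1910 is left out.

```python
def per_frame_limit_ms(total_seconds, n_frames):
    """Limit value per image frame, e.g. 30 s for 600 frames gives 50 ms."""
    return total_seconds * 1000.0 / n_frames


def choose_plan(elapsed_ms_at_1911, elapsed_ms_at_1913, limit_ms):
    """Pick which data feeds the final rendering in step 1917.

    The returned labels are hypothetical names for the three outcomes
    described in the text.
    """
    if elapsed_ms_at_1911 > limit_ms:
        # Step 1911 is No: skip the zoom cameras entirely.
        return "standard_shape_only"
    if elapsed_ms_at_1913 > limit_ms:
        # Step 1913 is No (step 1916): keep the step 1907 shape,
        # but project texture from the zoom cameras.
        return "standard_shape_zoom_texture"
    # Both checks are Yes (steps 1914 and 1915): re-estimate the shape
    # from the zoom camera data.
    return "zoom_shape"
```

For the example in the text, `per_frame_limit_ms(30, 600)` evaluates to 50.0, matching the 50 milliseconds per frame stated above.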

なお、処理時間が形状推定の限界値以内であるか否かの判定を実行するタイミングは、図１９において示す例に限らない。例えば、ステップ１９０６とステップ１９０７の間に判定を行うようにしても良いし、ステップ１９０８とステップ１９０９の間に判定を行うようにしても良い。また、図１９におけるステップ１９１０とステップ１９１１の順序が逆であっても良い。 The timing at which it is determined whether or not the processing time is within the limit value for shape estimation is not limited to the example shown in FIG. 19. For example, the determination may be made between steps 1906 and 1907, or between steps 1908 and 1909. Further, the order of steps 1910 and 1911 in FIG. 19 may be reversed.

（その他の実施例） (Other examples)
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by a process in which a program that realizes one or more functions of the above-described embodiments is supplied to a system or device via a network or storage medium, and one or more processors in a computer of that system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that realizes one or more functions.

Claims

A virtual viewpoint image generation device comprising:
first acquisition means for acquiring a plurality of captured images that are obtained by a plurality of first imaging devices capturing a field from different directions and that are used to generate a first virtual viewpoint image according to the position of a virtual viewpoint and the line-of-sight direction from the virtual viewpoint;
determination means for determining, according to an evaluation result of the first virtual viewpoint image generated based on the plurality of captured images acquired by the first acquisition means, whether or not to generate a second virtual viewpoint image having higher image quality than the first virtual viewpoint image based on one or more captured images obtained by one or more second imaging devices that capture at least a part of the field from different directions;
second acquisition means for acquiring, according to the determination by the determination means, the one or more captured images obtained by the second imaging devices; and
generation means for generating the first virtual viewpoint image based on the plurality of captured images acquired by the first acquisition means, and the second virtual viewpoint image based on the one or more captured images acquired by the second acquisition means according to the determination by the determination means.

The virtual viewpoint image generation device according to claim 1, wherein the determination means determines whether or not to generate the second virtual viewpoint image based on an evaluation result of the image quality of an object included in the first virtual viewpoint image.

The virtual viewpoint image generation device according to claim 1 or 2, wherein the determination means
determines to generate the second virtual viewpoint image when the size of the object in the first virtual viewpoint image is equal to or larger than a threshold value, and
determines not to generate the second virtual viewpoint image when the size of the object is less than the threshold value.

The virtual viewpoint image generation device according to any one of claims 1 to 3, wherein the determination means determines whether or not to generate the second virtual viewpoint image based on an evaluation result of the image quality of the object closest to the virtual viewpoint among a plurality of objects included in the first virtual viewpoint image.

The virtual viewpoint image generation device according to any one of claims 1 to 4, wherein the determination means
compares the image quality of the first virtual viewpoint image generated based on the plurality of captured images obtained by the plurality of first imaging devices with a threshold for image quality that is based on the one or more captured images obtained by the one or more second imaging devices, and
determines to generate the second virtual viewpoint image when the image quality of the first virtual viewpoint image is lower than the image quality represented by the threshold.

The virtual viewpoint image generation device according to any one of claims 1 to 5, wherein the number of the plurality of first imaging devices is smaller than the number of the one or more second imaging devices, and
the shooting range of each of the plurality of first imaging devices is wider than the shooting range of the second imaging devices.

The virtual viewpoint image generation device according to any one of claims 1 to 6, wherein the determination means determines whether or not to generate the second virtual viewpoint image based on a comparison between a first parameter, determined based on the ratio of the number of pixels of a predetermined object included in the first virtual viewpoint image to the number of pixels of the first virtual viewpoint image, and a second parameter, determined based on the ratio of the number of pixels of the predetermined object in the plurality of captured images obtained by the plurality of first imaging devices to the number of pixels of the predetermined object in the one or more captured images obtained by the one or more second imaging devices.

The virtual viewpoint image generation device according to any one of claims 1 to 7, further comprising:
second determination means for determining, according to an evaluation result of the second virtual viewpoint image generated in response to the determination by the determination means, whether or not to generate a third virtual viewpoint image having higher image quality than the second virtual viewpoint image based on one or more captured images obtained by one or more third imaging devices; and
third acquisition means for acquiring, according to the determination by the second determination means, the one or more captured images obtained by the third imaging devices,
wherein the generation means generates the third virtual viewpoint image based on the one or more captured images acquired by the third acquisition means in response to the determination by the second determination means.

The virtual viewpoint image generation device according to claim 8, wherein the second determination means
determines, when the second virtual viewpoint image is generated within a predetermined time limit, whether or not to generate the third virtual viewpoint image according to the evaluation result of the second virtual viewpoint image, and
determines, when the second virtual viewpoint image is not generated within the predetermined time limit, not to generate the third virtual viewpoint image regardless of the evaluation result of the second virtual viewpoint image.

The virtual viewpoint image generation device according to any one of claims 1 to 9, wherein the determination means determines whether or not to generate the second virtual viewpoint image based on a predetermined time limit.

The virtual viewpoint image generation device according to claim 10, wherein the determination means
determines, when the first virtual viewpoint image is generated within the predetermined time limit, whether or not to generate the second virtual viewpoint image according to the evaluation result of the first virtual viewpoint image, and
determines, when the first virtual viewpoint image is not generated within the predetermined time limit, not to generate the second virtual viewpoint image regardless of the evaluation result of the first virtual viewpoint image.

A virtual viewpoint image generation method comprising:
a first acquisition step of acquiring a plurality of captured images that are obtained by a plurality of first imaging devices capturing a field from different directions and that are used to generate a first virtual viewpoint image according to the position of a virtual viewpoint and the line-of-sight direction from the virtual viewpoint;
a determination step of determining, according to an evaluation result of the first virtual viewpoint image generated based on the plurality of captured images acquired in the first acquisition step, whether or not to generate a second virtual viewpoint image having higher image quality than the first virtual viewpoint image based on captured images obtained by one or more second imaging devices that capture at least a part of the field from different directions;
a second acquisition step of acquiring, according to the determination, the one or more captured images obtained by the second imaging devices; and
a generation step of generating the first virtual viewpoint image based on the plurality of captured images acquired in the first acquisition step, and the second virtual viewpoint image based on the one or more captured images acquired in the second acquisition step according to the determination in the determination step.

The virtual viewpoint image generation method described above, wherein in the determination step it is determined whether or not to generate the second virtual viewpoint image based on an evaluation result of the image quality of an object included in the first virtual viewpoint image.

The virtual viewpoint image generation method according to claim 12, wherein in the determination step it is determined to generate the second virtual viewpoint image when the size of the object in the first virtual viewpoint image is equal to or larger than a threshold value, and it is determined not to generate the second virtual viewpoint image when the size of the object is smaller than the threshold value.

A program for causing a computer to function as the virtual viewpoint image generation device according to any one of claims 1 to 11.