JP2019053738A

JP2019053738A - Image processing apparatus, image processing system, image processing method, and program

Info

Publication number: JP2019053738A
Application number: JP2018192134A
Authority: JP
Inventors: 康文 ▲高▼間; Yasufumi Takama
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-10-10
Filing date: 2018-10-10
Publication date: 2019-04-04
Anticipated expiration: 2036-10-28
Also published as: JP6672417B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus, an image processing system, an image processing method, and a program that generate virtual viewpoint images in accordance with different requirements.

SOLUTION: An image processing apparatus receives an instruction to generate a virtual viewpoint image based on images obtained by imaging a subject with multiple cameras from different directions and on information corresponding to designation of a virtual viewpoint. In response to receiving the generation instruction, the image processing apparatus performs control so that a first virtual viewpoint image to be output to a first display device and a second virtual viewpoint image to be output to a second display device are generated based on the images obtained by the imaging and the information corresponding to the designation of the virtual viewpoint. The second virtual viewpoint image has higher image quality than the first virtual viewpoint image.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a technique for generating a virtual viewpoint image.

In recent years, attention has been drawn to techniques in which a plurality of cameras are installed at different positions to photograph a subject from multiple viewpoints, and a virtual viewpoint image or a three-dimensional model is generated using the multi-viewpoint images obtained by the photographing. With such a technique for generating a virtual viewpoint image from multi-viewpoint images, a highlight scene of soccer or basketball, for example, can be viewed from various angles, which gives the user a greater sense of presence than ordinary images.

Patent Document 1 describes that, when a virtual viewpoint image is generated by combining images photographed from a plurality of viewpoints, the image quality of the virtual viewpoint image is improved by reducing the rendering unit in the boundary regions of objects in the image.

JP 2013-223008 A

However, the conventional technique may be unable to generate virtual viewpoint images that meet a plurality of different requirements. For example, when only high-quality virtual viewpoint images are generated, the processing time for generation is likely to be long, which may make it difficult to meet the requirement of a user who wants to view virtual viewpoint images in real time even at low image quality. Conversely, when only low-quality virtual viewpoint images are generated, it may be difficult to meet the requirement of a user who prioritizes high image quality over real-time performance.

The present invention has been made in view of the above problem, and an object thereof is to generate virtual viewpoint images that meet a plurality of different requirements.

To solve the above problem, an image processing system according to the present invention has, for example, the following configuration. That is, the image processing system includes: image acquisition means for acquiring images based on photographing from a plurality of directions by a plurality of cameras; information acquisition means for acquiring viewpoint information indicating a virtual viewpoint; and generation means for generating virtual viewpoint images based on the images acquired by the image acquisition means and the viewpoint information acquired by the information acquisition means. The generation means generates a first virtual viewpoint image to be output to a display device, and a second virtual viewpoint image that has higher image quality than the first virtual viewpoint image and is output to another display device at a timing later than the output of the first virtual viewpoint image to the display device.

According to the present invention, virtual viewpoint images that meet a plurality of different requirements can be generated.

FIG. 1 is a diagram for explaining the configuration of the image processing system 10.
FIG. 2 is a diagram for explaining the hardware configuration of the image processing apparatus 1.
FIG. 3 is a flowchart for explaining one mode of operation of the image processing apparatus 1.
FIG. 4 is a diagram for explaining the configuration of the display screen of the display device 3.
FIG. 5 is a flowchart for explaining one mode of operation of the image processing apparatus 1.
FIG. 6 is a flowchart for explaining one mode of operation of the image processing apparatus 1.

[System configuration]
Embodiments of the present invention will be described below with reference to the drawings. First, the configuration of an image processing system 10 that generates and outputs virtual viewpoint images will be described with reference to FIG. 1. The image processing system 10 in the present embodiment includes an image processing apparatus 1, a camera group 2, a display device 3, and a display device 4.

Note that a virtual viewpoint image in the present embodiment is an image that would be obtained if the subject were photographed from a virtual viewpoint. In other words, a virtual viewpoint image represents the view from a designated viewpoint. The virtual viewpoint may be designated by the user, or may be designated automatically based on, for example, the result of image analysis. That is, virtual viewpoint images include an arbitrary viewpoint image (free viewpoint image) corresponding to a viewpoint freely designated by the user. They also include an image corresponding to a viewpoint that the user selects from a plurality of candidates and an image corresponding to a viewpoint that the apparatus designates automatically. The present embodiment mainly describes the case where the virtual viewpoint image is a moving image, but the virtual viewpoint image may be a still image.

The camera group 2 includes a plurality of cameras, each of which photographs the subject from a different direction. In the present embodiment, each camera in the camera group 2 is connected to the image processing apparatus 1 and transmits its captured images, its camera parameters, and the like to the image processing apparatus 1. However, the configuration is not limited to this. The cameras in the camera group 2 may be able to communicate with one another, and any one of them may transmit the images captured by the plurality of cameras, the parameters of the plurality of cameras, and the like to the image processing apparatus 1. In addition, instead of captured images, any camera in the camera group 2 may transmit an image based on photographing by the camera group 2, such as an image generated from the difference between images captured by a plurality of cameras.

The display device 3 accepts designation of a virtual viewpoint for generating a virtual viewpoint image, and transmits information corresponding to the designation to the image processing apparatus 1. For example, the display device 3 has input units such as a joystick, a jog dial, a touch panel, a keyboard, and a mouse, and a user (operator) designates the virtual viewpoint by operating an input unit. A user in the present embodiment is either an operator who designates a virtual viewpoint by operating the input unit of the display device 3 or a viewer who watches a virtual viewpoint image displayed by the display device 4. Where the operator and the viewer need not be distinguished, they are simply referred to as the user. The present embodiment mainly describes the case where the viewer and the operator are different people, but they may be the same user. In the present embodiment, the information corresponding to the designation of the virtual viewpoint that is transmitted from the display device 3 to the image processing apparatus 1 is virtual viewpoint information indicating the position and orientation of the virtual viewpoint. However, the information is not limited to this. It may instead indicate content that is determined by the virtual viewpoint, such as the shape and orientation of the subject in the virtual viewpoint image, and the image processing apparatus 1 may generate the virtual viewpoint image based on such information.

The display device 3 also displays the virtual viewpoint image that the image processing apparatus 1 generates and outputs based on the images obtained by photographing with the camera group 2 and the designation of the virtual viewpoint accepted by the display device 3. The operator can thus designate the virtual viewpoint while watching the virtual viewpoint image displayed on the display device 3. In the present embodiment, the display device 3 that displays the virtual viewpoint image also accepts the designation of the virtual viewpoint, but the configuration is not limited to this. For example, the device that accepts the designation of the virtual viewpoint and the display device that displays the virtual viewpoint image used by the operator to designate the virtual viewpoint may be separate devices.

In addition, based on an operation by the operator, the display device 3 issues to the image processing apparatus 1 a generation instruction for starting generation of a virtual viewpoint image. The generation instruction is not limited to this. For example, it may be an instruction that reserves generation in the image processing apparatus 1 so that generation of the virtual viewpoint image starts at a predetermined time, or an instruction that reserves generation so that it starts when a predetermined event occurs. The device that issues the generation instruction to the image processing apparatus 1 may be a device other than the display device 3, or the user may input the generation instruction directly to the image processing apparatus 1.

The display device 4 displays the virtual viewpoint image, which the image processing apparatus 1 generates based on the virtual viewpoint designated by the operator using the display device 3, to a user (viewer) other than the operator who designates the virtual viewpoint. The image processing system 10 may include a plurality of display devices 4, and the display devices 4 may display different virtual viewpoint images. For example, the image processing system 10 may include a display device 4 that displays a virtual viewpoint image broadcast live (live image) and a display device 4 that displays a virtual viewpoint image broadcast after recording (non-live image).

The image processing apparatus 1 includes a camera information acquisition unit 100, a virtual viewpoint information acquisition unit 110 (hereinafter, viewpoint acquisition unit 110), an image generation unit 120, and an output unit 130. The camera information acquisition unit 100 acquires, from the camera group 2, images based on photographing by the camera group 2 and the external and internal parameters of each camera in the camera group 2, and outputs them to the image generation unit 120. The viewpoint acquisition unit 110 acquires from the display device 3 the information corresponding to the operator's designation of the virtual viewpoint and outputs it to the image generation unit 120. The viewpoint acquisition unit 110 also accepts the generation instruction for a virtual viewpoint image issued by the display device 3. The image generation unit 120 generates a virtual viewpoint image based on the images acquired by the camera information acquisition unit 100, the information corresponding to the designation acquired by the viewpoint acquisition unit 110, and the generation instruction accepted by the viewpoint acquisition unit 110, and outputs the virtual viewpoint image to the output unit 130. The output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 to external devices such as the display device 3 and the display device 4.

In the present embodiment, the image processing apparatus 1 generates a plurality of virtual viewpoint images of different image quality and outputs each to the output destination appropriate for it. For example, a low-quality virtual viewpoint image that takes a short time to generate is output to the display device 4 watched by a viewer who wants a real-time (low-delay) virtual viewpoint image. A high-quality virtual viewpoint image that takes a long time to generate is output to the display device 4 watched by a viewer who wants a high-quality virtual viewpoint image. The delay in the present embodiment corresponds to the period from when the camera group 2 performs photographing until a virtual viewpoint image based on that photographing is displayed. However, the definition of delay is not limited to this. For example, the delay may be the difference between the real-world time and the time to which the displayed image corresponds.

Next, the hardware configuration of the image processing apparatus 1 will be described with reference to FIG. 2. The image processing apparatus 1 includes a CPU 201, a ROM 202, a RAM 203, an auxiliary storage device 204, a display unit 205, an operation unit 206, a communication unit 207, and a bus 208. The CPU 201 controls the entire image processing apparatus 1 using computer programs and data stored in the ROM 202 and the RAM 203. The image processing apparatus 1 may have a GPU (Graphics Processing Unit), and the GPU may perform at least part of the processing otherwise performed by the CPU 201. The ROM 202 stores programs and parameters that do not need to be changed. The RAM 203 temporarily stores programs and data supplied from the auxiliary storage device 204, data supplied from outside via the communication unit 207, and the like. The auxiliary storage device 204 is, for example, a hard disk drive, and stores content data such as still images and moving images.

The display unit 205 is, for example, a liquid crystal display, and displays a GUI (Graphical User Interface) and the like with which the user operates the image processing apparatus 1. The operation unit 206 is, for example, a keyboard and a mouse, and inputs various instructions to the CPU 201 in response to user operations. The communication unit 207 communicates with external devices such as the camera group 2, the display device 3, and the display device 4. For example, when the image processing apparatus 1 is connected to an external device by wire, a LAN cable or the like is connected to the communication unit 207. When the image processing apparatus 1 has a function for communicating wirelessly with an external device, the communication unit 207 includes an antenna. The bus 208 connects the units of the image processing apparatus 1 and carries information between them.

In the present embodiment, the display unit 205 and the operation unit 206 are inside the image processing apparatus 1, but the image processing apparatus 1 need not include one or both of them. Alternatively, at least one of the display unit 205 and the operation unit 206 may exist as a separate device outside the image processing apparatus 1, with the CPU 201 operating as a display control unit that controls the display unit 205 and as an operation control unit that controls the operation unit 206.

[Operation flow]
Next, one mode of operation of the image processing apparatus 1 will be described with reference to FIG. 3. The processing shown in FIG. 3 starts when the viewpoint acquisition unit 110 accepts a generation instruction for a virtual viewpoint image, and is repeated periodically (for example, once per frame when the virtual viewpoint image is a moving image). However, the start timing of the processing shown in FIG. 3 is not limited to this. The processing shown in FIG. 3 is realized by the CPU 201 loading a program stored in the ROM 202 into the RAM 203 and executing it. At least part of the processing shown in FIG. 3 may instead be realized by dedicated hardware other than the CPU 201.

In the flow shown in FIG. 3, S2010 and S2020 correspond to processing for acquiring information, and S2030 to S2050 correspond to processing for generating and outputting a virtual viewpoint image with which the operator designates the virtual viewpoint (designation image). S2070 to S2100 correspond to processing for generating and outputting a live image, and S2110 to S2130 correspond to processing for generating and outputting a non-live image. The processing in each step is described in detail below.

In S2010, the camera information acquisition unit 100 acquires the image captured by each camera in the camera group 2, together with the external and internal parameters of each camera. The external parameters are information about the position and orientation of the camera, and the internal parameters are information about the focal length and image center of the camera.
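As an illustration of how external and internal parameters are typically used together, the following is a minimal sketch of a pinhole projection. It assumes a single focal length in pixels, no lens distortion, and the convention X_cam = R * X_world + t. The function name `project_point` and all numeric values are hypothetical and do not come from the source.

```python
def project_point(point, rotation, translation, focal_length, image_center):
    """Project a 3D world point to pixel coordinates with a pinhole model."""
    # External parameters (assumed convention): X_cam = R * X_world + t.
    x_cam = [
        sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
        for row in range(3)
    ]
    if x_cam[2] <= 0:
        return None  # the point is not in front of the camera
    # Internal parameters: perspective division, then focal length and image center.
    u = focal_length * x_cam[0] / x_cam[2] + image_center[0]
    v = focal_length * x_cam[1] / x_cam[2] + image_center[1]
    return (u, v)


# Hypothetical camera at the world origin looking along +Z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point((0.5, -0.25, 2.0), identity, (0.0, 0.0, 0.0), 1000.0, (960.0, 540.0))
print(pixel)  # (1210.0, 415.0)
```

The same function could be applied to the virtual camera of S2020, since the virtual viewpoint information is described as corresponding to external and internal parameters as well.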

In S2020, the viewpoint acquisition unit 110 acquires virtual viewpoint information as the information corresponding to the operator's designation of the virtual viewpoint. In the present embodiment, the virtual viewpoint information corresponds to the external and internal parameters of a virtual camera that photographs the subject from the virtual viewpoint, and one piece of virtual viewpoint information is needed to generate one frame of the virtual viewpoint image.

In S2030, the image generation unit 120 estimates the three-dimensional shape of the object that is the subject, based on the images captured by the camera group 2. The object is, for example, a person or a moving body within the photographing range of the camera group 2. The image generation unit 120 calculates the difference between each captured image acquired from the camera group 2 and a background image acquired in advance for the corresponding camera, and thereby generates a silhouette image in which the portion of the captured image corresponding to the object (foreground region) is extracted. The image generation unit 120 then estimates the three-dimensional shape of the object using the silhouette image and the parameters of each camera. The Visual Hull method, for example, is used for this estimation. The result is a 3D point group (a set of points with three-dimensional coordinates) representing the three-dimensional shape of the object. The method of deriving the three-dimensional shape of the object from the images captured by the camera group 2 is not limited to this.
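The two stages of S2030 can be sketched as follows. This is a toy illustration, not the embodiment's implementation: it assumes grayscale images stored as nested lists, a fixed difference threshold, and a projection callable per camera. The names `extract_silhouette` and `carve_visual_hull`, the threshold of 30, and the two orthographic toy cameras are all hypothetical.

```python
from itertools import product


def extract_silhouette(captured, background, threshold=30):
    """Mark pixels whose difference from the background image exceeds a threshold."""
    return [
        [abs(c - b) > threshold for c, b in zip(cap_row, bg_row)]
        for cap_row, bg_row in zip(captured, background)
    ]


def _inside(mask, pixel):
    """Return True if the pixel lies inside the image and on the silhouette."""
    if pixel is None:
        return False
    col, row = pixel
    return 0 <= row < len(mask) and 0 <= col < len(mask[0]) and mask[row][col]


def carve_visual_hull(candidates, views):
    """Keep candidate 3D points that project onto the silhouette in every view.

    `views` is a list of (silhouette, project) pairs, where project(point)
    returns integer (column, row) pixel coordinates or None.
    """
    return [
        point for point in candidates
        if all(_inside(mask, project(point)) for mask, project in views)
    ]


# Toy scene: two orthographic cameras observing a 3x3x3 grid of candidates.
background = [[10, 10, 10] for _ in range(3)]
front_image = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
side_image = [[10, 10, 10], [10, 200, 200], [10, 10, 10]]
views = [
    (extract_silhouette(front_image, background), lambda p: (p[0], p[1])),  # looks along z
    (extract_silhouette(side_image, background), lambda p: (p[2], p[1])),   # looks along x
]
hull = carve_visual_hull(list(product(range(3), repeat=3)), views)
print(hull)  # [(1, 1, 1), (1, 1, 2)]
```

With real cameras, the projection callable would be built from each camera's external and internal parameters rather than from an axis-aligned drop of one coordinate.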

In S2040, the image generation unit 120 renders the 3D point group and a background 3D model based on the acquired virtual viewpoint information, and generates a virtual viewpoint image. The background 3D model is, for example, a CG model of the stadium or other venue where the camera group 2 is installed, and is created in advance and stored in the image processing system 10. In the virtual viewpoint image generated by the processing up to this point, the region corresponding to the object and the background region are each displayed in a predetermined color (for example, a single color). Rendering a 3D point group or a background 3D model is well known in the fields of games and movies, and methods for doing so at high speed, such as processing on a GPU, are known. The virtual viewpoint image generated by the processing up to S2040 can therefore be generated at high speed in response to photographing by the camera group 2 and designation of the virtual viewpoint by the operator.

In S2050, the output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 in S2040 to the display device 3, which the operator uses to designate the virtual viewpoint. The layout of the display screen 30 displayed by the display device 3 will now be described with reference to FIG. 4. The display screen 30 consists of a region 310, a region 320, and a region 330. For example, the virtual viewpoint image generated as the designation image is displayed in the region 310, the virtual viewpoint image generated as the live image is displayed in the region 320, and the virtual viewpoint image generated as the non-live image is displayed in the region 330. That is, the virtual viewpoint image generated in S2040 and output in S2050 is displayed in the region 310, and the operator designates the virtual viewpoint while watching the region 310. The display device 3 only needs to display at least the designation image, and need not display the live image or the non-live image.

In S2060, the image generation unit 120 determines whether to perform processing for generating a virtual viewpoint image of higher image quality than the one generated in S2040. For example, if only the low-quality image for designating the virtual viewpoint is needed, the processing ends without proceeding to S2070. If a higher-quality image is needed, the processing proceeds to S2070.

In S2070, the image generation unit 120 further refines the shape model of the object (the 3D point group) estimated in S2030, using, for example, the Photo Hull method. Specifically, each point in the 3D point group is projected onto the image captured by each camera, and the degree of color agreement across the captured images is evaluated to determine whether the point is needed to represent the shape of the subject. For example, if the variance of the pixel values at the projection destinations of a point exceeds a threshold, the point is judged not to be a correct point representing the shape of the subject and is deleted from the 3D point group. This processing is performed on every point in the 3D point group to refine the shape model of the object. The method of refining the shape model of the object is not limited to this.
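The variance test described for S2070 can be sketched as below. This is a simplified illustration that assumes the pixel values at the projection destinations have already been sampled and are grayscale scalars. The function name `refine_points`, the use of population variance, and the threshold value are assumptions, not details given in the source.

```python
from statistics import pvariance


def refine_points(points, observations, threshold):
    """Drop points whose projected pixel values disagree across cameras.

    observations[i] lists the pixel values (grayscale here for brevity)
    sampled where points[i] projects into each camera image.
    """
    refined = []
    for point, values in zip(points, observations):
        # A variance above the threshold means the cameras see different
        # colors, so the point is judged not to lie on the subject.
        if len(values) >= 2 and pvariance(values) > threshold:
            continue
        refined.append(point)
    return refined


points = [(0, 0, 0), (1, 0, 0)]
observations = [[100, 102, 98], [20, 200, 90]]  # hypothetical samples
print(refine_points(points, observations, threshold=50))  # [(0, 0, 0)]
```

For color images, one option would be to apply the same test per channel or to the sum of per-channel variances. The source does not specify which.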

In S2080, the image generation unit 120 executes processing that colors the 3D point group refined in S2070 and projects it onto the coordinates of the virtual viewpoint to generate a foreground image corresponding to the foreground region, and processing that generates a background image as seen from the virtual viewpoint. The image generation unit 120 then superimposes the foreground image on the generated background image to generate the virtual viewpoint image that serves as the live image.

An example of a method for generating the foreground image of the virtual viewpoint image (the image of the region corresponding to the object) will now be described. To generate the foreground image, the 3D point group is colored. The coloring consists of a visibility determination for each point and a color calculation. In the visibility determination, the cameras that can photograph each point are identified from the positional relationship between the points in the 3D point group and the cameras in the camera group 2. Next, each point is projected onto the captured image of a camera that can photograph it, and the color of the pixel at the projection destination is taken as the color of the point. When a point can be photographed by a plurality of cameras, the point is projected onto the captured image of each of those cameras, the pixel values at the projection destinations are obtained, and their average is taken as the color of the point. Rendering the 3D point group colored in this way with an existing CG rendering method produces the foreground image of the virtual viewpoint image.
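The color calculation step can be sketched as a per-channel average. This sketch assumes the visibility determination and the projection have already produced one RGB sample per camera that can photograph the point. The function name `color_point` and the sample values are hypothetical.

```python
def color_point(samples):
    """Average the RGB values sampled from the cameras that can see a point.

    `samples` holds one (r, g, b) tuple per camera judged able to photograph
    the point; the visibility determination itself is outside this sketch.
    """
    if not samples:
        return None  # no camera sees the point, so no color can be assigned
    count = len(samples)
    return tuple(sum(channel) / count for channel in zip(*samples))


print(color_point([(200, 100, 50), (100, 50, 150)]))  # (150.0, 75.0, 100.0)
print(color_point([(10, 20, 30)]))                    # (10.0, 20.0, 30.0)
```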

次に、仮想視点画像の背景画像を生成する方法の一例について説明する。まず、背景３Ｄモデルの頂点（例えば競技場の端に対応する点）が設定される。そして、これらの頂点が、仮想視点に近い２台のカメラ（第１カメラ及び第２カメラとする）の座標系と仮想視点の座標系に射影される。また、仮想視点と第１カメラの対応点、及び仮想視点と第２カメラの対応点を用いて、仮想視点と第１カメラの間の第１射影行列と仮想視点と第２カメラの間の第２射影行列が算出される。そして、第１射影行列と第２射影行列を用いて、背景画像の各画素が第１カメラの撮影画像と第２カメラの撮影画像に射影され、射影先の２つの画素値の平均を算出することで、背景画像の画素値が決定される。なお、同様の方法により、３台以上のカメラの撮影画像から背景画像の画素値を決定してもよい。 Next, an example of a method for generating the background image of a virtual viewpoint image will be described. First, the vertices of a background 3D model (for example, points corresponding to the edges of the playing field) are set. These vertices are projected onto the coordinate systems of two cameras close to the virtual viewpoint (referred to as the first camera and the second camera) and onto the coordinate system of the virtual viewpoint. Then, using the corresponding points between the virtual viewpoint and the first camera and the corresponding points between the virtual viewpoint and the second camera, a first projection matrix between the virtual viewpoint and the first camera and a second projection matrix between the virtual viewpoint and the second camera are calculated. Using the first and second projection matrices, each pixel of the background image is projected onto the captured image of the first camera and the captured image of the second camera, and the pixel value of the background image is determined by averaging the two pixel values at the projection destinations. Note that the pixel value of the background image may be determined from the captured images of three or more cameras by a similar method.
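As a rough illustration of the background step, the following Python sketch averages the two projected pixel values. It assumes that each projection matrix can be treated as a 3×3 planar homography from virtual-viewpoint pixel coordinates to camera pixel coordinates, uses grayscale images for brevity, and samples the nearest pixel; none of these choices is stated in the description, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Image = List[List[float]]  # grayscale rows, for brevity


def apply_homography(h: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map pixel (x, y) through a 3x3 matrix in homogeneous coordinates."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def sample(image: Image, u: float, v: float) -> float:
    """Nearest-pixel lookup, clamped to the image bounds."""
    row = min(max(int(round(v)), 0), len(image) - 1)
    col = min(max(int(round(u)), 0), len(image[0]) - 1)
    return image[row][col]


def background_pixel(x: float, y: float,
                     h1: Matrix3, img1: Image,
                     h2: Matrix3, img2: Image) -> float:
    """Average the values found in the first and second camera images."""
    u1, v1 = apply_homography(h1, x, y)
    u2, v2 = apply_homography(h2, x, y)
    return (sample(img1, u1, v1) + sample(img2, u2, v2)) / 2.0
```

Extending the average to three or more cameras would only require passing a list of (matrix, image) pairs and dividing by its length.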

このようにして得られた仮想視点画像の背景画像上に前景画像を重ねることで、色が付いた仮想視点画像が生成できる。すなわち、Ｓ２０８０で生成された仮想視点画像はＳ２０４０で生成された仮想視点画像よりも色の階調数に関して画質が高い。逆に言うと、Ｓ２０４０で生成された仮想視点画像に含まれる色の階調数は、Ｓ２０８０で生成された仮想視点画像に含まれる色の階調数より少ない。なお、仮想視点画像に色情報を付加する方法はこれに限らない。 By overlaying the foreground image on the background image of the virtual viewpoint image obtained in this way, a colored virtual viewpoint image can be generated. That is, the virtual viewpoint image generated in S2080 has higher image quality with respect to the number of color gradations than the virtual viewpoint image generated in S2040. In other words, the number of gradations of colors included in the virtual viewpoint image generated in S2040 is smaller than the number of gradations of colors included in the virtual viewpoint image generated in S2080. Note that the method of adding color information to the virtual viewpoint image is not limited to this.

Ｓ２０９０において、出力部１３０は、画像生成部１２０によりＳ２０８０において生成された仮想視点画像を、ライブ画像として表示装置３及び表示装置４へ出力する。表示装置３に出力された画像は領域３２０へ表示されて操作者が見ることができ、表示装置４に出力された画像は視聴者が見ることができる。 In S2090, the output unit 130 outputs the virtual viewpoint image generated in S2080 by the image generation unit 120 to the display device 3 and the display device 4 as a live image. The image output to the display device 3 is displayed in the area 320 and can be viewed by the operator, and the image output to the display device 4 can be viewed by the viewer.

Ｓ２１００において、画像生成部１２０は、Ｓ２０８０において生成された仮想視点画像よりも高画質な仮想視点画像を生成する処理を行うか否か判断する。例えば、仮想視点画像を視聴者に対して生放送でのみ提供する場合は、Ｓ２１１０へは進まず処理を終了する。一方、収録後に視聴者に向けてより高画質な画像を放送する場合は、Ｓ２１１０へ進み処理を続ける。 In step S2100, the image generation unit 120 determines whether to perform a process of generating a virtual viewpoint image with higher image quality than the virtual viewpoint image generated in step S2080. For example, when the virtual viewpoint image is provided only to the viewer by live broadcasting, the process does not proceed to S2110 and the process ends. On the other hand, when a higher quality image is broadcast to the viewer after recording, the process proceeds to S2110 and the process is continued.

Ｓ２１１０において、画像生成部１２０は、Ｓ２０７０で生成されたオブジェクトの形状モデルをさらに高精度化する。本実施形態では、形状モデルの孤立点を削除することで高精度化を実現する。孤立点除去においては、まず、ＰｈｏｔｏＨｕｌｌで算出されたボクセル集合（３Ｄ点群）について、各ボクセルの周囲に別のボクセルが存在するか否か調べられる。周囲にボクセルがない場合、そのボクセルは孤立した点であると判断され、そのボクセルはボクセル集合から削除される。このようにして孤立点を削除した形状モデルを用いてＳ２０８０と同様の処理を実行することで、Ｓ２０８０で生成された仮想視点画像よりもオブジェクトの形状が高精度化された仮想視点画像が生成される。 In step S2110, the image generation unit 120 further increases the accuracy of the shape model of the object generated in step S2070. In the present embodiment, higher accuracy is achieved by deleting isolated points from the shape model. In isolated point removal, each voxel in the voxel set (3D point group) calculated by PhotoHull is first checked to see whether another voxel exists around it. If there are no surrounding voxels, the voxel is determined to be an isolated point and is deleted from the voxel set. By executing processing similar to S2080 using the shape model from which the isolated points have been deleted in this way, a virtual viewpoint image is generated in which the shape of the object is more accurate than in the virtual viewpoint image generated in S2080.
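The isolated point removal can be sketched in a few lines of Python. The sketch assumes voxels are stored as integer grid coordinates and that "around" means the 26 adjacent grid cells; the description does not specify the neighborhood, so this is only one plausible reading.

```python
from itertools import product
from typing import Set, Tuple

Voxel = Tuple[int, int, int]

# Assumption: the 26 cells that share a face, edge, or corner with a voxel.
NEIGHBOR_OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def remove_isolated_voxels(voxels: Set[Voxel]) -> Set[Voxel]:
    """Keep only voxels that have at least one neighboring voxel."""
    kept = set()
    for (x, y, z) in voxels:
        has_neighbor = any(
            (x + dx, y + dy, z + dz) in voxels for dx, dy, dz in NEIGHBOR_OFFSETS
        )
        if has_neighbor:
            kept.add((x, y, z))
    return kept
```

Neighbors are looked up in the original set rather than the filtered one, so the result does not depend on iteration order.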

Ｓ２１２０において、画像生成部１２０は、Ｓ２１１０で生成された仮想視点画像の前景領域と背景領域との境界に平滑化処理をかけ、境界領域が滑らかに表示されるように画像の修正を行う。 In step S2120, the image generation unit 120 performs a smoothing process on the boundary between the foreground region and the background region of the virtual viewpoint image generated in step S2110, and corrects the image so that the boundary region is displayed smoothly.

Ｓ２１３０において、出力部１３０は、画像生成部１２０によりＳ２１２０において生成された仮想視点画像を非ライブ画像として表示装置３及び表示装置４へ出力する。表示装置３へ出力された非ライブ画像は領域３３０へ表示される。 In S2130, the output unit 130 outputs the virtual viewpoint image generated in S2120 by the image generation unit 120 to the display device 3 and the display device 4 as a non-live image. The non-live image output to the display device 3 is displayed in the area 330.

以上の処理により画像処理装置１は、指定用画像としての仮想視点画像と、指定用画像より画質が高い仮想視点画像であるライブ画像とを、１組の撮影画像と仮想視点情報に基づいて生成する。また、画像処理装置１は、ライブ画像よりさらに画質が高い仮想視点画像である非ライブ画像も生成する。そして画像処理装置１は、生成したライブ画像及び非ライブ画像を、非ライブ画像が表示されるより前にライブ画像が表示されるように、表示装置４へ出力する。また画像処理装置１は、生成した指定用画像を、ライブ画像が表示装置４に表示されるより前に指定用画像が表示装置３に表示されるように、表示装置３へ出力する。 Through the above processing, the image processing apparatus 1 generates, based on one set of captured images and virtual viewpoint information, a virtual viewpoint image serving as a designation image and a live image that is a virtual viewpoint image with higher image quality than the designation image. The image processing apparatus 1 also generates a non-live image, which is a virtual viewpoint image with still higher image quality than the live image. The image processing apparatus 1 then outputs the generated live image and non-live image to the display device 4 so that the live image is displayed before the non-live image is displayed. The image processing apparatus 1 also outputs the generated designation image to the display device 3 so that the designation image is displayed on the display device 3 before the live image is displayed on the display device 4.

これにより、表示装置４は、低画質の指定用画像と、指定用画像より高画質であり生放送されるライブ画像と、ライブ画像よりさらに高画質であり収録後に放送される非ライブ画像とを表示することが可能となる。なお、表示装置４はライブ画像と非ライブ画像の何れか一方だけを表示してもよく、その場合には画像処理装置１は表示装置４に適した仮想視点画像を出力する。また、表示装置３は、指定用画像としての低画質の仮想視点画像と、ライブ画像としての中画質の仮想視点画像と、非ライブ画像としての高画質の仮想視点画像との、３種類の仮想視点画像を表示することが可能となる。なお、表示装置３はライブ画像及び非ライブ画像の少なくとも何れかを表示しなくてもよい。 As a result, the display device 4 can display a low-quality designation image, a live image that has higher image quality than the designation image and is broadcast live, and a non-live image that has still higher image quality than the live image and is broadcast after recording. Note that the display device 4 may display only one of the live image and the non-live image, in which case the image processing apparatus 1 outputs a virtual viewpoint image suited to the display device 4. The display device 3 can display three types of virtual viewpoint images: a low-quality virtual viewpoint image as the designation image, a medium-quality virtual viewpoint image as the live image, and a high-quality virtual viewpoint image as the non-live image. Note that the display device 3 need not display at least one of the live image and the non-live image.

すなわち、画像処理装置１は、ユーザに仮想視点を指定させるための表示装置３に対して指定用画像を出力する。そして画像処理装置１は、ユーザによる仮想視点の指定に基づいて生成される仮想視点画像を表示するための表示装置４に対して指定用画像より高画質なライブ画像及び非ライブ画像の少なくとも何れかを出力する。これにより、仮想視点を指定するために低遅延で仮想視点画像を表示させたい操作者と、高画質な仮想視点画像を見たい視聴者の、両方の要件に応えることができる。 That is, the image processing apparatus 1 outputs a designation image to the display device 3, which allows the user to designate a virtual viewpoint. The image processing apparatus 1 then outputs at least one of a live image and a non-live image, each with higher image quality than the designation image, to the display device 4, which displays a virtual viewpoint image generated based on the user's designation of the virtual viewpoint. This makes it possible to satisfy the requirements of both an operator who wants a virtual viewpoint image displayed with low delay in order to designate a virtual viewpoint and a viewer who wants to see a high-quality virtual viewpoint image.

なお、以上の処理では、カメラ群２による撮影に基づく画像と仮想視点の指定に応じた情報とに基づいて仮想視点画像が生成され、その生成のための処理の結果に基づいてより高画質の仮想視点画像が生成される。そのため、低画質の仮想視点画像と高画質の仮想視点画像をそれぞれ独立した処理で生成する場合よりも、全体の処理量を低減することができる。ただし、低画質の仮想視点画像と高画質の仮想視点画像を独立した処理により生成してもよい。また、仮想視点画像を競技会場やライブ会場に設置されたディスプレイに表示させたり生放送したりする場合であって、収録後に放送する必要がない場合には、画像処理装置１は非ライブ画像を生成するための処理を行わない。これにより、高画質な非ライブ画像を生成するための処理量を削減することができる。 In the above processing, a virtual viewpoint image is generated based on the images obtained by the camera group 2 and the information corresponding to the designation of the virtual viewpoint, and a higher-quality virtual viewpoint image is generated based on the results of that generation processing. The overall processing amount can therefore be reduced compared with the case where a low-quality virtual viewpoint image and a high-quality virtual viewpoint image are each generated by independent processing. However, the low-quality virtual viewpoint image and the high-quality virtual viewpoint image may be generated by independent processing. In addition, when the virtual viewpoint image is displayed on a display installed at a competition venue or a live venue, or is broadcast live, and there is no need to broadcast it after recording, the image processing apparatus 1 does not perform the processing for generating a non-live image. This eliminates the processing amount that generating a high-quality non-live image would require.

次に図５を用いて、画像処理装置１の動作の別の１形態について説明する。図３を用いて上述した動作形態では、低画質の仮想視点画像を生成した後に、新たな種別の処理を追加で行うことで、高画質の仮想視点画像を生成する。一方、図５を用いて以下で説明する動作形態では、仮想視点画像を生成するために使用するカメラの台数を増やすことで仮想視点画像の高画質化を実現する。以下の説明において、図３の処理と同様の部分については説明を省略する。 Next, another embodiment of the operation of the image processing apparatus 1 will be described with reference to FIG. In the operation mode described above with reference to FIG. 3, after a low-quality virtual viewpoint image is generated, a new type of processing is additionally performed to generate a high-quality virtual viewpoint image. On the other hand, in the operation mode described below with reference to FIG. 5, the image quality of the virtual viewpoint image is improved by increasing the number of cameras used to generate the virtual viewpoint image. In the following description, the description of the same part as the process of FIG. 3 is omitted.

図５に示す処理は、視点取得部１１０が仮想視点画像の生成指示の受付を行ったタイミングで開始される。ただし図５の処理の開始タイミングはこれに限定されない。Ｓ２０１０及びＳ２０２０において、画像処理装置１は、図３で説明したものと同様の処理により、カメラ群２の各カメラによる撮影画像と仮想視点情報とを取得する。 The process illustrated in FIG. 5 is started at the timing when the viewpoint acquisition unit 110 receives a virtual viewpoint image generation instruction. However, the start timing of the process of FIG. 5 is not limited to this. In S2010 and S2020, the image processing apparatus 1 acquires a captured image and virtual viewpoint information by each camera in the camera group 2 by the same processing as that described in FIG.

Ｓ４０３０において、画像生成部１２０は、仮想視点画像の生成に用いる撮影画像に対応するカメラの数を設定する。ここで画像生成部１２０は、Ｓ４０５０−Ｓ４０７０の処理が所定の閾値（例えば仮想視点画像が動画である場合の１フレームに対応する時間）以下の処理時間で完了するようにカメラの数を設定する。例えば、予め１００台のカメラの撮影画像を用いてＳ４０５０−Ｓ４０７０の処理を実行し、その処理時間が０．５秒であったとする。この場合に、フレームレートが６０ｆｐｓ（ｆｒａｍｅｐｅｒｓｅｃｏｎｄ）である仮想視点画像の１フレームに対応する０．０１６秒以内にＳ４０５０−Ｓ４０７０の処理を完了させたければ、カメラの数を３台に設定する。 In step S4030, the image generation unit 120 sets the number of cameras whose captured images are used for generating the virtual viewpoint image. Here, the image generation unit 120 sets the number of cameras so that the processing of S4050 to S4070 is completed within a processing time equal to or less than a predetermined threshold (for example, the time corresponding to one frame when the virtual viewpoint image is a moving image). For example, assume that the processing of S4050 to S4070 was executed in advance using the captured images of 100 cameras and took 0.5 seconds. In this case, if the processing of S4050 to S4070 is to be completed within 0.016 seconds, which corresponds to one frame of a virtual viewpoint image with a frame rate of 60 fps (frames per second), the number of cameras is set to three.
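The arithmetic behind this example can be checked with a short Python sketch. It assumes that processing time scales linearly with the number of cameras, which is what the figures in the example (100 cameras in 0.5 seconds, three cameras within 0.016 seconds) suggest but which the description does not state explicitly. The function name `camera_budget` is illustrative.

```python
import math


def camera_budget(total_cameras: int,
                  measured_seconds: float,
                  time_limit_seconds: float) -> int:
    """Largest camera count expected to finish within the time limit.

    Assumes processing time is proportional to the number of cameras,
    calibrated by one measurement taken with all cameras.
    """
    count = math.floor(total_cameras * time_limit_seconds / measured_seconds)
    # Use at least one camera and never more than are installed.
    return max(1, min(total_cameras, count))


# 100 cameras took 0.5 s; a 60 fps frame allows about 0.016 s.
print(camera_budget(100, 0.5, 0.016))  # 3
```

Under the same assumption, relaxing the limit to 0.1 seconds yields 20 cameras.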

なお、Ｓ４０５０−Ｓ４０７０の処理によって仮想視点画像が出力された後に、Ｓ４０８０において画像生成を続ける判断がされた場合、Ｓ４０３０に戻って使用するカメラの数を再設定する。ここでは、先に出力した仮想視点画像より高画質な仮想視点画像が生成されるように、許容する処理時間を長くし、それに応じてカメラの数を増やす。例えば、０．１秒以下の処理時間でＳ４０５０−Ｓ４０７０の処理が完了されるように、使用する撮影画像に対応するカメラの数を２０台に設定する。 If it is determined in S4080 that image generation is to be continued after the virtual viewpoint image is output in the processes of S4050 to S4070, the process returns to S4030 to reset the number of cameras to be used. Here, the permissible processing time is increased and the number of cameras is increased accordingly so that a virtual viewpoint image with higher image quality than the previously output virtual viewpoint image is generated. For example, the number of cameras corresponding to the captured image to be used is set to 20 so that the processing of S4050 to S4070 is completed in a processing time of 0.1 seconds or less.

Ｓ４０４０において、画像生成部１２０は、仮想視点画像を生成するために使用する撮影画像に対応するカメラを、Ｓ４０３０で設定されたカメラの数に応じてカメラ群２の中から選択する。例えば、１００台のカメラから３台のカメラを選択する場合、仮想視点に一番近いカメラと、そのカメラから数えて３４台目のカメラ及び６７台目のカメラを選択する。 In S4040, the image generation unit 120 selects a camera corresponding to the captured image used to generate the virtual viewpoint image from the camera group 2 according to the number of cameras set in S4030. For example, when three cameras are selected from 100 cameras, the camera closest to the virtual viewpoint and the 34th camera and the 67th camera counted from the camera are selected.

また、仮想視点画像を１回生成した後に、使用する撮影画像の数を増やして２回目の処理を行う場合には、１回目の処理で推定した形状モデルをさらに高精度化することから、１回目で選択されたカメラ以外のカメラが選択される。具体的には、１００台のカメラから２０台のカメラを選択する場合、１回目の処理で選択されていないカメラの中から仮想視点に一番近いカメラをまず選択し、そこから５台間隔でカメラを選択していく。この際、１回目で既に選択したカメラは飛ばして次のカメラを選択する。なお、例えば非ライブ画像として最も高画質な仮想視点画像を生成する場合には、カメラ群２に含まれる全てのカメラを選択し、各カメラの撮影画像を使用してＳ４０５０−Ｓ４０７０の処理を実行する。 When the virtual viewpoint image has been generated once and a second round of processing is performed with a larger number of captured images, the shape model estimated in the first round is to be further refined, so cameras other than those selected in the first round are selected. Specifically, when 20 cameras are selected from 100 cameras, the camera closest to the virtual viewpoint among those not selected in the first round is selected first, and further cameras are then selected at intervals of five cameras from there. Any camera already selected in the first round is skipped and the next camera is selected instead. Note that, for example, when the highest-quality virtual viewpoint image is generated as a non-live image, all the cameras included in the camera group 2 are selected, and the processing of S4050 to S4070 is executed using the captured images of every camera.
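The selection rule can be sketched in Python as follows. The sketch assumes the cameras are indexed 0 to N-1 in order around the subject, that the index of the camera nearest the virtual viewpoint is known, and that "the nearest unselected camera" can be approximated by scanning forward from that index; these are simplifications introduced for illustration.

```python
from typing import Iterable, List


def select_cameras(total: int, nearest: int, count: int,
                   already: Iterable[int] = ()) -> List[int]:
    """Pick `count` roughly evenly spaced cameras, skipping earlier picks."""
    step = max(1, total // count)
    taken = set(already)
    chosen: List[int] = []
    candidate = nearest
    while len(chosen) < count and len(taken) < total:
        # Skip cameras that were selected in an earlier round.
        while candidate % total in taken:
            candidate += 1
        pick = candidate % total
        chosen.append(pick)
        taken.add(pick)
        candidate = pick + step
    return chosen


first = select_cameras(100, nearest=0, count=3)
second = select_cameras(100, nearest=0, count=20, already=first)
print(first)  # [0, 33, 66]: the nearest, 34th, and 67th cameras
```

In the second round the spacing is 100 // 20 = 5, and the camera at index 66 is skipped in favor of index 67 because it was already used in the first round.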

なお、使用する撮影画像に対応するカメラを選択する方法はこれに限らない。例えば、仮想視点に近いカメラを優先して選択してもよい。この場合、被写体となるオブジェクトの形状推定において仮想視点からは見えない背面領域の形状推定の精度は低くなるが、仮想視点から見える前面領域の形状推定の精度は向上する。つまり、仮想視点画像の中で視聴者にとって目につき易い領域の画質を優先的に向上させることができる。 Note that the method of selecting a camera corresponding to a captured image to be used is not limited to this. For example, a camera close to the virtual viewpoint may be selected with priority. In this case, the accuracy of shape estimation of the rear region that cannot be seen from the virtual viewpoint in the shape estimation of the object that is the subject is lowered, but the accuracy of shape estimation of the front region that is visible from the virtual viewpoint is improved. That is, it is possible to preferentially improve the image quality of an area that is easily noticeable by the viewer in the virtual viewpoint image.

Ｓ４０５０において、画像生成部１２０は、Ｓ４０４０で選択されたカメラによる撮影画像を用いて、オブジェクトの形状推定処理を実行する。ここでの処理は、例えば、図３のＳ２０３０における処理（ＶｉｓｕａｌＨｕｌｌ）とＳ２０７０における処理（ＰｈｏｔｏＨｕｌｌ）の組み合わせである。ＶｉｓｕａｌＨｕｌｌの処理は、使用する複数の撮影画像に対応する複数のカメラの視体積の論理積を計算する処理を含む。また、ＰｈｏｔｏＨｕｌｌの処理は形状モデルの各点を複数の撮影画像に射影して画素値の一貫性を計算する処理を含む。そのため、使用する撮影画像に対応するカメラの数が少ないほど、形状推定の精度は低くなり処理時間が短くなる。 In step S4050, the image generation unit 120 executes object shape estimation processing using the image captured by the camera selected in step S4040. The process here is, for example, a combination of the process in S2030 (VisualHull) and the process in S2070 (PhotoHull) in FIG. The VisualHull process includes a process of calculating a logical product of the viewing volumes of a plurality of cameras corresponding to a plurality of captured images to be used. The PhotoHull process includes a process of projecting each point of the shape model onto a plurality of captured images and calculating the consistency of pixel values. Therefore, the smaller the number of cameras corresponding to the captured image to be used, the lower the accuracy of shape estimation and the shorter the processing time.
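The visual-volume intersection can be illustrated with a toy Python sketch: a candidate voxel survives only if it falls inside the silhouette of every selected camera. The orthographic projections and the representation of a silhouette as a set of foreground pixels are assumptions made for brevity, not details taken from the description.

```python
from typing import Callable, List, Set, Tuple

Voxel = Tuple[int, int, int]
Pixel = Tuple[int, int]
View = Tuple[Callable[[Voxel], Pixel], Set[Pixel]]  # (projection, silhouette)


def visual_hull(candidates: Set[Voxel], views: List[View]) -> Set[Voxel]:
    """Keep voxels whose projection lies in the silhouette of every view."""
    return {
        v for v in candidates
        if all(project(v) in silhouette for project, silhouette in views)
    }


# Toy setup: a 2x2x2 grid seen along the z axis and along the x axis.
grid = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
along_z: View = (lambda v: (v[0], v[1]), {(0, 0)})
along_x: View = (lambda v: (v[1], v[2]), {(0, 0)})

coarse = visual_hull(grid, [along_z])          # one camera
fine = visual_hull(grid, [along_z, along_x])   # two cameras
```

With one view the hull keeps two voxels; adding the second view carves it down to one, which mirrors the trade-off stated above between camera count, accuracy, and processing time.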

Ｓ４０６０において、画像生成部１２０は、レンダリング処理を実行する。ここでの処理は、図３のＳ２０８０における処理と同様であり、３Ｄ点群の色付け処理と背景画像の生成処理を含む。３Ｄ点群の色付け処理も背景画像の生成処理も、複数の撮影画像の対応する点の画素値を用いた計算により色を決定する処理を含む。そのため、使用する撮影画像に対応するカメラの数が少ないほど、レンダリングの精度は低くなり処理時間が短くなる。 In S4060, the image generation unit 120 executes rendering processing. The processing here is the same as the processing in S2080 of FIG. 3, and includes 3D point group coloring processing and background image generation processing. Both the 3D point group coloring process and the background image generation process include a process of determining a color by calculation using pixel values of corresponding points of a plurality of captured images. Therefore, the smaller the number of cameras corresponding to the captured image to be used, the lower the rendering accuracy and the shorter the processing time.

Ｓ４０７０において、出力部１３０は、画像生成部１２０によりＳ４０６０において生成された仮想視点画像を、表示装置３や表示装置４へ出力する。 In step S4070, the output unit 130 outputs the virtual viewpoint image generated in step S4060 by the image generation unit 120 to the display device 3 and the display device 4.

Ｓ４０８０において、画像生成部１２０は、Ｓ４０６０において生成された仮想視点画像よりも高画質な仮想視点画像を生成する処理を行うか否か判断する。例えば、Ｓ４０６０において生成された仮想視点画像が操作者に仮想視点を指定させるための画像であり、さらにライブ画像を生成する場合には、Ｓ４０３０に戻って、使用するカメラの数を増やしてライブ画像としての仮想視点画像を生成する。また、さらにライブ画像を生成した後に、非ライブ画像を生成する場合には、さらにカメラの数を増やして非ライブ画像としての仮想視点画像を生成する。すなわち、ライブ用画像としての仮想視点画像の生成に用いられる撮影画像に対応するカメラの数は、指定用画像としての仮想視点画像の生成に用いられる撮影画像に対応するカメラの数より多いため、ライブ画像は指定用画像よりも画質が高い。同様に、非ライブ画像としての仮想視点画像の生成に用いられる撮影画像に対応するカメラの数は、ライブ画像としての仮想視点画像の生成に用いられる撮影画像に対応するカメラの数よりも多いため、非ライブ画像はライブ画像よりも画質が高い。 In step S4080, the image generation unit 120 determines whether to perform processing for generating a virtual viewpoint image with higher image quality than the virtual viewpoint image generated in step S4060. For example, when the virtual viewpoint image generated in S4060 is an image for allowing the operator to designate a virtual viewpoint and a live image is also to be generated, the process returns to S4030, the number of cameras to be used is increased, and a virtual viewpoint image is generated as the live image. When a non-live image is to be generated after the live image, the number of cameras is increased further and a virtual viewpoint image is generated as the non-live image. That is, because the number of cameras whose captured images are used for generating the live image is larger than the number used for generating the designation image, the live image has higher image quality than the designation image. Similarly, because the number of cameras whose captured images are used for generating the non-live image is larger than the number used for generating the live image, the non-live image has higher image quality than the live image.

なおＳ４０８０において、既に生成した仮想視点画像より高画質な仮想視点画像を生成する必要がないと判断された場合、もしくはより高画質な仮想視点画像を生成することはできないと判断された場合には、処理を終了する。 Note that if it is determined in S4080 that there is no need to generate a virtual viewpoint image with higher image quality than the one already generated, or that a higher-quality virtual viewpoint image cannot be generated, the process ends.

以上の処理により、画像処理装置１は、画質を段階的に向上させた複数の仮想視点画像をそれぞれ適切なタイミングで生成して出力することが可能となる。例えば、仮想視点画像の生成に使用するカメラを、設定された処理時間以内に生成処理が完了できるような台数に制限することで、遅延の少ない指定用画像を生成することができる。また、ライブ画像や非ライブ画像を生成する場合には、使用するカメラの数を増やして生成処理を行うことで、より高画質の画像を生成することができる。 Through the above processing, the image processing apparatus 1 can generate and output a plurality of virtual viewpoint images whose image quality is improved in stages at appropriate timings. For example, by limiting the number of cameras used for generating the virtual viewpoint image to the number that can complete the generation process within the set processing time, it is possible to generate the designation image with a small delay. Further, when generating a live image or a non-live image, it is possible to generate a higher quality image by increasing the number of cameras used and performing the generation process.

次に図６を用いて、画像処理装置１の動作の別の１形態について説明する。図５を用いて上述した動作形態では、仮想視点画像を生成するために使用するカメラの台数を増やすことで仮想視点画像の高画質化を実現する。一方、図６を用いて以下で説明する動作形態では、仮想視点画像の解像度を段階的に高めていくことで仮想視点画像の高画質化を実現する。以下の説明において、図３や図５の処理と同様の部分については説明を省略する。なお、以下で説明する動作形態においては、生成される仮想視点画像の画素数は常に４Ｋ（３８４０×２１６０）であり、画素値の計算を大きい画素ブロックごとに行うか小さい画素ブロックごとに行うかによって仮想視点画像の解像度を制御する。ただしこれに限らず、生成される仮想視点画像の画素数を変更することで解像度を制御してもよい。 Next, another embodiment of the operation of the image processing apparatus 1 will be described with reference to FIG. In the operation mode described above with reference to FIG. 5, the image quality of the virtual viewpoint image is improved by increasing the number of cameras used for generating the virtual viewpoint image. On the other hand, in the operation mode described below with reference to FIG. 6, the image quality of the virtual viewpoint image is improved by gradually increasing the resolution of the virtual viewpoint image. In the following description, the description of the same part as the process of FIG. 3 or 5 is omitted. In the operation mode described below, the number of pixels of the generated virtual viewpoint image is always 4K (3840 × 2160), and whether the pixel value is calculated for each large pixel block or each small pixel block. To control the resolution of the virtual viewpoint image. However, the present invention is not limited to this, and the resolution may be controlled by changing the number of pixels of the generated virtual viewpoint image.

図６に示す処理は、視点取得部１１０が仮想視点画像の生成指示の受付を行ったタイミングで開始される。ただし図６の処理の開始タイミングはこれに限定されない。Ｓ２０１０及びＳ２０２０において、画像処理装置１は、図３で説明したものと同様の処理により、カメラ群２の各カメラによる撮影画像と仮想視点情報とを取得する。 The process illustrated in FIG. 6 is started at the timing when the viewpoint acquisition unit 110 receives a virtual viewpoint image generation instruction. However, the start timing of the process of FIG. 6 is not limited to this. In S2010 and S2020, the image processing apparatus 1 acquires a captured image and virtual viewpoint information by each camera in the camera group 2 by the same processing as that described in FIG.

Ｓ５０３０において、画像生成部１２０は、生成する仮想視点画像の解像度を設定する。ここで画像生成部１２０は、Ｓ５０５０及びＳ４０７０の処理が所定の閾値以下の処理時間で完了するように解像度を設定する。例えば、予め４Ｋ解像度の仮想視点画像を生成する場合のＳ５０５０及びＳ４０７０の処理を実行し、その処理時間が０．５秒であったとする。この場合に、フレームレートが６０ｆｐｓである仮想視点画像の１フレームに対応する０．０１６秒以内にＳ５０５０及びＳ４０７０の処理を完了させたければ、解像度を４Ｋの０．０１６／０．５＝１／３１．２５倍以下にする必要がある。そこで、仮想視点画像の解像度を縦横それぞれ４Ｋ解像度の１／８倍に設定すれば、画素値を計算すべき画素ブロックの数は１／６４になり、０．０１６秒未満で処理を完了できる。 In step S5030, the image generation unit 120 sets the resolution of the virtual viewpoint image to be generated. Here, the image generation unit 120 sets the resolution so that the processing of S5050 and S4070 is completed within a processing time equal to or less than a predetermined threshold. For example, assume that the processing of S5050 and S4070 for generating a 4K-resolution virtual viewpoint image was executed in advance and took 0.5 seconds. In this case, if the processing of S5050 and S4070 is to be completed within 0.016 seconds, which corresponds to one frame of a virtual viewpoint image with a frame rate of 60 fps, the resolution must be reduced to 0.016/0.5 = 1/31.25 of 4K or less. Accordingly, if the resolution of the virtual viewpoint image is set to 1/8 of the 4K resolution in both the vertical and horizontal directions, the number of pixel blocks whose pixel values must be calculated becomes 1/64, and the processing can be completed in less than 0.016 seconds.
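The choice of 1/8 per axis can be reproduced with a short Python sketch. It assumes that processing time is proportional to the number of pixel blocks computed and that the per-axis divisor is restricted to powers of two; both are assumptions consistent with the example but not stated as requirements. The function name `choose_divisor` is illustrative.

```python
def choose_divisor(full_res_seconds: float,
                   time_limit_seconds: float,
                   max_divisor: int = 64) -> int:
    """Smallest power-of-two per-axis divisor expected to meet the limit.

    A divisor d means pixel values are computed once per d x d block,
    so the estimated time is full_res_seconds / (d * d).
    """
    divisor = 1
    while (divisor < max_divisor
           and full_res_seconds / (divisor * divisor) > time_limit_seconds):
        divisor *= 2
    return divisor


# 4K takes 0.5 s; a 60 fps frame allows about 0.016 s.
print(choose_divisor(0.5, 0.016))  # 8, i.e. 1/64 of the blocks, about 0.0078 s
```

A divisor of 4 would give an estimated 0.5/16 = 0.03125 seconds, which misses the 0.016-second limit but fits comfortably within a relaxed limit of 0.1 seconds.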

なお、Ｓ５０５０及びＳ４０７０の処理によって仮想視点画像が出力された後に、Ｓ４０８０において画像生成を続ける判断がされた場合、Ｓ５０３０に戻って解像度を再設定する。ここでは、先に出力した仮想視点画像より高画質な仮想視点画像が生成されるように、許容する処理時間を長くし、それに応じて解像度を高くする。例えば、解像度を縦横それぞれ４Ｋ解像度の１／４に設定すると、０．１秒以下の処理時間でＳ５０５０及びＳ４０７０の処理が完了される。Ｓ５０４０において、画像生成部１２０は、仮想視点画像において画素値を計算すべき画素の位置を、Ｓ５０３０で設定された解像度に応じて決定する。例えば、仮想視点画像の解像度を４Ｋ解像度の１／８に設定した場合、縦横それぞれ８画素毎に画素値が算出される。そして、画素値が算出された画素（ｘ，ｙ）と画素（ｘ＋８，ｙ＋８）の間に存在する画素には、画素（ｘ，ｙ）と同じ画素値が設定される。 If it is determined in S4080 that image generation is to be continued after the virtual viewpoint image is output in the processes of S5050 and S4070, the process returns to S5030 to reset the resolution. Here, the allowable processing time is increased and the resolution is increased accordingly so that a virtual viewpoint image with higher image quality than the previously output virtual viewpoint image is generated. For example, when the resolution is set to 1/4 of the 4K resolution in the vertical and horizontal directions, the processing of S5050 and S4070 is completed in a processing time of 0.1 second or less. In S5040, the image generation unit 120 determines the position of the pixel whose pixel value is to be calculated in the virtual viewpoint image according to the resolution set in S5030. For example, when the resolution of the virtual viewpoint image is set to 1/8 of the 4K resolution, pixel values are calculated every 8 pixels in the vertical and horizontal directions. Then, the same pixel value as that of the pixel (x, y) is set to the pixel existing between the pixel (x, y) for which the pixel value is calculated and the pixel (x + 8, y + 8).

また、仮想視点画像を１回生成した後に、解像度を高くして２回目の処理を行う場合には、１回目に画素値が算出された画素は飛ばして画素値を算出する。例えば、解像度が４Ｋ解像度の１／４に設定された場合、画素（ｘ＋４，ｙ＋４）の画素値を算出し、画素（ｘ＋４，ｙ＋４）と画素（ｘ＋８，ｙ＋８）の間に存在する画素には、画素（ｘ＋４，ｙ＋４）と同じ画素値が設定される。このように、画素値を算出する画素の数を増やしていくことで、仮想視点画像の解像度を最大で４Ｋ解像度まで高くすることができる。 When the virtual viewpoint image has been generated once and a second round of processing is performed at a higher resolution, the pixels whose values were already calculated in the first round are skipped and pixel values are calculated only for the remaining positions. For example, when the resolution is set to 1/4 of the 4K resolution, the pixel value of the pixel (x + 4, y + 4) is calculated, and the pixels lying between the pixel (x + 4, y + 4) and the pixel (x + 8, y + 8) are set to the same value as the pixel (x + 4, y + 4). By increasing the number of pixels whose values are calculated in this way, the resolution of the virtual viewpoint image can be raised up to the full 4K resolution.
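The block-wise refinement can be sketched in Python as follows. The `shade` callable stands in for the per-pixel value calculation of S5050 and is a hypothetical placeholder, and the sketch assumes each later pass uses a stride that evenly divides the previous one (for example 8, then 4).

```python
from typing import Callable, List, Set, Tuple


def refine(image: List[List[float]],
           shade: Callable[[int, int], float],
           stride: int,
           computed: Set[Tuple[int, int]]) -> None:
    """Compute one pixel per stride x stride block and fill the block with it.

    Positions already in `computed` are skipped, so a later pass with a
    smaller stride only pays for the newly added sample positions.
    """
    height, width = len(image), len(image[0])
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            if (x, y) in computed:
                continue  # value from an earlier pass is still valid
            value = shade(x, y)
            computed.add((x, y))
            for by in range(y, min(y + stride, height)):
                for bx in range(x, min(x + stride, width)):
                    image[by][bx] = value
```

Calling `refine` with stride 8 and then with stride 4 on the same image and the same `computed` set leaves the first-pass samples untouched and evaluates `shade` only at the three new positions in each 8×8 block.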

Ｓ５０５０において、画像生成部１２０は、Ｓ５０４０で決定された位置の画素の画素値を算出して仮想視点画像への色付け処理を行う。画素値の算出方法としては、例えばＩｍａｇｅ−ＢａｓｅｄＶｉｓｕａｌＨｕｌｌの方法を使用することができる。この方法では画素毎に画素値が算出されるので、画素値を算出すべき画素の数が少ないほど、すなわち仮想視点画像の解像度が低いほど、処理時間が短くなる。 In step S5050, the image generation unit 120 calculates the pixel value of the pixel at the position determined in step S5040 and performs a coloring process on the virtual viewpoint image. As a pixel value calculation method, for example, an Image-Based Visual Hull method can be used. In this method, since the pixel value is calculated for each pixel, the processing time becomes shorter as the number of pixels for which the pixel value is to be calculated is smaller, that is, as the resolution of the virtual viewpoint image is lower.

Ｓ４０７０において、出力部１３０は、画像生成部１２０によりＳ５０５０において生成された仮想視点画像を、表示装置３や表示装置４へ出力する。 In step S4070, the output unit 130 outputs the virtual viewpoint image generated in step S5050 by the image generation unit 120 to the display device 3 and the display device 4.

Ｓ４０８０において、画像生成部１２０は、Ｓ５０５０において生成された仮想視点画像よりも高画質な仮想視点画像を生成する処理を行うか否か判断する。例えば、Ｓ５０５０において生成された仮想視点画像が操作者に仮想視点を指定させるための画像であり、さらにライブ画像を生成する場合には、Ｓ５０３０に戻って、解像度を高くした仮想視点画像を生成する。また、ライブ画像を生成した後に、さらに非ライブ画像を生成する場合には、さらに解像度を高くした非ライブ画像としての仮想視点画像を生成する。すなわち、ライブ画像としての仮想視点画像は、指定用画像としての仮想視点画像より解像度が高いため、ライブ画像は指定用画像よりも画質が高い。同様に、非ライブ画像としての仮想視点画像は、ライブ画像としての仮想視点画像よりも解像度が高いため、非ライブ画像はライブ画像よりも画質が高い。 In step S4080, the image generation unit 120 determines whether to perform processing for generating a virtual viewpoint image with higher image quality than the virtual viewpoint image generated in step S5050. For example, when the virtual viewpoint image generated in S5050 is an image for allowing the operator to designate a virtual viewpoint and a live image is also to be generated, the process returns to S5030 and a virtual viewpoint image with a higher resolution is generated. When a non-live image is to be generated after the live image, a virtual viewpoint image with a still higher resolution is generated as the non-live image. That is, because the virtual viewpoint image serving as the live image has a higher resolution than the one serving as the designation image, the live image has higher image quality than the designation image. Similarly, because the virtual viewpoint image serving as the non-live image has a higher resolution than the one serving as the live image, the non-live image has higher image quality than the live image.

以上の処理により、画像処理装置１は、解像度を段階的に向上させた複数の仮想視点画像をそれぞれ適切なタイミングで生成して出力することが可能となる。例えば、仮想視点画像の解像度を、設定された処理時間以内に生成処理が完了できるような解像度に設定することで、遅延の少ない指定用画像を生成することができる。また、ライブ画像や非ライブ画像を生成する場合には、解像度を高く設定して生成処理を行うことで、より高画質の画像を生成することができる。 With the above processing, the image processing apparatus 1 can generate and output a plurality of virtual viewpoint images whose resolution is improved in stages at appropriate timings. For example, by setting the resolution of the virtual viewpoint image so that the generation process can be completed within the set processing time, it is possible to generate a designating image with little delay. Further, when generating a live image or a non-live image, a higher-quality image can be generated by performing generation processing with a high resolution.

以上のように、画像処理装置１は、仮想視点画像の画質を向上させるための画像処理を行うことにより高画質の画像（例えば非ライブ画像）を生成する。また画像処理装置１は、該画像処理に含まれる部分的な処理であって所定の閾値以下の処理時間で実行される処理によって低画質の画像（例えばライブ画像）を生成する。これにより、所定時間以下の遅延で表示される仮想視点画像と、高画質な仮想視点画像とを両方生成して表示することが可能となる。 As described above, the image processing apparatus 1 generates a high-quality image (for example, a non-live image) by performing image processing for improving the image quality of the virtual viewpoint image. Further, the image processing apparatus 1 generates a low-quality image (for example, a live image) by a process that is a partial process included in the image process and is executed in a processing time that is equal to or less than a predetermined threshold. This makes it possible to generate and display both a virtual viewpoint image displayed with a delay of a predetermined time or less and a high-quality virtual viewpoint image.

なお、図６の説明においては、所定の閾値以下の処理時間で生成処理を完了させるための生成パラメータ（解像度）を推定し、推定された生成パラメータで仮想視点画像を生成するものとした。ただしこれに限らず、画像処理装置１は、仮想視点画像の画質を段階的に向上させていき、処理時間が所定の閾値に達した時点において生成済みの仮想視点画像を出力してもよい。例えば、処理時間が所定の閾値に達した時点において、解像度が４Ｋ解像度の１／８である仮想視点画像が生成済みであり、解像度が４Ｋ解像度の１／４である仮想視点画像が未完成である場合には、１／８の解像度の仮想視点画像を出力してもよい。また、１／８の解像度から１／４の解像度へ解像度を向上させる処理が途中まで行われた仮想視点画像を出力してもよい。 In the description of FIG. 6, it is assumed that a generation parameter (resolution) for completing the generation process within a processing time equal to or less than a predetermined threshold is estimated, and a virtual viewpoint image is generated using the estimated generation parameter. However, the present invention is not limited to this, and the image processing apparatus 1 may improve the image quality of the virtual viewpoint image in stages, and output the generated virtual viewpoint image when the processing time reaches a predetermined threshold. For example, when the processing time reaches a predetermined threshold, a virtual viewpoint image whose resolution is 1/8 of 4K resolution has been generated, and a virtual viewpoint image whose resolution is 1/4 of 4K resolution has not been completed. In some cases, a virtual viewpoint image with 1/8 resolution may be output. Further, a virtual viewpoint image in which the process of improving the resolution from 1/8 resolution to 1/4 resolution is performed halfway may be output.
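The alternative described here, improving quality stage by stage and outputting whatever is finished when the time threshold is reached, might be sketched in Python as below. The `render_at` callable and the list of quality levels are hypothetical, and the sketch checks the elapsed time only between stages rather than interrupting a stage partway through, which is a simplification.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def render_until_deadline(render_at: Callable[[int], T],
                          divisors: Sequence[int],
                          budget_seconds: float,
                          clock: Callable[[], float] = time.monotonic
                          ) -> Optional[T]:
    """Render at successively finer levels; return the last completed result.

    `divisors` lists per-axis resolution divisors from coarse to fine,
    for example [8, 4, 2, 1]. The coarsest level is always rendered.
    """
    start = clock()
    best: Optional[T] = None
    for divisor in divisors:
        if best is not None and clock() - start >= budget_seconds:
            break  # threshold reached: output what is already complete
        best = render_at(divisor)
    return best
```

The `clock` parameter is injectable so that the behavior can be exercised deterministically without waiting on real time.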

This embodiment has mainly described the case where the image generation unit 120 of the image processing apparatus 1 controls generation of virtual viewpoint images based on the images acquired by the camera information acquisition unit 100 and the virtual viewpoint information acquired by the viewpoint acquisition unit 110, and generates a plurality of virtual viewpoint images with different image quality. However, the present invention is not limited to this. The function of controlling generation of virtual viewpoint images and the function of actually generating them may be provided in different apparatuses.

For example, a generation apparatus (not shown) that has the function of the image generation unit 120 and generates virtual viewpoint images may exist in the image processing system 10. The image processing apparatus 1 may then control generation of virtual viewpoint images by the generation apparatus based on the images acquired by the camera information acquisition unit 100 and the information acquired by the viewpoint acquisition unit 110. Specifically, the image processing apparatus 1 transmits the captured images and the virtual viewpoint information to the generation apparatus and issues an instruction that controls generation of the virtual viewpoint images. Based on the received captured images and virtual viewpoint information, the generation apparatus generates a first virtual viewpoint image and a second virtual viewpoint image, where the second virtual viewpoint image is to be displayed at an earlier timing than the first virtual viewpoint image and has lower image quality than the first virtual viewpoint image. Here, the first virtual viewpoint image is, for example, a non-live image, and the second virtual viewpoint image is, for example, a live image. However, the uses of the first and second virtual viewpoint images are not limited to these. Note that the image processing apparatus 1 may perform control so that the first virtual viewpoint image and the second virtual viewpoint image are generated by different generation apparatuses. The image processing apparatus 1 may also perform output control, such as controlling the output destination and output timing of the virtual viewpoint images generated by the generation apparatus.
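The split between a controlling apparatus and one or more generation apparatuses can be sketched as below. This is only an illustration under stated assumptions: `control_generation`, the `"live"` and `"non_live"` keys, and the destination names are hypothetical, and each generation apparatus is modeled as a plain Python callable rather than a networked device.

```python
from typing import Callable, Dict, List, Tuple

# A generation apparatus is modeled as a function that takes the captured
# images and the virtual viewpoint information and returns an image.
Generator = Callable[[List[str], dict], str]


def control_generation(
    captured_images: List[str],
    viewpoint: dict,
    generators: Dict[str, Generator],
    destinations: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Send the same inputs to each generator and route each result.

    The lower-quality image is requested first because it is the one to be
    displayed at the earlier timing. Returns (destination, image) pairs in
    output order.
    """
    outputs: List[Tuple[str, str]] = []
    for kind in ("live", "non_live"):
        image = generators[kind](captured_images, viewpoint)
        outputs.append((destinations[kind], image))
    return outputs
```

Because `generators` maps each kind of image to its own callable, the same function covers both the case of a single generation apparatus (the same callable registered twice with different settings) and the case of different generation apparatuses.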

Alternatively, the generation apparatus may have the functions of the viewpoint acquisition unit 110 and the image generation unit 120, and the image processing apparatus 1 may control generation of virtual viewpoint images by the generation apparatus based on the images acquired by the camera information acquisition unit 100. Here, the images acquired by the camera information acquisition unit 100 are images based on photographing, such as captured images taken by the camera group 2 or images generated based on differences between a plurality of captured images. The generation apparatus may instead have the functions of the camera information acquisition unit 100 and the image generation unit 120, and the image processing apparatus 1 may control generation of virtual viewpoint images by the generation apparatus based on the image acquired by the viewpoint acquisition unit 110. Here, the image acquired by the viewpoint acquisition unit 110 is information corresponding to the designation of a virtual viewpoint, such as virtual viewpoint information or information indicating content determined by the virtual viewpoint, for example the shape and orientation of the subject in the virtual viewpoint image. That is, the image processing apparatus 1 may acquire information related to generation of a virtual viewpoint image, including at least one of an image based on photographing and information corresponding to the designation of a virtual viewpoint, and may control generation of the virtual viewpoint image based on the acquired information.

As a further example, a generation apparatus in the image processing system 10 may have the functions of the camera information acquisition unit 100, the viewpoint acquisition unit 110, and the image generation unit 120, and the image processing apparatus 1 may control generation of virtual viewpoint images by the generation apparatus based on information related to generation of the virtual viewpoint images. In this case, the information related to generation of the virtual viewpoint images includes, for example, at least one of a parameter relating to the image quality of the first virtual viewpoint image and a parameter relating to the image quality of the second virtual viewpoint image generated by the generation apparatus. Specific examples of parameters relating to image quality include the number of cameras corresponding to the captured images used to generate the virtual viewpoint image, the resolution of the virtual viewpoint image, and the time allowed as processing time for generating the virtual viewpoint image. The image processing apparatus 1 acquires these image quality parameters based on, for example, input by an operator, and controls the generation apparatus based on the acquired parameters, for example by transmitting them to the generation apparatus. This allows the operator to have a plurality of virtual viewpoint images generated, each with a different desired image quality.
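The image quality parameters listed above can be pictured as a small record sent to the generation apparatus. The following Python sketch is an assumption for illustration only: the `QualityParams` class, its field names, the JSON encoding, and `build_request` are hypothetical, and the embodiment does not specify any message format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QualityParams:
    num_cameras: int                     # cameras whose captured images are used
    resolution: Tuple[int, int]          # (width, height) of the output image
    max_processing_ms: Optional[float]   # allowed processing time, None if unbounded

    def __post_init__(self) -> None:
        # Reject values that cannot describe a valid generation request.
        if self.num_cameras < 1:
            raise ValueError("num_cameras must be at least 1")
        if min(self.resolution) < 1:
            raise ValueError("resolution must be positive")
        if self.max_processing_ms is not None and self.max_processing_ms <= 0:
            raise ValueError("max_processing_ms must be positive")


def build_request(first: QualityParams, second: QualityParams) -> str:
    """Serialize the parameters for both virtual viewpoint images as JSON."""
    payload = {"first_image": asdict(first), "second_image": asdict(second)}
    return json.dumps(payload, sort_keys=True)
```

An operator-facing tool could, for instance, call `build_request(QualityParams(30, (3840, 2160), None), QualityParams(8, (960, 540), 33.0))` to request one image without a time limit and another bounded at 33 ms. These numbers are illustrative and are not taken from the embodiment.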

As described above, the image processing apparatus 1 receives an instruction to generate a virtual viewpoint image based on images obtained by photographing a subject from different directions with a plurality of cameras and on information corresponding to the designation of a virtual viewpoint. In response to receiving the generation instruction, the image processing apparatus 1 performs control so that a first virtual viewpoint image to be output to a first display device and a second virtual viewpoint image to be output to a second display device are generated based on the images based on the photographing and the information corresponding to the designation of the virtual viewpoint. Here, the second virtual viewpoint image has higher image quality than the first virtual viewpoint image. As a result, even when there are both users who want to see a virtual viewpoint image in real time and users who prioritize high image quality over real-time viewing, a virtual viewpoint image suited to the timing at which it is to be displayed can be generated.

This embodiment has described the case of controlling, as the image quality of the virtual viewpoint image, the color gradation, the resolution, and the number of cameras corresponding to the captured images used to generate the virtual viewpoint image. However, other parameters may be controlled as the image quality. A plurality of parameters relating to image quality may also be controlled simultaneously.

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiment is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions. The program may also be provided recorded on a computer-readable recording medium.

DESCRIPTION OF SYMBOLS
1 Image processing apparatus
2 Camera group
100 Camera information acquisition unit
110 Virtual viewpoint information acquisition unit
120 Image generation unit

Claims

1. An image processing system comprising:
image acquisition means for acquiring images based on photographing from a plurality of directions by a plurality of cameras;
information acquisition means for acquiring viewpoint information indicating a virtual viewpoint; and
generation means for generating virtual viewpoint images based on the images acquired by the image acquisition means and the viewpoint information acquired by the information acquisition means, the generation means generating a first virtual viewpoint image that is output to a display device, and a second virtual viewpoint image that has higher image quality than the first virtual viewpoint image and is output to another display device at a timing later than the output of the first virtual viewpoint image to the display device.

2. The image processing system according to claim 1, wherein the first virtual viewpoint image and the second virtual viewpoint image generated by the generation means are virtual viewpoint images corresponding to the same virtual viewpoint.

3. The image processing system according to claim 1, wherein the display device to which the first virtual viewpoint image is output is a display device used for allowing a user to designate a virtual viewpoint, and
the other display device to which the second virtual viewpoint image is output is a display device that is not used for allowing a user to designate a virtual viewpoint.

4. The image processing system according to claim 1, further comprising output means for outputting the first virtual viewpoint image and the second virtual viewpoint image generated by the generation means.

5. The image processing system according to claim 1, further comprising output control means for controlling output of the first virtual viewpoint image and the second virtual viewpoint image generated by the generation means so that the first virtual viewpoint image is output to the display device and the second virtual viewpoint image is output to the other display device.

6. The image processing system according to claim 1, wherein the viewpoint information acquired by the information acquisition means indicates a virtual viewpoint designated based on a user operation corresponding to an image displayed on the display device.

7. The image processing system according to claim 1, wherein the generation means generates the second virtual viewpoint image by using at least one of the first virtual viewpoint image and data generated in the process of generating the first virtual viewpoint image by image processing using the images acquired by the image acquisition means.

8. The image processing system according to claim 1, wherein the generation means generates the second virtual viewpoint image by performing image processing for improving image quality on a virtual viewpoint image generated based on the images based on the photographing and the viewpoint information, and generates the first virtual viewpoint image by performing processing that is a part of the processing for generating the second virtual viewpoint image from that virtual viewpoint image and that is executed in a processing time equal to or less than a predetermined threshold.

9. The image processing system according to claim 1, wherein the generation means generates the first virtual viewpoint image and the second virtual viewpoint image by independent processing.

10. The image processing system according to claim 1, wherein the generation means generates the first virtual viewpoint image and the second virtual viewpoint image by different generation devices.

11. The image processing system according to claim 1, wherein the first virtual viewpoint image is an image representing the shape of an object photographed by at least one of the plurality of cameras, and
the second virtual viewpoint image is an image representing, in addition to the shape of the object, a color of the object that does not appear in the first virtual viewpoint image.

12. The image processing system according to claim 1, wherein the image quality of the virtual viewpoint image generated by the generation means is the number of gradations of colors included in the virtual viewpoint image.

13. The image processing system according to claim 1, wherein the image quality of the virtual viewpoint image generated by the generation means is a resolution of the virtual viewpoint image.

14. The image processing system according to claim 1, wherein the other display device is a display installed in a competition venue or a live venue.

15. The image processing system according to claim 1, wherein the viewpoint information acquired by the information acquisition means is information indicating at least one of a position and a direction of a virtual viewpoint.

16. An image processing system comprising:
image acquisition means for acquiring images based on photographing from a plurality of directions by a plurality of cameras;
information acquisition means for acquiring viewpoint information indicating a virtual viewpoint; and
generation means for generating virtual viewpoint images based on the images acquired by the image acquisition means and the viewpoint information acquired by the information acquisition means, the generation means generating a first virtual viewpoint image that is output to a display device, and a second virtual viewpoint image that is output to another display device at a timing later than the output of the first virtual viewpoint image to the display device,
wherein the processing amount of the generation means for generating the second virtual viewpoint image from the images acquired by the image acquisition means is larger than the processing amount of the generation means for generating the first virtual viewpoint image from those images.

17. The image processing system according to claim 16, wherein the first virtual viewpoint image and the second virtual viewpoint image generated by the generation means are virtual viewpoint images corresponding to the same virtual viewpoint.

18. The image processing system according to claim 16 or 17, wherein the display device to which the first virtual viewpoint image is output is a display device used for allowing a user to designate a virtual viewpoint, and
the other display device to which the second virtual viewpoint image is output is a display device that is not used for allowing a user to designate a virtual viewpoint.

19. An image processing apparatus comprising:
receiving means for receiving an instruction to generate a virtual viewpoint image; and
control means for controlling generation means that generates virtual viewpoint images based on images based on photographing from a plurality of directions by a plurality of cameras and viewpoint information indicating a virtual viewpoint, the control means controlling the generation means in response to reception of the generation instruction by the receiving means so that the generation means generates a first virtual viewpoint image that is output to a display device, and a second virtual viewpoint image that has higher image quality than the first virtual viewpoint image and is output to another display device at a timing later than the output of the first virtual viewpoint image to the display device.

20. The image processing apparatus according to claim 19, wherein the first virtual viewpoint image and the second virtual viewpoint image generated by the generation means are virtual viewpoint images corresponding to the same virtual viewpoint.

21. The image processing apparatus according to claim 19 or 20, wherein the display device to which the first virtual viewpoint image is output is a display device used for allowing a user to designate a virtual viewpoint, and
the other display device to which the second virtual viewpoint image is output is a display device that is not used for allowing a user to designate a virtual viewpoint.

22. An image processing method comprising:
an image acquisition step of acquiring images based on photographing from a plurality of directions by a plurality of cameras;
an information acquisition step of acquiring viewpoint information indicating a virtual viewpoint; and
a generation step of generating virtual viewpoint images based on the images acquired in the image acquisition step and the viewpoint information acquired in the information acquisition step, the generation step generating a first virtual viewpoint image that is output to a display device, and a second virtual viewpoint image that has higher image quality than the first virtual viewpoint image and is output to another display device at a timing later than the output of the first virtual viewpoint image to the display device.

23. The image processing method according to claim 22, wherein the first virtual viewpoint image and the second virtual viewpoint image generated in the generation step are virtual viewpoint images corresponding to the same virtual viewpoint.

24. The image processing method according to claim 22 or 23, wherein the display device to which the first virtual viewpoint image is output is a display device used for allowing a user to designate a virtual viewpoint, and
the other display device to which the second virtual viewpoint image is output is a display device that is not used for allowing a user to designate a virtual viewpoint.

25. A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 19 to 21.