JP7016899B2

JP7016899B2 - Image processing equipment, image processing system, image processing method and program

Info

Publication number: JP7016899B2
Application number: JP2020036313A
Authority: JP
Inventors: 康文 髙間 (Yasufumi Takama)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-03-03
Filing date: 2020-03-03
Publication date: 2022-02-07
Anticipated expiration: 2036-10-28
Also published as: JP2022058770A; JP7439146B2; JP2020095741A

Description

The present invention relates to a technique for generating a virtual viewpoint image.

In recent years, attention has been paid to techniques in which a plurality of cameras installed at different positions capture a subject from multiple viewpoints, and the resulting multi-viewpoint images are used to generate a virtual viewpoint image or a three-dimensional model. Such techniques allow, for example, highlight scenes of a soccer or basketball game to be viewed from various angles, giving the user a stronger sense of presence than an ordinary image.

Patent Document 1 describes improving the image quality of a virtual viewpoint image, generated by combining images captured from a plurality of viewpoints, by reducing the rendering unit in the boundary regions of objects in the image.

Patent Document 1: Japanese Unexamined Patent Publication No. 2013-223008

However, conventional techniques may be unable to generate virtual viewpoint images that meet a plurality of different requirements. For example, if only high-quality virtual viewpoint images are generated, generation may take a long time, making it difficult to satisfy users who want to watch virtual viewpoint images in real time even at low image quality. Conversely, if only low-quality virtual viewpoint images are generated, it may be difficult to satisfy users who prioritize high image quality over real-time performance.

The present invention has been made in view of the above problems, and its object is to generate virtual viewpoint images that meet a plurality of different requirements.

To solve the above problems, an image processing system according to the present invention has, for example, the following configuration. The system includes: image acquisition means for acquiring a plurality of images based on imaging performed from mutually different positions by a plurality of imaging devices; reception means for receiving an input corresponding to an operation for designating a virtual viewpoint; and generation means for generating a plurality of virtual viewpoint images based on the plurality of images acquired by the image acquisition means and the input received by the reception means. The generation means generates a first virtual viewpoint image used for the operation, and a second virtual viewpoint image that corresponds to the virtual viewpoint designated by the operation and has a larger image data size per frame of a moving image than the first virtual viewpoint image.
The number of imaging devices corresponding to the images used by the generation means to generate the second virtual viewpoint image is larger than the number of imaging devices corresponding to the images used to generate the first virtual viewpoint image, and the imaging devices used for the second virtual viewpoint image include the imaging devices used for the first virtual viewpoint image.
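The two conditions on the imaging devices at the end of this configuration together say that the set of cameras used for the second virtual viewpoint image is a proper superset of the set used for the first: the first set is contained in the second, and the second is strictly larger. The following Python sketch checks that relationship; the function name and the example camera IDs are hypothetical illustrations, not part of the disclosure.

```python
def satisfies_camera_relation(first_cams, second_cams):
    """Hypothetical check of the claimed camera relationship.

    first_cams:  IDs of cameras whose images feed the first (operation) image
    second_cams: IDs of cameras whose images feed the second (designated) image
    """
    first, second = set(first_cams), set(second_cams)
    includes = first <= second        # the second set includes the first
    more = len(second) > len(first)   # the second set has more cameras
    # Together, these two conditions are equivalent to `first < second`
    # (proper subset) in Python's set semantics.
    return includes and more


# Hypothetical 16-camera rig where the operation image uses every 4th camera.
print(satisfies_camera_relation(range(0, 16, 4), range(16)))  # True
print(satisfies_camera_relation([0, 1], [2, 3, 4]))           # False: not included
```

Using a strided subset for the first image is only one possible choice; the configuration requires just the inclusion and the larger count.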

According to the present invention, virtual viewpoint images that meet a plurality of different requirements can be generated.

FIG. 1 is a diagram for explaining the configuration of the image processing system 10.
FIG. 2 is a diagram for explaining the hardware configuration of the image processing device 1.
FIG. 3 is a flowchart for explaining one mode of operation of the image processing device 1.
FIG. 4 is a diagram for explaining the configuration of the display screen of the display device 3.
FIG. 5 is a flowchart for explaining one mode of operation of the image processing device 1.
FIG. 6 is a flowchart for explaining one mode of operation of the image processing device 1.

[System configuration]
Hereinafter, embodiments of the present invention will be described with reference to the drawings. First, the configuration of the image processing system 10 that generates and outputs a virtual viewpoint image will be described with reference to FIG. 1. The image processing system 10 in the present embodiment includes an image processing device 1, a camera group 2, a display device 3, and a display device 4.

The virtual viewpoint image in the present embodiment is the image that would be obtained if the subject were captured from a virtual viewpoint. In other words, a virtual viewpoint image represents the appearance of the scene from a designated viewpoint. The virtual viewpoint may be designated by the user, or designated automatically based on, for example, the result of image analysis. That is, virtual viewpoint images include an arbitrary-viewpoint image (free-viewpoint image) corresponding to a viewpoint freely designated by the user. Images corresponding to a viewpoint the user selects from a plurality of candidates, and images corresponding to a viewpoint designated automatically by the device, are also virtual viewpoint images. The present embodiment mainly describes the case where the virtual viewpoint image is a moving image, but it may also be a still image.

The camera group 2 includes a plurality of cameras, each of which captures the subject from a different direction. In the present embodiment, each camera in the camera group 2 is connected to the image processing device 1 and transmits its captured images, its camera parameters, and the like to the image processing device 1. However, the configuration is not limited to this: the cameras in the camera group 2 may be able to communicate with one another, and any one of them may transmit the images captured by the plurality of cameras, the parameters of those cameras, and the like to the image processing device 1. Further, instead of the captured images themselves, any camera in the camera group 2 may transmit images based on capturing by the camera group 2, such as images generated from differences between images captured by a plurality of cameras.

The display device 3 accepts the designation of a virtual viewpoint used to generate a virtual viewpoint image, and transmits information corresponding to the designation to the image processing device 1. For example, the display device 3 has an input unit such as a joystick, jog dial, touch panel, keyboard, or mouse, and the user (operator) designates a virtual viewpoint by operating the input unit. A user in the present embodiment is either an operator who operates the input unit of the display device 3 to designate a virtual viewpoint, or a viewer who watches the virtual viewpoint image displayed by the display device 4; where the two need not be distinguished, the term "user" is used. The present embodiment mainly describes the case where the viewer and the operator are different people, but they may be the same user. In the present embodiment, the information corresponding to the designation that is transmitted from the display device 3 to the image processing device 1 is virtual viewpoint information indicating the position and orientation of the virtual viewpoint.
However, this is not limiting: the information corresponding to the designation may instead indicate content that is determined by the virtual viewpoint, such as the shape and orientation of the subject in the virtual viewpoint image, and the image processing device 1 may generate the virtual viewpoint image based on such information.

The display device 3 also displays the virtual viewpoint image that the image processing device 1 generates and outputs based on the images captured by the camera group 2 and the virtual viewpoint designation accepted by the display device 3. The operator can therefore designate the virtual viewpoint while watching the virtual viewpoint image on the display device 3. In the present embodiment, the display device 3 that displays the virtual viewpoint image also accepts the designation of the virtual viewpoint, but this is not limiting. For example, the device that accepts the designation and the display device that shows the operator the virtual viewpoint image used for designation may be separate devices.

Based on an operation by the operator, the display device 3 also issues a generation instruction to the image processing device 1 to start generating a virtual viewpoint image. The generation instruction is not limited to this; for example, it may reserve generation so that the image processing device 1 starts generating the virtual viewpoint image at a predetermined time, or so that generation starts when a predetermined event occurs. The generation instruction may also be issued by a device other than the display device 3, or input by the user directly to the image processing device 1.

The display device 4 displays the virtual viewpoint image, generated by the image processing device 1 based on the virtual viewpoint designated by the operator using the display device 3, to a user (viewer) other than that operator. The image processing system 10 may have a plurality of display devices 4, each displaying a different virtual viewpoint image. For example, the image processing system 10 may include one display device 4 that displays a virtual viewpoint image broadcast live (live image) and another that displays a virtual viewpoint image broadcast after recording (non-live image).

The image processing device 1 has a camera information acquisition unit 100, a virtual viewpoint information acquisition unit 110 (hereinafter, viewpoint acquisition unit 110), an image generation unit 120, and an output unit 130. The camera information acquisition unit 100 acquires, from the camera group 2, images based on capturing by the camera group 2 together with the external and internal parameters of each camera in the group, and outputs them to the image generation unit 120. The viewpoint acquisition unit 110 acquires, from the display device 3, information corresponding to the operator's designation of the virtual viewpoint and outputs it to the image generation unit 120. The viewpoint acquisition unit 110 also receives the instruction from the display device 3 to generate a virtual viewpoint image. The image generation unit 120 generates a virtual viewpoint image based on the captured images acquired by the camera information acquisition unit 100, the designation information acquired by the viewpoint acquisition unit 110, and the generation instruction received by the viewpoint acquisition unit 110, and outputs it to the output unit 130. The output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 to an external device such as the display device 3 or the display device 4.

In the present embodiment, the image processing device 1 generates a plurality of virtual viewpoint images of different image quality and outputs each to the output destination associated with it. For example, a low-quality virtual viewpoint image that takes little processing time to generate is output to a display device 4 watched by viewers who want real-time (low-delay) images, while a high-quality virtual viewpoint image that takes longer to generate is output to a display device 4 watched by viewers who want high image quality. Delay in the present embodiment is the period from capture by the camera group 2 until the virtual viewpoint image based on that capture is displayed. The definition is not limited to this; for example, delay may be defined as the time difference between real-world time and the time corresponding to the displayed image.

Next, the hardware configuration of the image processing device 1 will be described with reference to FIG. 2. The image processing device 1 includes a CPU 201, a ROM 202, a RAM 203, an auxiliary storage device 204, a display unit 205, an operation unit 206, a communication unit 207, and a bus 208. The CPU 201 controls the entire image processing device 1 using computer programs and data stored in the ROM 202 and the RAM 203. The image processing device 1 may include a GPU (Graphics Processing Unit) that performs at least part of the processing of the CPU 201. The ROM 202 stores programs and parameters that do not need to be changed. The RAM 203 temporarily stores programs and data supplied from the auxiliary storage device 204, data supplied from outside via the communication unit 207, and the like. The auxiliary storage device 204 is, for example, a hard disk drive, and stores content data such as still images and moving images.

The display unit 205 is, for example, a liquid crystal display, and displays a GUI (Graphical User Interface) through which the user operates the image processing device 1. The operation unit 206 is, for example, a keyboard and mouse, and inputs various instructions to the CPU 201 in response to user operations. The communication unit 207 communicates with external devices such as the camera group 2, the display device 3, and the display device 4. For example, when the image processing device 1 is connected to an external device by wire, a LAN cable or the like is connected to the communication unit 207. When the image processing device 1 can communicate wirelessly with external devices, the communication unit 207 includes an antenna. The bus 208 connects the units of the image processing device 1 and transfers information between them.

In the present embodiment, the display unit 205 and the operation unit 206 are inside the image processing device 1, but the image processing device 1 need not include one or both of them. Alternatively, at least one of the display unit 205 and the operation unit 206 may exist as a separate device outside the image processing device 1, with the CPU 201 operating as a display control unit that controls the display unit 205 and as an operation control unit that controls the operation unit 206.

[Operation flow]
Next, one mode of operation of the image processing device 1 will be described with reference to FIG. 3. The processing shown in FIG. 3 starts when the viewpoint acquisition unit 110 receives an instruction to generate a virtual viewpoint image, and is repeated periodically (for example, once per frame when the virtual viewpoint image is a moving image). However, the start timing is not limited to this. The processing in FIG. 3 is realized by the CPU 201 loading a program stored in the ROM 202 into the RAM 203 and executing it. At least part of the processing in FIG. 3 may instead be realized by dedicated hardware separate from the CPU 201.

In the flow shown in FIG. 3, S2010 and S2020 acquire information, and S2030-S2050 generate and output a virtual viewpoint image that the operator uses to designate a virtual viewpoint (designation image). S2070-S2100 generate and output a live image, and S2110-S2130 generate and output a non-live image. The processing in each step is described in detail below.

In S2010, the camera information acquisition unit 100 acquires the image captured by each camera of the camera group 2, together with the external and internal parameters of each camera. The external parameters describe the position and orientation of the camera, and the internal parameters describe its focal length and image center.

In S2020, the viewpoint acquisition unit 110 acquires virtual viewpoint information as the information corresponding to the operator's designation of the virtual viewpoint. In the present embodiment, the virtual viewpoint information corresponds to the external and internal parameters of a virtual camera that captures the subject from the virtual viewpoint, and one set of virtual viewpoint information is needed to generate one frame of the virtual viewpoint image.
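Both the real cameras (S2010) and the virtual camera (S2020) can be described with the standard pinhole model: the external parameters (a rotation R and translation t) map a world point into camera coordinates, and the internal parameters (focal lengths fx, fy and image center cx, cy) map it onto the image. The sketch below is a minimal pure-Python version under those assumptions; it ignores lens distortion, and the class and field names are illustrative rather than taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class PinholeCamera:
    # Internal parameters: focal lengths and image center, in pixels.
    fx: float
    fy: float
    cx: float
    cy: float
    # External parameters: world-to-camera rotation (3x3) and translation (3).
    R: List[List[float]]
    t: List[float]

    def project(self, X: Sequence[float]) -> Optional[Tuple[float, float, float]]:
        """Return (u, v, depth) for world point X, or None if X is behind the camera."""
        Xc = [sum(self.R[i][j] * X[j] for j in range(3)) + self.t[i] for i in range(3)]
        if Xc[2] <= 0:
            return None
        u = self.fx * Xc[0] / Xc[2] + self.cx
        v = self.fy * Xc[1] / Xc[2] + self.cy
        return (u, v, Xc[2])


# Hypothetical virtual camera at the world origin, looking along +z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
virtual_cam = PinholeCamera(fx=100, fy=100, cx=50, cy=50, R=identity, t=[0.0, 0.0, 0.0])
print(virtual_cam.project((1.0, 2.0, 10.0)))  # (60.0, 70.0, 10.0)
```

In a per-frame loop, one such virtual camera would be supplied for each frame, matching the statement that one set of virtual viewpoint information is needed per frame.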

In S2030, the image generation unit 120 estimates the three-dimensional shape of each subject object based on the images captured by the camera group 2. Subject objects are, for example, people or other moving objects within the shooting range of the camera group 2. The image generation unit 120 computes the difference between each captured image acquired from the camera group 2 and a background image acquired in advance for the corresponding camera, thereby generating a silhouette image in which the portion of the captured image corresponding to the objects (the foreground region) is extracted. The image generation unit 120 then estimates the three-dimensional shape of the objects from the silhouette image and the parameters of each camera, for example with the Visual Hull method. The result is a 3D point cloud (a set of points with three-dimensional coordinates) representing the three-dimensional shape of the subject objects. The method of deriving the three-dimensional shape of the objects from the images captured by the camera group 2 is not limited to this.
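The background-difference silhouettes and the Visual Hull step can be sketched as follows: a voxel (a candidate 3D point) survives only if it projects inside the foreground silhouette of every camera. The grayscale images, the fixed difference threshold, the regular voxel grid, and the use of plain functions as camera projections are all simplifying assumptions for illustration; a real system would typically use color images, tuned thresholds, and calibrated perspective cameras.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Projection = Callable[[Point], Optional[Tuple[float, float]]]


def silhouette(image: List[List[float]], background: List[List[float]],
               threshold: float) -> List[List[bool]]:
    """Foreground mask: True where the image differs from the background by more than threshold."""
    return [[abs(p - b) > threshold for p, b in zip(row, bg_row)]
            for row, bg_row in zip(image, background)]


def visual_hull(voxels: Sequence[Point], projections: Sequence[Projection],
                silhouettes: Sequence[List[List[bool]]]) -> List[Point]:
    """Keep only the voxels that fall inside the silhouette in every view."""
    hull = []
    for X in voxels:
        inside_all = True
        for project, sil in zip(projections, silhouettes):
            uv = project(X)
            if uv is None:
                inside_all = False
                break
            u, v = int(round(uv[0])), int(round(uv[1]))
            # A voxel that projects outside the image is treated as carved away.
            if not (0 <= v < len(sil) and 0 <= u < len(sil[0])) or not sil[v][u]:
                inside_all = False
                break
        if inside_all:
            hull.append(X)
    return hull


# Hypothetical orthographic views of a 4x4x4 grid: one looks along z, one along x.
def front(X: Point) -> Tuple[float, float]:
    return (X[0], X[1])


def side(X: Point) -> Tuple[float, float]:
    return (X[2], X[1])


background = [[0] * 4 for _ in range(4)]
img_front = [[200 if 1 <= u <= 2 and 1 <= v <= 2 else 0 for u in range(4)] for v in range(4)]
img_side = [[200 if u == 2 and 1 <= v <= 2 else 0 for u in range(4)] for v in range(4)]
sils = [silhouette(img, background, 50) for img in (img_front, img_side)]
grid = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
print(visual_hull(grid, [front, side], sils))
# [(1, 1, 2), (1, 2, 2), (2, 1, 2), (2, 2, 2)]
```

Each additional camera can only remove voxels, which is why the hull is an outer bound on the true shape and why S2070 refines it further.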

In S2040, the image generation unit 120 renders the 3D point cloud and a background 3D model based on the acquired virtual viewpoint information to generate a virtual viewpoint image. The background 3D model is, for example, a CG model of the stadium where the camera group 2 is installed, created in advance and stored in the image processing system 10. In the virtual viewpoint image generated by the processing so far, the region corresponding to the objects and the background region are each displayed in a predetermined color (for example, a single color). Rendering 3D point clouds and background 3D models is well established in games and film, and fast methods, such as GPU-based processing, are known. The virtual viewpoint image produced by the processing up to S2040 can therefore be generated quickly in response to capture by the camera group 2 and the operator's designation of the virtual viewpoint.
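Because the S2040 image only distinguishes object pixels from background pixels, rendering it amounts to splatting each projected point in one foreground color over a background filled with another color. The sketch below assumes the background 3D model has already been reduced to a constant background color and uses a hypothetical top-down projection; it is meant only to show why this step is cheap, not to reproduce the patent's renderer.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def render_flat(points: Sequence[Point],
                project: Callable[[Point], Optional[Tuple[float, float]]],
                width: int, height: int,
                fg: int = 255, bg: int = 0) -> List[List[int]]:
    """Draw every projected point in one foreground color; the rest stays background."""
    image = [[bg] * width for _ in range(height)]
    for X in points:
        uv = project(X)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        if 0 <= u < width and 0 <= v < height:
            # No depth test is needed: every object pixel receives the same color.
            image[v][u] = fg
    return image


# Hypothetical projection that simply drops the z coordinate.
def top_down(X: Point) -> Tuple[float, float]:
    return (X[0], X[1])


print(render_flat([(1, 1, 5), (2, 1, 5), (9, 9, 5)], top_down, width=3, height=2))
# [[0, 0, 0], [0, 255, 255]]
```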

In S2050, the output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 in S2040 to the display device 3, which the operator uses to designate the virtual viewpoint. The layout of the display screen 30 shown by the display device 3 will be described with reference to FIG. 4. The display screen 30 consists of an area 310, an area 320, and an area 330. For example, the virtual viewpoint image generated as the designation image is displayed in the area 310, the virtual viewpoint image generated as the live image in the area 320, and the virtual viewpoint image generated as the non-live image in the area 330. That is, the virtual viewpoint image generated in S2040 and output in S2050 is displayed in the area 310, and the operator designates the virtual viewpoint while watching the area 310. The display device 3 need only display the designation image; it need not display the live image or the non-live image.

In S2060, the image generation unit 120 determines whether to generate a virtual viewpoint image of higher quality than the one generated in S2040. For example, if only the low-quality image for designating the virtual viewpoint is required, the processing ends without proceeding to S2070. If a higher-quality image is required, the processing proceeds to S2070.

In S2070, the image generation unit 120 refines the object shape model (3D point cloud) estimated in S2030, for example using the PhotoHull method. Specifically, each point of the 3D point cloud is projected onto the image captured by each camera, and the color consistency across the captured images is evaluated to determine whether the point is needed to represent the subject's shape. For example, if the variance of the pixel values at the projected positions of a point is larger than a threshold, the point is judged not to correctly represent the subject's shape and is deleted from the 3D point cloud. Performing this test on every point in the 3D point cloud makes the object shape model more accurate. The method of refining the object shape model is not limited to this.
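The color-consistency test of S2070 can be sketched as follows: project a point into every camera image, collect the pixel values it lands on, and delete the point when their variance exceeds a threshold. This minimal version assumes grayscale images, uses plain functions as camera projections, ignores occlusion (a fuller implementation would sample only cameras that actually see the point), and keeps points seen by fewer than two cameras because their consistency cannot be judged; all of these choices are assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Projection = Callable[[Point], Optional[Tuple[float, float]]]


def variance(values: Sequence[float]) -> float:
    """Population variance of the sampled pixel values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def refine_points(points: Sequence[Point], projections: Sequence[Projection],
                  images: Sequence[List[List[float]]], threshold: float) -> List[Point]:
    """Drop points whose projected pixel values disagree across cameras."""
    kept = []
    for X in points:
        samples = []
        for project, img in zip(projections, images):
            uv = project(X)
            if uv is None:
                continue
            u, v = int(round(uv[0])), int(round(uv[1]))
            if 0 <= v < len(img) and 0 <= u < len(img[0]):
                samples.append(img[v][u])
        # Assumption: points with fewer than two samples cannot be tested, so keep them.
        if len(samples) < 2 or variance(samples) <= threshold:
            kept.append(X)
    return kept


# Hypothetical example: two cameras that both drop the z coordinate.
def ortho(X: Point) -> Tuple[float, float]:
    return (X[0], X[1])


cam_a = [[100, 100]]
cam_b = [[102, 10]]
print(refine_points([(0, 0, 0), (1, 0, 0)], [ortho, ortho], [cam_a, cam_b], threshold=25.0))
# [(0, 0, 0)]
```

The first point sees values 100 and 102 (variance 1) and is kept; the second sees 100 and 10 (variance 2025) and is removed as photo-inconsistent.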

In S2080, the image generation unit 120 colors the 3D point cloud refined in S2070 and projects it onto the coordinates of the virtual viewpoint to generate a foreground image corresponding to the foreground region, and also generates a background image as seen from the virtual viewpoint. The image generation unit 120 then superimposes the foreground image on the background image to generate a virtual viewpoint image serving as the live image.

An example of a method for generating the foreground image of the virtual viewpoint image (the image of the region corresponding to the objects) is described here. To generate the foreground image, the 3D point cloud is colored. The coloring consists of a visibility determination and a color calculation. The visibility determination identifies, for each point in the 3D point cloud, the cameras that can see it, based on the positional relationship between the point and the cameras in the camera group 2. Then, for each point, the point is projected onto the image of a camera that can see it, and the color of the pixel at the projected position becomes the color of the point. When a point is visible from a plurality of cameras, it is projected onto each of their images, the pixel values at the projected positions are obtained, and their average is used as the color of the point. Rendering the colored 3D point cloud with an existing CG rendering method produces the foreground image of the virtual viewpoint image.
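The visibility determination and color averaging can be sketched with a per-camera depth buffer: a point counts as visible from a camera if no other point projects to the same pixel closer to that camera, and its color is the mean of the RGB values from all cameras that see it. The depth-buffer visibility test, nearest-pixel sampling, and the tolerance `eps` are assumptions of this sketch; the patent only states that visibility follows from the positional relationship between the points and the cameras.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
RGB = Tuple[float, float, float]
# A projection returns (u, v, depth), or None if the point is behind the camera.
Projection = Callable[[Point], Optional[Tuple[float, float, float]]]


def _pixel(project: Projection, X: Point) -> Optional[Tuple[int, int, float]]:
    p = project(X)
    if p is None:
        return None
    return int(round(p[0])), int(round(p[1])), p[2]


def depth_buffer(points: Sequence[Point], project: Projection,
                 width: int, height: int) -> Dict[Tuple[int, int], float]:
    """Nearest depth seen at each in-image pixel of one camera."""
    depth: Dict[Tuple[int, int], float] = {}
    for X in points:
        p = _pixel(project, X)
        if p is None:
            continue
        u, v, z = p
        if 0 <= u < width and 0 <= v < height and z < depth.get((u, v), float("inf")):
            depth[(u, v)] = z
    return depth


def color_points(points: Sequence[Point], projections: Sequence[Projection],
                 images: Sequence[List[List[RGB]]], eps: float = 1e-6) -> List[Optional[RGB]]:
    buffers = [depth_buffer(points, proj, len(img[0]), len(img))
               for proj, img in zip(projections, images)]
    colors: List[Optional[RGB]] = []
    for X in points:
        samples = []
        for proj, img, depth in zip(projections, images, buffers):
            p = _pixel(proj, X)
            if p is None:
                continue
            u, v, z = p
            # Visible only if this point is the closest one at its pixel.
            if (u, v) in depth and z <= depth[(u, v)] + eps:
                samples.append(img[v][u])
        if samples:
            colors.append(tuple(sum(c[i] for c in samples) / len(samples) for i in range(3)))
        else:
            colors.append(None)  # not seen by any camera
    return colors


# Hypothetical cameras on opposite sides of the scene along the z axis.
def cam_front(X: Point) -> Tuple[float, float, float]:
    return (X[0], X[1], X[2])


def cam_back(X: Point) -> Tuple[float, float, float]:
    return (X[0], X[1], 10 - X[2])


pts = [(0, 0, 1), (0, 0, 5), (1, 0, 3)]
img_front = [[(255, 0, 0), (0, 0, 200)]]
img_back = [[(0, 255, 0), (0, 0, 100)]]
print(color_points(pts, [cam_front, cam_back], [img_front, img_back]))
# [(255.0, 0.0, 0.0), (0.0, 255.0, 0.0), (0.0, 0.0, 150.0)]
```

The first two points share a pixel, so each is seen only by the camera it is closer to; the third is seen by both and receives the average of their colors.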

Next, an example of how the background image of the virtual viewpoint image is generated is described. First, vertices of a background 3D model (for example, points corresponding to the edges of the stadium) are set. These vertices are then projected into the coordinate systems of the two cameras closest to the virtual viewpoint (referred to as the first camera and the second camera) and into the coordinate system of the virtual viewpoint. Using the corresponding points between the virtual viewpoint and the first camera, and between the virtual viewpoint and the second camera, a first projection matrix between the virtual viewpoint and the first camera and a second projection matrix between the virtual viewpoint and the second camera are calculated. Using these two matrices, each pixel of the background image is projected onto the image captured by the first camera and the image captured by the second camera, and the pixel value of the background image is set to the average of the two pixel values at the projected positions. The pixel values of the background image may likewise be determined from images captured by three or more cameras.
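The background computation can be sketched as follows. This assumes each "projection matrix" is a 3x3 planar homography that maps virtual-view pixels to camera pixels and has already been estimated from the corresponding points (that estimation is omitted). It also assumes grayscale images and nearest-neighbour sampling; the helper names are hypothetical. Passing more than two `(H, image)` pairs covers the three-or-more-camera variant.

```python
from typing import Optional, Sequence, Tuple

# Assumption: H is a 3x3 homography from virtual-view pixels to camera pixels;
# images are grayscale, stored as image[row][col].
Matrix33 = Sequence[Sequence[float]]
GrayImage = Sequence[Sequence[float]]


def apply_homography(H: Matrix33, x: float, y: float) -> Optional[Tuple[float, float]]:
    """Map virtual-view pixel (x, y) into a camera image; None if it maps to infinity."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return None
    return u / w, v / w


def background_pixel(x: int, y: int,
                     views: Sequence[Tuple[Matrix33, GrayImage]]) -> Optional[float]:
    """Average the values that virtual-view pixel (x, y) lands on in each camera image."""
    values = []
    for H, img in views:
        mapped = apply_homography(H, x, y)
        if mapped is None:
            continue
        col, row = int(round(mapped[0])), int(round(mapped[1]))  # nearest-neighbour sample
        if 0 <= row < len(img) and 0 <= col < len(img[0]):
            values.append(img[row][col])
    return sum(values) / len(values) if values else None
```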

Superimposing the foreground image on the background image obtained in this way yields a colored virtual viewpoint image. The virtual viewpoint image generated in S2080 therefore has higher image quality than the one generated in S2040 in terms of the number of color gradations; conversely, the virtual viewpoint image generated in S2040 contains fewer color gradations than the one generated in S2080. Note that the method of adding color information to the virtual viewpoint image is not limited to this.

In S2090, the output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 in S2080 to the display device 3 and the display device 4 as the live image. The image output to the display device 3 is displayed in the area 320, where the operator can view it, and the image output to the display device 4 can be viewed by viewers.

In S2100, the image generation unit 120 determines whether to generate a virtual viewpoint image of higher image quality than the one generated in S2080. For example, if the virtual viewpoint image is provided to viewers only as a live broadcast, the process ends without proceeding to S2110. If, on the other hand, a higher-quality image is to be broadcast to viewers after recording, the process proceeds to S2110.

In S2110, the image generation unit 120 further refines the shape model of the object generated in S2070. In the present embodiment, the refinement is achieved by removing isolated points from the shape model. In isolated point removal, each voxel of the voxel set (3D point cloud) calculated by PhotoHull is first checked for whether any other voxel exists around it. If none does, the voxel is judged to be an isolated point and is removed from the voxel set. Executing the same processing as in S2080 with the shape model from which isolated points have been removed generates a virtual viewpoint image in which the shape of the object is more accurate than in the virtual viewpoint image generated in S2080.
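Isolated-point removal can be sketched as a single pass over an integer voxel grid. The 26-voxel neighbourhood used here is an assumption, since the text only says "around each voxel"; the function name is hypothetical. Neighbours are looked up in the original set, so a voxel that becomes isolated only because a neighbour was removed is not re-checked in the same pass.

```python
from itertools import product
from typing import Set, Tuple

Voxel = Tuple[int, int, int]

# Assumption: "around" means the 26 face-, edge- and corner-adjacent grid cells.
NEIGHBOUR_OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def remove_isolated(voxels: Set[Voxel]) -> Set[Voxel]:
    """Return only the voxels that have at least one occupied neighbour in the input set."""
    return {
        (x, y, z)
        for (x, y, z) in voxels
        if any((x + dx, y + dy, z + dz) in voxels for dx, dy, dz in NEIGHBOUR_OFFSETS)
    }
```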

In S2120, the image generation unit 120 applies smoothing to the boundary between the foreground region and the background region of the virtual viewpoint image generated in S2110, correcting the image so that the boundary region is displayed smoothly.

In S2130, the output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 in S2120 to the display device 3 and the display device 4 as the non-live image. The non-live image output to the display device 3 is displayed in the area 330.

Through the above processing, the image processing device 1 generates, from a single set of captured images and virtual viewpoint information, a virtual viewpoint image serving as the designation image (the image used to specify the virtual viewpoint) and a live image, which is a virtual viewpoint image of higher image quality than the designation image. The image processing device 1 also generates a non-live image, which is a virtual viewpoint image of still higher image quality than the live image. The image processing device 1 outputs the generated live image and non-live image to the display device 4 so that the live image is displayed before the non-live image. It also outputs the generated designation image to the display device 3 so that the designation image is displayed on the display device 3 before the live image is displayed on the display device 4.

As a result, the display device 4 can display the low-quality designation image, the live image, which is of higher quality than the designation image and is broadcast live, and the non-live image, which is of still higher quality than the live image and is broadcast after recording. Note that the display device 4 may display only one of the live image and the non-live image, in which case the image processing device 1 outputs the virtual viewpoint image suited to the display device 4. The display device 3 can display three types of virtual viewpoint image: a low-quality virtual viewpoint image as the designation image, a medium-quality virtual viewpoint image as the live image, and a high-quality virtual viewpoint image as the non-live image. Note that the display device 3 need not display the live image, the non-live image, or either of them.

In other words, the image processing device 1 outputs the designation image to the display device 3, which is used to let the user specify the virtual viewpoint. It outputs at least one of the live image and the non-live image, both of higher quality than the designation image, to the display device 4, which displays the virtual viewpoint image generated based on the user's specification of the virtual viewpoint. This satisfies the requirements of both the operator, who wants virtual viewpoint images displayed with low latency in order to specify the virtual viewpoint, and the viewers, who want to see high-quality virtual viewpoint images.

In the above processing, a virtual viewpoint image is generated from the images captured by the camera group 2 and the information corresponding to the specified virtual viewpoint, and a higher-quality virtual viewpoint image is then generated based on the results of that processing. The total amount of processing can therefore be lower than when the low-quality and high-quality virtual viewpoint images are generated by independent processes. The two may nevertheless be generated by independent processes. Further, when the virtual viewpoint image is shown on a display installed at a competition or concert venue, or is broadcast live, and there is no need to broadcast it after recording, the image processing device 1 does not perform the processing for generating the non-live image. This saves the processing that would otherwise be spent on generating a high-quality non-live image.

Next, another mode of operation of the image processing device 1 will be described with reference to FIG. 5. In the mode described above with reference to FIG. 3, a high-quality virtual viewpoint image is produced by performing additional types of processing after the low-quality virtual viewpoint image has been generated. In the mode described below with reference to FIG. 5, by contrast, image quality is improved by increasing the number of cameras used to generate the virtual viewpoint image. Parts identical to the processing of FIG. 3 are not described again.

The process shown in FIG. 5 starts when the viewpoint acquisition unit 110 receives an instruction to generate a virtual viewpoint image, although the start timing is not limited to this. In S2010 and S2020, the image processing device 1 acquires the images captured by the cameras of the camera group 2 and the virtual viewpoint information through the same processing as described with reference to FIG. 3.

In S4030, the image generation unit 120 sets the number of cameras whose captured images are used to generate the virtual viewpoint image. The number is chosen so that the processing of S4050 to S4070 completes within a predetermined threshold (for example, the time corresponding to one frame when the virtual viewpoint image is a moving image). Suppose, for example, that S4050 to S4070 were run in advance on images from 100 cameras and took 0.5 seconds. To complete S4050 to S4070 within 0.016 seconds, the duration of one frame of a virtual viewpoint image at 60 fps (frames per second), the number of cameras is set to three.
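The camera count in this example follows if processing time is assumed to be roughly proportional to the number of cameras: 0.5 s for 100 cameras is 0.005 s per camera, and 0.016 / 0.005 = 3.2, which rounds down to 3. A minimal sketch of that estimate, with a hypothetical function name:

```python
import math


def cameras_for_budget(ref_cameras: int, ref_seconds: float,
                       budget_seconds: float, max_cameras: int) -> int:
    """Largest camera count whose estimated processing time fits within the budget.

    Assumption: processing time grows linearly with the number of cameras.
    At least one camera is always returned, even if that overruns a tiny budget.
    """
    seconds_per_camera = ref_seconds / ref_cameras
    # The small epsilon guards against floating-point error at exact boundaries.
    n = math.floor(budget_seconds / seconds_per_camera + 1e-9)
    return max(1, min(n, max_cameras))
```

With a 0.1 s budget the same estimate gives 20 cameras, which matches the example given for the second pass.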

If, after a virtual viewpoint image has been output by S4050 to S4070, it is determined in S4080 to continue generating images, the process returns to S4030 and the number of cameras is set again. This time the allowed processing time is lengthened and the number of cameras increased accordingly, so that a virtual viewpoint image of higher quality than the one previously output is generated. For example, the number of cameras whose captured images are used is set to 20 so that S4050 to S4070 complete within 0.1 seconds.

In S4040, the image generation unit 120 selects, from the camera group 2, the cameras whose captured images are used to generate the virtual viewpoint image, according to the number of cameras set in S4030. For example, when selecting three cameras out of 100, it selects the camera closest to the virtual viewpoint together with the 34th and 67th cameras counted from that camera.
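Assuming the cameras are indexed consecutively around the venue (a ring of `n_total` cameras) and that the index of the camera closest to the virtual viewpoint is already known, the even spacing in the example can be sketched as follows; the function name is hypothetical.

```python
from typing import List


def select_evenly(n_total: int, n_select: int, closest: int) -> List[int]:
    """Pick n_select cameras evenly spaced around a ring, starting from the closest one.

    Assumption: camera indices 0..n_total-1 run consecutively around the venue.
    """
    step = n_total // n_select
    return [(closest + i * step) % n_total for i in range(n_select)]
```

`select_evenly(100, 3, 0)` yields cameras 0, 33 and 66, i.e. the closest camera plus the 34th and 67th counted from it.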

When the virtual viewpoint image has been generated once and a second pass is performed with more captured images, the second pass further refines the shape model estimated in the first pass, so cameras other than those selected in the first pass are chosen. Specifically, when selecting 20 cameras out of 100, the camera closest to the virtual viewpoint among those not selected in the first pass is chosen first, and further cameras are then chosen at intervals of five. Any camera already selected in the first pass is skipped and the next camera is chosen instead. When generating the highest-quality virtual viewpoint image, for example as the non-live image, all cameras in the camera group 2 are selected and S4050 to S4070 are executed using the images captured by every camera.
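The second-pass selection can be sketched under the same ring-indexing assumption. Here `start` is assumed to be the unused camera closest to the virtual viewpoint, chosen by the caller, and the function name is hypothetical. When a target position was already used in an earlier pass, the next camera around the ring is taken instead.

```python
from typing import Iterable, List


def select_additional(n_total: int, n_select: int, start: int,
                      already_used: Iterable[int]) -> List[int]:
    """Pick n_select new cameras at a fixed interval, skipping cameras used in earlier passes.

    Assumption: cameras are indexed consecutively around a ring of n_total cameras.
    """
    taken = set(already_used)
    if n_select + len(taken) > n_total:
        raise ValueError("not enough unused cameras")
    step = n_total // n_select  # e.g. 100 // 20 = 5
    chosen: List[int] = []
    for i in range(n_select):
        idx = (start + i * step) % n_total
        # If this camera is already in use, move on to the next one around the ring.
        while idx in taken:
            idx = (idx + 1) % n_total
        chosen.append(idx)
        taken.add(idx)
    return chosen
```

For example, after a first pass that used cameras 0, 33 and 66, `select_additional(100, 20, 1, [0, 33, 66])` starts at 1, 6, 11, and takes 67 where 66 would have fallen.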

The method of selecting the cameras is not limited to this. For example, cameras close to the virtual viewpoint may be given priority. In that case, the shape of the back of the subject object, which is not visible from the virtual viewpoint, is estimated less accurately, but the shape of its front, which is visible from the virtual viewpoint, is estimated more accurately. In other words, image quality can be improved preferentially in the regions of the virtual viewpoint image that are most noticeable to viewers.

In S4050, the image generation unit 120 estimates the shape of the object using the images captured by the cameras selected in S4040. This processing is, for example, a combination of the processing of S2030 (Visual Hull) and that of S2070 (PhotoHull) in FIG. 3. Visual Hull processing includes computing the intersection (logical AND) of the visual volumes of the cameras corresponding to the captured images used. PhotoHull processing includes projecting each point of the shape model onto multiple captured images and computing the consistency of the pixel values. Consequently, the fewer cameras whose captured images are used, the lower the accuracy of shape estimation and the shorter the processing time.
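To illustrate why fewer cameras give a looser shape in less time, here is a minimal Visual Hull sketch that keeps the candidate points lying inside the silhouette of every camera. It assumes 3x4 projection matrices and boolean silhouette masks indexed `silhouette[row][col]`; the PhotoHull consistency check is not included, and the names are hypothetical. Each extra camera adds one more test per point (more time) and can only remove points (a tighter hull).

```python
from typing import List, Sequence, Tuple

# Assumption: 3x4 pinhole matrices and boolean masks stored as silhouette[row][col].
Matrix34 = Sequence[Sequence[float]]
Silhouette = Sequence[Sequence[bool]]
Point = Tuple[float, float, float]


def inside_silhouette(P: Matrix34, sil: Silhouette, X: Point) -> bool:
    """True if X projects in front of the camera and onto an object pixel of its mask."""
    x, y, z = X
    u, v, w = (P[i][0] * x + P[i][1] * y + P[i][2] * z + P[i][3] for i in range(3))
    if w <= 0:
        return False
    col, row = int(round(u / w)), int(round(v / w))
    return 0 <= row < len(sil) and 0 <= col < len(sil[0]) and sil[row][col]


def visual_hull(candidates: Sequence[Point],
                cameras: Sequence[Tuple[Matrix34, Silhouette]]) -> List[Point]:
    """Keep points inside every camera's silhouette, i.e. the AND of the visual volumes."""
    return [X for X in candidates if all(inside_silhouette(P, sil, X) for P, sil in cameras)]
```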

In S4060, the image generation unit 120 performs rendering. This processing is the same as that of S2080 in FIG. 3 and includes coloring the 3D point cloud and generating the background image. Both steps determine colors by computations on the pixel values at corresponding points in multiple captured images. Consequently, the fewer cameras whose captured images are used, the lower the rendering accuracy and the shorter the processing time.

In S4070, the output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 in S4060 to the display device 3 and/or the display device 4.

In S4080, the image generation unit 120 determines whether to generate a virtual viewpoint image of higher quality than the one generated in S4060. For example, if the virtual viewpoint image generated in S4060 is the image that lets the operator specify the virtual viewpoint and a live image is to be generated next, the process returns to S4030 and a virtual viewpoint image serving as the live image is generated using more cameras. If a non-live image is to be generated after the live image, the number of cameras is increased further to generate a virtual viewpoint image serving as the non-live image. Because the live image is generated from the captured images of more cameras than the designation image, it has higher image quality than the designation image. Likewise, because the non-live image is generated from the captured images of more cameras than the live image, it has higher image quality than the live image.

If it is determined in S4080 that there is no need to generate a virtual viewpoint image of higher quality than those already generated, or that no higher-quality virtual viewpoint image can be generated, the process ends.

Through the above processing, the image processing device 1 can generate and output, each at an appropriate timing, multiple virtual viewpoint images of progressively improving image quality. For example, limiting the cameras used to generate the virtual viewpoint image to a number that allows generation to complete within the set processing time yields a designation image with low latency. When generating the live image or the non-live image, increasing the number of cameras used yields higher-quality images.

Next, another mode of operation of the image processing device 1 will be described with reference to FIG. 6. In the mode described above with reference to FIG. 5, image quality is improved by increasing the number of cameras used to generate the virtual viewpoint image. In the mode described below with reference to FIG. 6, image quality is instead improved by raising the resolution of the virtual viewpoint image step by step. Parts identical to the processing of FIG. 3 and FIG. 5 are not described again. In this mode, the generated virtual viewpoint image always has 3840 × 2160 pixels (4K), and its resolution is controlled by whether pixel values are computed per large pixel block or per small pixel block. Alternatively, the resolution may be controlled by changing the number of pixels in the generated virtual viewpoint image.

The process shown in FIG. 6 starts when the viewpoint acquisition unit 110 receives an instruction to generate a virtual viewpoint image, although the start timing is not limited to this. In S2010 and S2020, the image processing device 1 acquires the images captured by the cameras of the camera group 2 and the virtual viewpoint information through the same processing as described with reference to FIG. 3.

In S5030, the image generation unit 120 sets the resolution of the virtual viewpoint image to be generated, choosing it so that the processing of S5050 and S4070 completes within a predetermined threshold. Suppose, for example, that S5050 and S4070 were run in advance to generate a 4K virtual viewpoint image and took 0.5 seconds. To complete S5050 and S4070 within 0.016 seconds, the duration of one frame of a virtual viewpoint image at 60 fps, the resolution must be no more than 0.016 / 0.5 = 1/31.25 of 4K. Setting the resolution to 1/8 of 4K both horizontally and vertically reduces the number of pixel blocks whose values must be computed to 1/64, so the processing completes in less than 0.016 seconds.
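The choice of 1/8 can be reproduced by assuming that cost scales with the number of computed pixel blocks and restricting the block size to powers of two. With blocks of `b` x `b` pixels the cost falls to 1/b^2 of the 4K cost, so the sketch picks the smallest `b` with 1/b^2 no larger than budget / full cost. The function name is hypothetical.

```python
def block_size_for_budget(full_res_seconds: float, budget_seconds: float,
                          max_block: int = 64) -> int:
    """Smallest power-of-two block size whose estimated cost fits the time budget.

    Assumption: cost is proportional to the number of computed blocks, i.e. 1/b^2
    of the full-resolution cost for b x b blocks.
    """
    ratio = budget_seconds / full_res_seconds
    b = 1
    while b < max_block and 1 / (b * b) > ratio:
        b *= 2
    return b
```

For a 0.016 s budget this returns 8 (1/64 is below 1/31.25); for 0.1 s it returns 4 (1/16 is below 0.2), which matches the 1/4 setting used for the second pass.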

If, after a virtual viewpoint image has been output by S5050 and S4070, it is determined in S4080 to continue generating images, the process returns to S5030 and the resolution is set again. This time the allowed processing time is lengthened and the resolution raised accordingly, so that a virtual viewpoint image of higher quality than the one previously output is generated. For example, setting the resolution to 1/4 of 4K both horizontally and vertically allows S5050 and S4070 to complete within 0.1 seconds.

In S5040, the image generation unit 120 determines, according to the resolution set in S5030, the positions of the pixels in the virtual viewpoint image whose values are to be computed. For example, when the resolution is set to 1/8 of 4K, a pixel value is computed every 8 pixels horizontally and vertically. Pixels lying between a computed pixel (x, y) and the pixel (x + 8, y + 8) are assigned the same value as pixel (x, y).
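The block-filling rule of S5040 can be sketched as follows. `compute(x, y)` is a hypothetical callback standing in for the expensive per-pixel rendering of S5050 (for example an Image-Based Visual Hull evaluation); only one call is made per block.

```python
from typing import Callable, List


def fill_from_samples(width: int, height: int, step: int,
                      compute: Callable[[int, int], object]) -> List[list]:
    """Compute one value per step x step block and copy it to every pixel of the block.

    Assumption: compute(x, y) is the (hypothetical) per-pixel renderer.
    """
    image: List[list] = [[None] * width for _ in range(height)]
    for y in range(0, height, step):
        for x in range(0, width, step):
            value = compute(x, y)  # the expensive call, once per block
            for yy in range(y, min(y + step, height)):
                for xx in range(x, min(x + step, width)):
                    image[yy][xx] = value
    return image
```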

When the virtual viewpoint image has been generated once and a second pass is performed at a higher resolution, pixels whose values were computed in the first pass are skipped. For example, when the resolution is set to 1/4 of 4K, the value of pixel (x + 4, y + 4) is computed, and pixels lying between pixel (x + 4, y + 4) and pixel (x + 8, y + 8) are assigned the same value as pixel (x + 4, y + 4). By increasing the number of pixels whose values are computed in this way, the resolution of the virtual viewpoint image can be raised up to full 4K.
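A sketch of the progressive refinement, assuming block sizes that halve on each pass (8, 4, 2, 1) so that the sampling grids nest. Samples already computed are kept in a cache and reused. On the finer grid this sketch also computes (x + 4, y) and (x, y + 4), not only (x + 4, y + 4); that is an assumption of the sketch about the general scheme, not something the text states. The names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def render_pass(image: List[list], step: int, compute: Callable[[int, int], object],
                cache: Dict[Tuple[int, int], object]) -> int:
    """Refine image to a sampling step of `step`, reusing samples from earlier passes.

    Assumption: steps are powers of two that halve each pass, so grids nest.
    Returns the number of new compute() calls made in this pass.
    """
    height, width = len(image), len(image[0])
    new_calls = 0
    for y in range(0, height, step):
        for x in range(0, width, step):
            if (x, y) not in cache:  # skip pixels already computed at a coarser step
                cache[(x, y)] = compute(x, y)
                new_calls += 1
            value = cache[(x, y)]
            for yy in range(y, min(y + step, height)):
                for xx in range(x, min(x + step, width)):
                    image[yy][xx] = value
    return new_calls
```

Going from step 8 to step 4 on a 16 x 16 image computes 12 new samples instead of 16, because the 4 coarse samples are reused.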

In S5050, the image generation unit 120 computes the values of the pixels at the positions determined in S5040, coloring the virtual viewpoint image. The pixel values can be computed with, for example, the Image-Based Visual Hull method. Because this method computes a value for each pixel, the fewer pixels whose values must be computed, that is, the lower the resolution of the virtual viewpoint image, the shorter the processing time.

In S4070, the output unit 130 outputs the virtual viewpoint image generated by the image generation unit 120 in S5050 to the display device 3 and/or the display device 4.

In S4080, the image generation unit 120 determines whether to generate a virtual viewpoint image of higher quality than the one generated in S5050. For example, if the virtual viewpoint image generated in S5050 is the image that lets the operator specify the virtual viewpoint and a live image is to be generated next, the process returns to S5030 and a virtual viewpoint image of higher resolution is generated. If a non-live image is to be generated after the live image, a virtual viewpoint image of still higher resolution is generated as the non-live image. Because the live image has a higher resolution than the designation image, it has higher image quality than the designation image. Likewise, because the non-live image has a higher resolution than the live image, it has higher image quality than the live image.

Through the above processing, the image processing device 1 can generate and output, each at an appropriate timing, multiple virtual viewpoint images of progressively increasing resolution. For example, setting the resolution of the virtual viewpoint image so that generation can complete within the set processing time yields a designation image with low latency. When generating the live image or the non-live image, setting a higher resolution yields higher-quality images.

As described above, the image processing device 1 generates a high-quality image (for example, the non-live image) by performing image processing that improves the image quality of the virtual viewpoint image. It also generates a low-quality image (for example, the live image) through processing that is part of that image processing and completes within a processing time at or below a predetermined threshold. This makes it possible to generate and display both a virtual viewpoint image shown with a delay no longer than a predetermined time and a high-quality virtual viewpoint image.

In the description of FIG. 6, a generation parameter (the resolution) that allows generation to complete within a processing time at or below a predetermined threshold is estimated, and the virtual viewpoint image is generated with that parameter. However, the image processing device 1 may instead improve the image quality of the virtual viewpoint image step by step and output whichever virtual viewpoint image has been completed when the processing time reaches the threshold. For example, if, when the threshold is reached, a virtual viewpoint image at 1/8 of 4K resolution has been completed but one at 1/4 of 4K has not, the 1/8-resolution image may be output. Alternatively, a virtual viewpoint image on which the refinement from 1/8 to 1/4 resolution has been partially performed may be output.
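The alternative of refining until the deadline and outputting the last completed image can be sketched as follows. Each stage is a hypothetical callable that renders one quality level (for example 1/8, then 1/4 of 4K). This simplified version runs stages one after another and does not preempt a stage in progress: a stage that finishes after the budget is discarded and the previous result is returned. A real system would more likely run refinement in the background, and outputting a partially refined image is not covered.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def best_within_deadline(stages: Sequence[Callable[[], T]], budget_s: float,
                         clock: Callable[[], float] = time.monotonic) -> Optional[T]:
    """Run progressively better stages; return the last one that finished within budget_s.

    Assumption: each stage is a hypothetical renderer for one quality level, ordered
    from lowest to highest quality. Stages are not interrupted once started.
    """
    start = clock()
    best: Optional[T] = None
    for stage in stages:
        if clock() - start >= budget_s:
            break  # no time left to start another refinement
        result = stage()
        if clock() - start > budget_s:
            break  # finished too late; keep the previous result
        best = result
    return best
```

Injecting `clock` keeps the sketch testable with a simulated timer instead of real rendering times.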

The present embodiment has mainly described the case where the image generation unit 120 of the image processing apparatus 1 controls generation of virtual viewpoint images based on the images acquired by the camera information acquisition unit 100 and the virtual viewpoint information acquired by the viewpoint acquisition unit 110, and generates a plurality of virtual viewpoint images of different image quality. However, the present invention is not limited to this; the function of controlling generation of virtual viewpoint images and the function of actually generating them may be provided in separate apparatuses.

For example, the image processing system 10 may include a generation device (not shown) that has the functions of the image generation unit 120 and generates virtual viewpoint images. The image processing apparatus 1 may then control generation of virtual viewpoint images by the generation device based on the images acquired by the camera information acquisition unit 100 and the information acquired by the viewpoint acquisition unit 110. Specifically, the image processing apparatus 1 transmits the captured images and the virtual viewpoint information to the generation device and instructs it on the generation of virtual viewpoint images. Based on the received captured images and virtual viewpoint information, the generation device then generates a first virtual viewpoint image and a second virtual viewpoint image that is to be displayed earlier than the first virtual viewpoint image and has lower image quality than it. Here, the first virtual viewpoint image is, for example, a non-live image, and the second virtual viewpoint image is, for example, a live image; however, their uses are not limited to these. The image processing apparatus 1 may also perform control so that the first and second virtual viewpoint images are generated by different generation devices. Further, the image processing apparatus 1 may perform output control, such as controlling the output destination and output timing of the virtual viewpoint images produced by the generation device.
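The division of roles described above, with the image processing apparatus 1 acting as a controller for one or more separate generation devices, might be pictured as below. Everything here is hypothetical (`GenerationRequest`, `GenerationDevice`, `GenerationController`, the string "images", and the "live"/"non-live" labels); real devices would exchange image data over a network and render in parallel. The controller forwards the captured images and viewpoint to whichever device is assigned to each quality level and performs output control by choosing each image's destination and emitting the live image first.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class GenerationRequest:
    captured_images: Tuple[str, ...]        # stand-ins for camera-group frames
    viewpoint: Tuple[float, float, float]   # stand-in for virtual viewpoint information
    quality: str                            # "live" (shown earlier, lower quality) or "non-live"


class GenerationDevice:
    """Hypothetical separate device that owns the image-generation function."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, req: GenerationRequest) -> str:
        # A real device would render an image; a descriptive label stands in here.
        return f"{self.name}:{req.quality}:{len(req.captured_images)} cams"


class GenerationController:
    """Plays the role of apparatus 1: forwards inputs, picks devices, routes outputs."""

    def __init__(self, devices: Dict[str, GenerationDevice],
                 sinks: Dict[str, List[str]]) -> None:
        self.devices = devices   # quality -> device (may be one shared device)
        self.sinks = sinks       # quality -> output destination

    def handle(self, images: Sequence[str],
               viewpoint: Tuple[float, float, float]) -> List[Tuple[str, str]]:
        emitted = []
        for quality in ("live", "non-live"):   # output timing: live image goes out first
            req = GenerationRequest(tuple(images), viewpoint, quality)
            image = self.devices[quality].generate(req)
            self.sinks[quality].append(image)  # output destination chosen per quality
            emitted.append((quality, image))
        return emitted


sinks: Dict[str, List[str]] = {"live": [], "non-live": []}
controller = GenerationController(
    {"live": GenerationDevice("gen-A"), "non-live": GenerationDevice("gen-B")}, sinks)
out = controller.handle(["cam0", "cam1", "cam2"], (0.0, 1.5, 10.0))
```

Mapping both quality levels to the same `GenerationDevice` instance covers the single-device case; mapping them to different instances covers the case where the two images come from different generation devices.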

Alternatively, the generation device may have the functions of the viewpoint acquisition unit 110 and the image generation unit 120, and the image processing apparatus 1 may control generation of virtual viewpoint images by the generation device based on the images acquired by the camera information acquisition unit 100. Here, the images acquired by the camera information acquisition unit 100 are images based on image capture, such as images captured by the camera group 2 or images generated from the differences between a plurality of captured images. Likewise, the generation device may have the functions of the camera information acquisition unit 100 and the image generation unit 120, and the image processing apparatus 1 may control generation of virtual viewpoint images by the generation device based on the information acquired by the viewpoint acquisition unit 110. Here, the information acquired by the viewpoint acquisition unit 110 is information corresponding to the designation of a virtual viewpoint, such as virtual viewpoint information or information indicating content determined by the virtual viewpoint, for example the shape and orientation of a subject in the virtual viewpoint image. In other words, the image processing apparatus 1 may acquire information related to generation of a virtual viewpoint image, including at least one of an image based on image capture and information corresponding to the designation of a virtual viewpoint, and control generation of the virtual viewpoint image based on the acquired information.

As another example, a generation device in the image processing system 10 may have the functions of the camera information acquisition unit 100, the viewpoint acquisition unit 110, and the image generation unit 120, and the image processing apparatus 1 may control generation of virtual viewpoint images by the generation device based on information related to that generation. In this case, the information related to generation includes, for example, at least one of a parameter relating to the image quality of the first virtual viewpoint image and a parameter relating to the image quality of the second virtual viewpoint image generated by the generation device. Examples of image-quality parameters include the number of cameras corresponding to the captured images used to generate the virtual viewpoint image, the resolution of the virtual viewpoint image, and the processing time allowed for generating the virtual viewpoint image. The image processing apparatus 1 acquires these image-quality parameters, for example from operator input, and controls the generation device based on them, for example by transmitting them to the generation device. This allows the operator to have a plurality of virtual viewpoint images generated, each with its own desired image quality.
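The image-quality parameters listed above could be carried in a small record and used, for example, to select which cameras feed each image. A sketch under stated assumptions: `QualityParams`, the preset values, and the bit-reversal ordering that spreads a small camera subset around a ring of cameras are hypothetical choices, not taken from the patent. Taking prefixes of one fixed priority order guarantees that the cameras used for the lower-quality image are a subset of those used for the higher-quality image, the relationship recited in claim 1.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class QualityParams:
    num_cameras: int        # cameras whose captured images are used
    width: int              # output resolution
    height: int
    max_time_ms: float      # processing time allowed for generation


def bit_reverse(i: int, bits: int) -> int:
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def ring_priority(n: int) -> List[int]:
    """Order cameras 0..n-1 so short prefixes are spread around the ring (heuristic)."""
    bits = max(1, (n - 1).bit_length())
    return sorted(range(n), key=lambda i: bit_reverse(i, bits))


def choose_cameras(priority: Sequence[int], params: QualityParams) -> List[int]:
    if not 0 < params.num_cameras <= len(priority):
        raise ValueError("num_cameras out of range")
    return list(priority[: params.num_cameras])   # prefixes nest by construction


# Hypothetical presets an operator might enter.
live = QualityParams(num_cameras=4, width=1920, height=1080, max_time_ms=30.0)
replay = QualityParams(num_cameras=8, width=3840, height=2160, max_time_ms=2000.0)
order = ring_priority(8)
live_cams, replay_cams = choose_cameras(order, live), choose_cameras(order, replay)
```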

As described above, the image processing apparatus 1 receives an instruction to generate virtual viewpoint images based on images obtained by a plurality of cameras capturing a subject from different directions and on information corresponding to the designation of a virtual viewpoint. In response to the generation instruction, the image processing apparatus 1 performs control so that a first virtual viewpoint image output to a first display device and a second virtual viewpoint image output to a second display device are both generated based on the images obtained by image capture and the information corresponding to the designation of the virtual viewpoint. Here, the second virtual viewpoint image has higher image quality than the first virtual viewpoint image. Thus, even when some users want to view virtual viewpoint images in real time while others prioritize high image quality over real-time performance, virtual viewpoint images suited to the timing at which each is to be displayed can be generated.

Although the present embodiment has described controlling color gradation, resolution, and the number of cameras corresponding to the captured images used to generate the virtual viewpoint image as aspects of its image quality, other image-quality parameters may be controlled instead. A plurality of image-quality parameters may also be controlled at the same time.
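Two of the image-quality controls mentioned above, color gradation and resolution, can be illustrated on a toy grayscale image. This is an assumption-laden sketch, not the patent's method: the helper names are hypothetical, gradation is reduced to `2**bits` levels by dropping low-order bits of 8-bit values, and resolution is reduced by naive decimation without low-pass filtering. A color image would be handled per channel, and a production pipeline would use a proper resampling filter.

```python
from typing import List

Image = List[List[int]]  # grayscale, 8-bit values 0..255, row-major


def reduce_gradation(img: Image, bits: int) -> Image:
    """Keep only the top `bits` bits of each 8-bit value: 2**bits gradations remain."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in 1..8")
    shift = 8 - bits
    return [[(v >> shift) << shift for v in row] for row in img]


def downsample(img: Image, factor: int) -> Image:
    """Naive decimation: keep every `factor`-th pixel in each direction."""
    return [row[::factor] for row in img[::factor]]


def count_levels(img: Image) -> int:
    """Number of distinct gray levels actually present."""
    return len({v for row in img for v in row})


img = [[0, 37, 128, 255], [10, 70, 140, 200], [20, 90, 160, 220], [30, 110, 180, 240]]
coarse = downsample(reduce_gradation(img, 2), 2)   # lower gradation and lower resolution
```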

The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiment to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions. The program may also be provided recorded on a computer-readable recording medium.

1 Image processing apparatus
2 Camera group
100 Camera information acquisition unit
110 Virtual viewpoint information acquisition unit
120 Image generation unit

Claims

An image processing system comprising:
image acquisition means for acquiring a plurality of images based on images captured from different positions by a plurality of imaging devices;
reception means for receiving an input corresponding to an operation for specifying a virtual viewpoint; and
generation means for generating a plurality of virtual viewpoint images based on the plurality of images acquired by the image acquisition means and the input received by the reception means, the generation means generating a first virtual viewpoint image used for the operation and a second virtual viewpoint image that corresponds to the virtual viewpoint specified based on the operation and has a larger image data size per frame of a moving image than the first virtual viewpoint image,
wherein the number of imaging devices corresponding to the plurality of images used by the generation means to generate the second virtual viewpoint image is larger than the number of imaging devices corresponding to the plurality of images used by the generation means to generate the first virtual viewpoint image, and
the plurality of imaging devices corresponding to the plurality of images used to generate the second virtual viewpoint image include the plurality of imaging devices corresponding to the plurality of images used to generate the first virtual viewpoint image.

An image processing system comprising:
image acquisition means for acquiring a plurality of images based on images captured from different positions by a plurality of imaging devices;
reception means for receiving an input corresponding to an operation for specifying a virtual viewpoint; and
generation means for generating a plurality of virtual viewpoint images based on the plurality of images acquired by the image acquisition means and the input received by the reception means, the generation means generating a first virtual viewpoint image used for the operation and a second virtual viewpoint image that corresponds to the virtual viewpoint specified based on the operation and has a larger image data size per frame of a moving image than the first virtual viewpoint image,
wherein the generation means generates the second virtual viewpoint image by performing, on a virtual viewpoint image generated based on the plurality of images and the input, image processing for increasing the image data size per frame of a moving image, and generates the first virtual viewpoint image by performing a partial process that is included in the processing for generating the second virtual viewpoint image from that virtual viewpoint image and is executed within a processing time equal to or less than a predetermined threshold.

The image processing system according to claim 2, wherein the generation means generates, in addition to the first virtual viewpoint image and the second virtual viewpoint image, a third virtual viewpoint image having a larger image data size per frame of a moving image than the second virtual viewpoint image, based on the plurality of images and the input.

An image processing system comprising:
image acquisition means for acquiring a plurality of images based on images captured from different positions by a plurality of imaging devices;
reception means for receiving an input corresponding to an operation for specifying a virtual viewpoint; and
generation means for generating a plurality of virtual viewpoint images based on the plurality of images acquired by the image acquisition means and the input received by the reception means, the generation means generating a first virtual viewpoint image used for the operation, a second virtual viewpoint image that corresponds to the virtual viewpoint specified based on the operation and has a larger image data size per frame of a moving image than the first virtual viewpoint image, and a third virtual viewpoint image having a larger image data size per frame of a moving image than the second virtual viewpoint image.

The image processing system according to claim 4, wherein
the second virtual viewpoint image is a virtual viewpoint image that is broadcast live, and
the third virtual viewpoint image is a virtual viewpoint image that is broadcast after being recorded.

The image processing system according to claim 4 or 5, wherein the generation means generates the second virtual viewpoint image by performing, on a virtual viewpoint image generated based on the plurality of images and the input, image processing for increasing the image data size per frame of a moving image, and generates the first virtual viewpoint image by performing a partial process that is included in the processing for generating the second virtual viewpoint image from that virtual viewpoint image and is executed within a processing time equal to or less than a predetermined threshold.

The image processing system according to any one of claims 2 to 6, wherein the number of color gradations in the second virtual viewpoint image generated by the generation means is larger than the number of color gradations in the first virtual viewpoint image.

The image processing system according to any one of claims 2 to 7, wherein the number of imaging devices corresponding to the images used by the generation means to generate the second virtual viewpoint image is larger than the number of imaging devices corresponding to the images used by the generation means to generate the first virtual viewpoint image.

The image processing system according to any one of claims 1 to 8, wherein the second virtual viewpoint image is an image to be displayed to a viewer who views images generated in response to the operation.

The image processing system according to any one of claims 1 to 9, further comprising output means for outputting the first virtual viewpoint image and the second virtual viewpoint image generated by the generation means,
wherein the timing at which the output means outputs the first virtual viewpoint image is earlier than the timing at which the output means outputs the second virtual viewpoint image.

The image processing system according to any one of claims 1 to 10, further comprising output control means for controlling output of the first virtual viewpoint image and the second virtual viewpoint image generated by the generation means so that the first virtual viewpoint image is displayed in a first display area and the second virtual viewpoint image is displayed in a second display area different from the first display area.

The image processing system according to any one of claims 1 to 11, wherein the generation means generates the second virtual viewpoint image by further processing at least one of the first virtual viewpoint image and image data generated in the course of generating the first virtual viewpoint image from the images acquired by the image acquisition means.

The image processing system according to any one of claims 1 to 12, wherein
the first virtual viewpoint image is an image representing the shape of an object captured by at least one of the plurality of imaging devices, and
the second virtual viewpoint image is an image representing, in addition to the shape of the object, a color of the object that is not represented in the first virtual viewpoint image.

The image processing system according to any one of claims 1 to 13, wherein the resolution of the second virtual viewpoint image is higher than the resolution of the first virtual viewpoint image.

An image processing method comprising:
an image acquisition step of acquiring a plurality of images based on images captured from different positions by a plurality of imaging devices;
a reception step of receiving an input corresponding to an operation for specifying a virtual viewpoint; and
a generation step of generating a plurality of virtual viewpoint images based on the plurality of images acquired in the image acquisition step and the input received in the reception step, the generation step generating a first virtual viewpoint image used for the operation and a second virtual viewpoint image that corresponds to the virtual viewpoint specified based on the operation and has a larger image data size per frame of a moving image than the first virtual viewpoint image,
wherein, in the generation step, the number of imaging devices corresponding to the plurality of images used to generate the second virtual viewpoint image is larger than the number of imaging devices corresponding to the plurality of images used to generate the first virtual viewpoint image, and
the plurality of imaging devices corresponding to the plurality of images used to generate the second virtual viewpoint image include the plurality of imaging devices corresponding to the plurality of images used to generate the first virtual viewpoint image.

An image processing method comprising:
an image acquisition step of acquiring a plurality of images based on images captured from different positions by a plurality of imaging devices;
a reception step of receiving an input corresponding to an operation for specifying a virtual viewpoint; and
a generation step of generating a plurality of virtual viewpoint images based on the plurality of images acquired in the image acquisition step and the input received in the reception step, the generation step generating a first virtual viewpoint image used for the operation and a second virtual viewpoint image that corresponds to the virtual viewpoint specified based on the operation and has a larger image data size per frame of a moving image than the first virtual viewpoint image,
wherein, in the generation step, the second virtual viewpoint image is generated by performing, on a virtual viewpoint image generated based on the plurality of images and the input, image processing for increasing the image data size per frame of a moving image, and the first virtual viewpoint image is generated by performing a partial process that is included in the processing for generating the second virtual viewpoint image from that virtual viewpoint image and is executed within a processing time equal to or less than a predetermined threshold.

An image processing method comprising:
an image acquisition step of acquiring a plurality of images based on images captured from different positions by a plurality of imaging devices;
a reception step of receiving an input corresponding to an operation for specifying a virtual viewpoint; and
a generation step of generating a plurality of virtual viewpoint images based on the plurality of images acquired in the image acquisition step and the input received in the reception step, the generation step generating a first virtual viewpoint image used for the operation, a second virtual viewpoint image that corresponds to the virtual viewpoint specified based on the operation and has a larger image data size per frame of a moving image than the first virtual viewpoint image, and a third virtual viewpoint image having a larger image data size per frame of a moving image than the second virtual viewpoint image.

A program for making a computer function as each means of the image processing system according to any one of claims 1 to 14.