JP2019003325A

JP2019003325A - Image processing system, image processing method and program

Info

Publication number: JP2019003325A
Application number: JP2017115986A
Authority: JP
Inventors: 京 ▲高▼橋; Kyo Takahashi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-06-13
Filing date: 2017-06-13
Publication date: 2019-01-10

Abstract

To reduce unnaturalness caused by a difference in exposure at the time of shooting between a foreground image and a background image when generating a virtual viewpoint image. SOLUTION: A back-end server (270) generates a virtual viewpoint image using a plurality of images captured by a plurality of cameras (112). The back-end server (270) acquires a foreground image, which is based on any one of the plurality of images and corresponds to a predetermined object, and acquires exposure information indicating the exposure at which the foreground image was captured. Based on that exposure information, it acquires a background image, which is based on any one of the plurality of images, was captured at an exposure corresponding to the exposure information, and does not include the predetermined object. The back-end server (270) then generates a virtual viewpoint image using the foreground image and the background image. SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus, an image processing method, a program, and an image processing system that process images obtained by photographing a subject.

In recent years, attention has been drawn to a technique in which a plurality of cameras are installed at different positions to perform synchronized shooting from multiple viewpoints, and the multi-viewpoint images obtained by that shooting are used to generate virtual viewpoint content as seen from a virtual viewpoint or virtual camera different from those cameras. With a technique that generates virtual viewpoint content whose viewpoint can be changed arbitrarily from multi-viewpoint images, highlight scenes of soccer or basketball, for example, can be viewed from various angles. Such a technique can therefore give the user a greater sense of presence than ordinary images, whose viewpoint cannot be changed arbitrarily. An image generated in this way is called a virtual viewpoint image.

Patent Document 1 discloses a technique in which a plurality of cameras are arranged so as to surround the same range, and images of that range are used to generate and display a virtual viewpoint image corresponding to an arbitrary designation.

To generate a virtual viewpoint image, it is generally necessary to identify the portion of the image in which the object to be modeled exists. The portion where the object exists is called the foreground, and the rest is called the background.

JP 2014-215828 A

When a plurality of captured images are combined to generate a single image such as a virtual viewpoint image, it is desirable that no differences in luminance or color tone appear among the captured images. However, if the foreground image and the background image used for compositing the virtual viewpoint image differ in, for example, the timing at which they were captured or the imaging device that captured them, their exposures at the time of shooting may differ. Using a foreground image and a background image with different exposures to generate a virtual viewpoint image can result in an unnatural composite image.

Accordingly, an object of the present invention is to reduce unnaturalness caused by a difference in exposure at the time of shooting between the foreground image and the background image when generating a virtual viewpoint image.

The present invention is an image processing apparatus that generates a virtual viewpoint image using a plurality of images captured by a plurality of imaging devices, the apparatus comprising: first acquisition means for acquiring a foreground image that is based on any of the plurality of images and corresponds to a predetermined object; second acquisition means for acquiring exposure information indicating the exposure at which the foreground image acquired by the first acquisition means was captured; third acquisition means for acquiring, based on the exposure information acquired by the second acquisition means, a background image that is based on any of the plurality of images, was captured at an exposure corresponding to that exposure information, and does not include the predetermined object; and generation means for generating a virtual viewpoint image using the foreground image acquired by the first acquisition means and the background image acquired by the third acquisition means.

Another aspect of the present invention comprises: determination means for determining, from among a plurality of foreground images and a plurality of background images obtained by separating each of a plurality of images captured by a plurality of imaging devices, a foreground image and a background image to be used for generating a virtual viewpoint image; acquisition means for acquiring exposure information at the time of shooting of the determined foreground image and exposure information at the time of shooting of the determined background image; and processing means for generating the virtual viewpoint image based on the foreground image and the background image, wherein, when the exposure information of the determined foreground image differs from that of the background image, the processing means performs processing to match the exposure of the background image to the exposure of the determined foreground image, and generates the virtual viewpoint image using the background image after that processing and the determined foreground image.
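The exposure-matching step in this aspect can be pictured with a minimal sketch. It assumes, beyond what the text states, that the exposure information is an EV value, that pixel values are linear (not gamma-encoded) floats in the range 0 to 1, and that matching is done with a single gain factor. The function name `match_background_exposure` is hypothetical.

```python
def match_background_exposure(background, ev_background, ev_foreground):
    """Scale linear background pixels so they look as if shot at the foreground EV.

    Assumption: one EV step doubles or halves the captured light, so the
    gain is 2 ** (ev_background - ev_foreground). Results are clipped to [0, 1].
    """
    if ev_background == ev_foreground:
        # Exposure information already matches; return an unchanged copy.
        return [row[:] for row in background]
    gain = 2.0 ** (ev_background - ev_foreground)
    return [[min(1.0, max(0.0, p * gain)) for p in row] for row in background]


# Background shot at EV 12, one stop darker than the foreground at EV 11,
# so every background pixel is doubled (and clipped at 1.0).
adjusted = match_background_exposure([[0.1, 0.2], [0.4, 0.6]], 12.0, 11.0)
```

A real pipeline would also have to account for ISO sensitivity and the camera's tone curve, which this sketch ignores.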

According to the present invention, unnaturalness caused by a difference in exposure at the time of shooting between the foreground image and the background image can be reduced when generating a virtual viewpoint image.

A diagram showing a configuration example of the image processing system.
A diagram showing a functional configuration example of the camera adapter.
A diagram showing a functional configuration example of the front-end server of the first embodiment.
A diagram showing a functional configuration example of the back-end server of the first embodiment.
A diagram showing the virtual viewpoint image generation processing flow of the first embodiment.
A diagram showing a functional configuration example of the back-end server of the second embodiment.
A diagram showing a functional configuration example of the back-end server of the third embodiment.
An explanatory diagram of the exposure relationship between the foreground image and the background image in the third embodiment.
A diagram showing a functional configuration example of the front-end server of the fourth embodiment.
A diagram showing a functional configuration example of the back-end server of the fourth embodiment.

Hereinafter, an example of an embodiment of the present invention will be described in detail with reference to the drawings.
The present embodiment is described taking as an example an image processing system in which a plurality of cameras are installed in a stadium or the like to perform synchronized shooting, and a virtual viewpoint image is generated using shooting data whose shooting parameters were changed partway through shooting.
<First Embodiment>
The first embodiment describes an example in which, when the exposure information of the images used to generate a virtual viewpoint image differs because the shooting parameters were changed during shooting, images with matching exposure information are re-acquired from the database to generate the virtual viewpoint image.
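As a rough illustration of re-acquiring an image with matching exposure information, the sketch below searches stored records for a background image whose EV equals that of the foreground. The record layout, the `tolerance` parameter, and the policy of preferring the most recent matching frame are assumptions made for illustration, not details taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredImage:
    camera_id: str
    frame: int
    kind: str    # "foreground" or "background"
    ev: float    # exposure information recorded for this frame


def reacquire_matching_background(records, camera_id, frame, target_ev, tolerance=0.0):
    """Return the latest background at or before `frame` whose EV matches target_ev.

    Returns None when no stored background has matching exposure information.
    """
    candidates = [
        r for r in records
        if r.kind == "background"
        and r.camera_id == camera_id
        and r.frame <= frame
        and abs(r.ev - target_ev) <= tolerance
    ]
    return max(candidates, key=lambda r: r.frame, default=None)


records = [
    StoredImage("112a", 0, "background", 11.0),
    StoredImage("112a", 60, "background", 12.0),   # shot after a parameter change
    StoredImage("112a", 45, "foreground", 11.0),
]
# The foreground at frame 45 was shot at EV 11, so the frame-0 background is chosen.
match = reacquire_matching_background(records, "112a", 45, 11.0)
```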

FIG. 1 is a block diagram showing a schematic configuration of an image processing system 100 according to the present embodiment.
The image processing system 100 includes sensor systems 110a to 110z, an image computing server 200, a controller 300, a switching hub 180, and an end user terminal 190.
The controller 300 includes a control station 310 and a virtual camera operation UI 330. The control station 310 manages the operating state of, and controls parameter settings for, each block constituting the image processing system 100 through the networks 310a to 310c, 180a, and 180b and the daisy chains 170a to 170y. Each of these networks may be, for example, GbE (Gigabit Ethernet) or 10GbE conforming to the IEEE Ethernet standard (Ethernet is a registered trademark). Each network may also be configured by combining interconnects such as InfiniBand, industrial Ethernet, and the like. The networks are not limited to these and may be of other types.

First, the operation of transmitting the 26 sets of images and audio of the sensor systems 110a to 110z from the sensor system 110z to the image computing server 200 will be described. In the image processing system 100 of the present embodiment, the sensor systems 110a to 110z are connected by the daisy chains 170a to 170y.
In the present embodiment, unless otherwise specified, the 26 sets of systems from the sensor system 110a to the sensor system 110z are referred to as the sensor system 110 without distinction. Likewise, the devices within each sensor system 110 are referred to as the microphone 111, the camera 112, the camera platform 113, the external sensor 114, and the camera adapter 120 without distinction unless otherwise specified. Although 26 sets are described as the number of sensor systems, this is merely an example, and the number is not limited to it. Unless otherwise noted, the term "image" in the present embodiment covers both moving images and still images; that is, the image processing system 100 can process both. The virtual viewpoint content provided by the image processing system 100 in the present embodiment includes a virtual viewpoint image and virtual viewpoint audio, but is not limited to this. For example, the virtual viewpoint content need not include audio, or the audio it includes may be audio collected by the microphone closest to the virtual viewpoint. Although the description of audio is partly omitted in the present embodiment for simplicity, images and audio are basically processed together. Furthermore, unless otherwise noted, image data and audio data transmitted over the network are referred to simply as images and audio.

Each of the sensor systems 110a to 110z has one camera, 112a to 112z respectively. That is, the image processing system 100 has a plurality of cameras for photographing a subject from a plurality of directions. The sensor systems 110 are connected to one another by the daisy chain 170. This connection form makes it possible to reduce the number of connection cables and save wiring labor as image data volumes grow with higher resolutions, such as so-called 4K and 8K, and higher frame rates. The connection form is not limited to the daisy chain 170; a star network configuration may be used in which each of the sensor systems 110a to 110z is connected to the switching hub 180 and data is exchanged between the sensor systems 110 via the switching hub 180.

Although FIG. 1 shows a configuration in which all of the sensor systems 110a to 110z are cascade-connected to form the daisy chain 170, the connection is not limited to this example. For example, the sensor systems 110 may be divided into several groups, with the sensor systems 110 daisy-chained within each group. The camera adapter 120 at the end of each group may then be connected to the switching hub 180 to input image data to the image computing server 200. Such a configuration is particularly effective in a stadium. For example, a stadium may have multiple floors, with sensor systems 110 deployed on each floor. In that case, input to the image computing server 200 can be made per floor or per half circumference of the stadium, which simplifies installation and makes the system more flexible even in places where it is difficult to wire all the sensor systems 110 into a single daisy chain.

In the present embodiment, the sensor system 110a includes a microphone 111a, a camera 112a, a camera platform 113a, an external sensor 114a, and a camera adapter 120a. At least part of the functions of the camera adapter 120 may be provided by the front-end server 230. In the present embodiment, the sensor systems 110b to 110z have the same configuration as the sensor system 110a, so their description is omitted. The sensor systems 110b to 110z are not, however, limited to the same configuration as the sensor system 110a, and each sensor system 110 may have a different configuration.

The image captured by the camera 112a and the audio collected by the microphone 111a undergo various image processing and audio processing in the camera adapter 120a and are then transmitted through the daisy chain 170a to the camera adapter 120b of the sensor system 110b. The camera adapter 120 separates the captured image into a foreground image, which contains the image of the object, and a background image, which is the rest. Details of the image processing performed by the camera adapter 120a are described later. The camera 112a also has a function of outputting camera exposure information such as the aperture, shutter speed, and ISO sensitivity at the time of shooting. The camera adapter 120a sends this exposure information to the camera adapter 120b together with the data of the image captured by the camera 112a. In the present embodiment, an EV value is used as the exposure information. Where the shutter speed is T and the aperture is F, the EV value can be obtained by the following formula (1).
EV = log₂(F²) − log₂(T)    Formula (1)
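Formula (1) can be checked numerically with a short sketch. The function name `exposure_value` is hypothetical, and the sketch assumes that T is the shutter time in seconds and F is the f-number, which is the usual reading of the formula but is not spelled out in the text.

```python
import math


def exposure_value(f_number, shutter_seconds):
    """Compute EV = log2(F^2) - log2(T), as in formula (1)."""
    if f_number <= 0 or shutter_seconds <= 0:
        raise ValueError("f_number and shutter_seconds must be positive")
    return math.log2(f_number ** 2) - math.log2(shutter_seconds)


# F = 4 and T = 1/64 s: log2(16) - log2(1/64) = 4 - (-6) = 10
ev = exposure_value(4.0, 1 / 64)
```

Halving the shutter time at a fixed aperture raises the EV by exactly one, which is why a change of shooting parameters during a match shows up as a step in the stored EV values.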

Similarly, the sensor system 110b transmits the image captured by the camera 112b and the audio collected by the microphone 111b to the sensor system 110c, together with the image and audio acquired from the sensor system 110a. As each sensor system 110 continues this operation, the images and audio acquired by the sensor systems 110a to 110z are transmitted from the final-stage sensor system 110z to the switching hub 180 over the network 180b. The image and audio data are then transmitted from the switching hub 180 to the image computing server 200.
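The relay behavior described above can be sketched as a simple accumulation along the chain. This is only an illustration: the payload dictionaries and the function name `relay_through_daisy_chain` are hypothetical, and real camera adapters would stream packets rather than pass whole lists.

```python
def relay_through_daisy_chain(captures):
    """Simulate forwarding along the chain.

    `captures` lists each sensor system's own payload in chain order.
    Each stage forwards what it received from upstream plus its own payload;
    the return value is what the final stage sends to the switching hub.
    """
    received = []  # the first sensor system has no upstream data
    for own_payload in captures:
        received = received + [own_payload]
    return received


captures = [
    {"sensor": "110a", "image": "img_a", "audio": "aud_a"},
    {"sensor": "110b", "image": "img_b", "audio": "aud_b"},
    {"sensor": "110c", "image": "img_c", "audio": "aud_c"},
]
to_hub = relay_through_daisy_chain(captures)
```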

Next, the configuration and operation of the image computing server 200 will be described.
The image computing server 200 of the present embodiment processes the data acquired from the sensor system 110z via the switching hub 180. The image computing server 200 includes a front-end server 230, a database 250 (also referred to as DB where appropriate), a back-end server 270, and a time server 290.

The time server 290 has a function of distributing the time and a synchronization signal, and distributes them to the sensor systems 110a to 110z via the switching hub 180. On receiving the time and synchronization signal, the camera adapters 120a to 120z genlock the cameras 112a to 112z based on them to synchronize image frames. That is, the time server 290 synchronizes the shooting timing of the plurality of cameras 112. This allows the image processing system 100 to generate a virtual viewpoint image based on a plurality of images captured at the same timing, and suppresses degradation of virtual viewpoint image quality caused by shooting timing deviations.

The front-end server 230 reconstructs the segmented transmission packets of the images and audio acquired from the sensor system 110z via the switching hub 180 and converts the data format. It then writes the converted image data to the database 250 according to the camera identifier, data type, and frame number. At this time, the camera exposure information transmitted from each camera adapter 120 together with the foreground and background images is written to the database 250 frame by frame, in association with the converted image data.
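One way to picture this keyed, per-frame storage is the in-memory sketch below. The class `FrameDatabase`, its method names, and the use of a Python dictionary are assumptions for illustration; the text does not describe the actual schema of the database 250.

```python
class FrameDatabase:
    """Toy stand-in for the database 250, keyed by (camera id, data type, frame number)."""

    def __init__(self):
        self._rows = {}

    def write(self, camera_id, data_type, frame_number, image, exposure):
        # Exposure information is stored per frame, alongside the image it describes.
        self._rows[(camera_id, data_type, frame_number)] = {
            "image": image,
            "exposure": exposure,
        }

    def read(self, camera_id, data_type, frame_number):
        return self._rows[(camera_id, data_type, frame_number)]


db = FrameDatabase()
db.write("112a", "foreground", 120, image=b"fg-bytes", exposure=11.0)
db.write("112a", "background", 120, image=b"bg-bytes", exposure=12.0)
row = db.read("112a", "background", 120)
```

Keeping the exposure value in the same row as the image lets a later stage compare the exposure of a foreground and a background frame without touching the pixel data.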

Next, the back-end server 270 accepts a viewpoint designation from the virtual camera operation UI 330 and, based on the accepted viewpoint, determines which of the images and audio stored in the database 250 are needed to generate the virtual viewpoint content. The back-end server 270 then reads the determined image and audio data from the database 250 and generates the virtual viewpoint content by performing rendering processing and the like on the data read.

The configuration of the image computing server 200 is not limited to the example in FIG. 1. For example, at least two of the front-end server 230, the database 250, and the back-end server 270 may be integrated. The image computing server 200 may also include more than one of at least any of the front-end server 230, the database 250, and the back-end server 270. Devices other than these may be included at any position within the image computing server 200. Furthermore, at least part of the functions of the image computing server 200 may be provided by the end user terminal 190 or the virtual camera operation UI 330.

The back-end server 270 generates, by rendering processing, virtual viewpoint content based on the captured images (multi-viewpoint images) captured by the plurality of cameras 112 and on viewpoint information. More specifically, the back-end server 270 generates the virtual viewpoint content based on the image data of predetermined regions extracted by the corresponding camera adapters 120 from the images captured by the plurality of cameras 112, and on the viewpoint designated by a user operation. The back-end server 270 then transmits the image and audio of the generated virtual viewpoint content to the end user terminal 190. The end user terminal 190 has a display and a speaker; it displays the image of the virtual viewpoint content transmitted from the back-end server 270 on the display and outputs the audio from the speaker.

The virtual viewpoint content in the present embodiment is content that includes a virtual viewpoint image, that is, an image that would be obtained if the subject were photographed from a virtual viewpoint. In other words, a virtual viewpoint image is an image representing the view from a designated viewpoint. The virtual viewpoint may be designated arbitrarily by the user or designated automatically based on, for example, the result of image analysis. That is, virtual viewpoint images include arbitrary viewpoint images (free viewpoint images) corresponding to a viewpoint arbitrarily designated by the user. Images corresponding to a viewpoint that the user designates from among a plurality of viewpoint candidates, and images corresponding to a viewpoint designated automatically by the apparatus, are also virtual viewpoint images. Although the present embodiment is described using an example in which the virtual viewpoint content includes audio (audio data), audio need not be included. The back-end server 270 may also compress and encode the virtual viewpoint image using a standard technique typified by H.264 or HEVC and then transmit it to the end user terminal 190 using the MPEG-DASH protocol. Therefore, according to the present embodiment, the user operating the end user terminal 190 can view, for example, a virtual viewpoint image and audio corresponding to an arbitrary viewpoint that the user has designated.

As described above, the image processing system 100 of the present embodiment has three functional domains: a video collection domain, a data storage domain, and a video generation domain. In the present embodiment, the video collection domain includes the sensor systems 110a to 110z, and the data storage domain includes the database 250, the front-end server 230, and the back-end server 270. The video generation domain includes the virtual camera operation UI 330 and the end user terminal 190. The configuration is not limited to this; for example, the virtual camera operation UI 330 could acquire images directly from the sensor systems 110a to 110z.

However, the image processing system 100 of the present embodiment does not have the virtual camera operation UI 330 acquire images directly from the sensor systems 110a to 110z; instead, as shown in FIG. 1, it places the data storage domain in between. Specifically, the front-end server 230 in the data storage domain converts the images and audio acquired by the sensor systems 110a to 110z, and the meta information of that data, into the common schema and data types of the database 250. As a result, even if a camera 112 of the sensor systems 110a to 110z is replaced with a camera of another model, the front-end server 230 can absorb the differences caused by the model change before the data is registered in the database 250. This prevents a situation in which the virtual camera operation UI 330 stops operating properly when a camera 112 is replaced with a camera of another model.

In the image processing system 100 of the present embodiment, the virtual camera operation UI 330 accesses the database 250 through the back-end server 270 rather than directly. That is, the back-end server 270 performs the common processing related to image generation, while the virtual camera operation UI 330 handles the application-specific parts related to the operation UI. This allows development of the virtual camera operation UI 330 to concentrate on, for example, UI operation devices and the functional requirements of the UI for manipulating the virtual viewpoint image to be generated. The back-end server 270 can also add or remove common processing related to image generation in response to requests from the virtual camera operation UI 330, which allows it to respond flexibly to those requests. In this way, in the image processing system 100 of the present embodiment, the back-end server 270 generates a virtual viewpoint image based on image data obtained by shooting with the plurality of cameras 112, which photograph the subject from a plurality of directions. The image processing system 100 of the present embodiment is not limited to the physical configuration described above and may be configured logically.

Next, the functions of the camera adapter 120, the front-end server 230, and the back-end server 270 in the image processing system 100 of the present embodiment will be described in detail with reference to FIG. 2 and subsequent figures.
FIG. 2 is a diagram showing the functional blocks of the camera adapter 120 of the present embodiment.
As shown in FIG. 2, the camera adapter 120 includes a network adapter 6110, a transmission unit 6120, an image processing unit 6130, and an external device control unit 6140.

ネットワークアダプタ６１１０は、データ送受信部６１１１と時刻制御部６１１２を有して構成されている。
データ送受信部６１１１は、デイジーチェーン１７０、スイッチングハブ１８０、ネットワーク２９１，３１０ａ等を介して、他のカメラアダプタ１２０、フロントエンドサーバ２３０、タイムサーバ２９０、制御ステーション３１０等とデータ通信を行う。データ送受信部６１１１は、後述する前景背景分離部６１３１がカメラ１１２の撮影画像から分離した前景画像及び背景画像を、別のカメラアダプタ１２０に対して出力する。
また、データ送受信部６１１１は、前景画像と背景画像をそれぞれ異なるフレームレートで出力する機能を備えているものとする。本実施形態の場合、前景画像は高フレームレートで出力され、撮影対象（所定領域）を含まない背景画像は低フレームレートで出力されるものとする。出力先のカメラアダプタ１２０は、画像処理システム１００内の各カメラアダプタ１２０のうち、後述するデータルーティング処理部６１２２の処理に応じて予め定められた順序における次のカメラアダプタ１２０である。各カメラアダプタ１２０が前景画像と背景画像とを出力することで、バックエンドサーバ２７０では複数の視点から撮影された前景画像と背景画像に基づく仮想視点画像の生成が可能となる。 The network adapter 6110 includes a data transmission / reception unit 6111 and a time control unit 6112.
The data transmission / reception unit 6111 performs data communication with other camera adapters 120, the front-end server 230, the time server 290, the control station 310, and the like via the daisy chain 170, the switching hub 180, the networks 291 and 310a, and the like. The data transmission / reception unit 6111 outputs the foreground image and the background image separated from the captured image of the camera 112 by the foreground / background separation unit 6131 described later to another camera adapter 120.
Further, the data transmission / reception unit 6111 has a function of outputting the foreground image and the background image at different frame rates. In this embodiment, the foreground image is output at a high frame rate, and the background image, which does not include the shooting target (predetermined region), is output at a low frame rate. The output destination is the next camera adapter 120 in an order determined in advance, among the camera adapters 120 in the image processing system 100, according to the processing of the data routing processing unit 6122 described later. Because each camera adapter 120 outputs a foreground image and a background image, the back-end server 270 can generate a virtual viewpoint image based on foreground images and background images captured from a plurality of viewpoints.
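
The split between a high foreground rate and a low background rate can be sketched as follows. This is only an illustration: the function name and the default rates (60 and 1 frames per second) are assumptions, since the description states only that the foreground rate is high and the background rate is low.

```python
def frames_to_output(frame_index, capture_fps=60, background_fps=1):
    """Return the image types to output for one captured frame.

    capture_fps and background_fps are hypothetical values chosen for
    illustration; they do not come from the description.
    """
    if background_fps <= 0 or capture_fps % background_fps != 0:
        raise ValueError("background_fps must evenly divide capture_fps")
    interval = capture_fps // background_fps  # frames between background outputs
    outputs = ["foreground"]                  # the foreground goes out on every frame
    if frame_index % interval == 0:
        outputs.append("background")          # the background goes out once per interval
    return outputs
```

With these assumed rates, 120 captured frames yield 120 foreground outputs and 2 background outputs.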

時刻制御部６１１２は、例えばＩＥＥＥ１５８８規格のＯｒｄｉｎａｙＣｌｏｃｋに準拠し、タイムサーバ２９０との間で送受信したデータのタイムスタンプを保存する機能を有し、タイムサーバ２９０と時刻同期を行う。なお、時刻同期は、ＩＥＥＥ１５８８に限定されず、他のＥｔｈｅｒＡＶＢ規格や、独自プロトコルによってタイムサーバとの時刻同期が実現されてもよい。 The time control unit 6112 is based on, for example, the IEEE 1588 standard Ordinary Clock, and has a function of storing time stamps of data transmitted to and received from the time server 290, and performs time synchronization with the time server 290. Time synchronization is not limited to IEEE 1588, and time synchronization with a time server may be realized by other EtherAVB standards or a unique protocol.

伝送部６１２０は、ネットワークアダプタ６１１０を介してスイッチングハブ１８０等に対するデータの伝送を制御する機能を有する。伝送部６１２０は、データ圧縮・伸張部６１２１、データルーティング処理部６１２２、時刻同期制御部６１２３、画像・音声伝送処理部６１２４、データルーティング情報保持部６１２５の各機能部を有して構成されている。 The transmission unit 6120 has a function of controlling data transmission to the switching hub 180 and the like via the network adapter 6110. The transmission unit 6120 includes the following functional units: a data compression / decompression unit 6121, a data routing processing unit 6122, a time synchronization control unit 6123, an image / audio transmission processing unit 6124, and a data routing information holding unit 6125.

データルーティング情報保持部６１２５は、データ送受信部６１１１で送受信されるデータの送信先を決定するためのアドレス情報等を保持する機能を有する。
データ圧縮・伸張部６１２１は、データ送受信部６１１１を介して送受信されるデータに対して所定の圧縮方式、圧縮率、及びフレームレートを適用した圧縮を行う機能と、圧縮されたデータを伸張する機能とを有している。 The data routing information holding unit 6125 has a function of holding address information and the like for determining a transmission destination of data transmitted / received by the data transmitting / receiving unit 6111.
The data compression / decompression unit 6121 has a function of compressing data transmitted and received via the data transmission / reception unit 6111 by applying a predetermined compression method, compression rate, and frame rate, and a function of decompressing compressed data.

データルーティング処理部６１２２は、データルーティング情報保持部６１２５が保持しているアドレス情報を基に、データ送受信部６１１１が受信したデータ及び画像処理部６１３０で処理されたデータのルーティング先を決定する。さらに、データルーティング処理部６１２２は、決定したルーティング先へデータを送信する機能をも有している。ルーティング先としては、同一の注視点にフォーカスされたカメラ１１２に対応するカメラアダプタ１２０とするのが、それぞれのカメラ１１２同士の画像フレーム相関が高いため画像処理を行う上で好適である。複数のカメラアダプタ１２０それぞれに対するデータルーティング処理部６１２２による決定に応じて、画像処理システム１００内において前景画像や背景画像をリレー形式で出力するカメラアダプタ１２０の順序が決められる。 The data routing processing unit 6122 determines the routing destination of the data received by the data transmission / reception unit 6111 and the data processed by the image processing unit 6130 based on the address information held by the data routing information holding unit 6125. Further, the data routing processing unit 6122 also has a function of transmitting data to the determined routing destination. As a routing destination, the camera adapter 120 corresponding to the camera 112 focused on the same gazing point is suitable for performing image processing because the image frame correlation between the cameras 112 is high. In accordance with the determination by the data routing processing unit 6122 for each of the plurality of camera adapters 120, the order of the camera adapters 120 that output the foreground image and the background image in the relay format in the image processing system 100 is determined.

時刻同期制御部６１２３は、ＩＥＥＥ１５８８規格のＰＴＰ（ＰｒｅｃｉｓｉｏｎＴｉｍｅＰｒｏｔｏｃｏｌ）に準拠し、タイムサーバ２９０と時刻同期に係わる処理を行う機能を有している。なお、ＰＴＰだけでなく、他の同様のプロトコルを利用して時刻同期が行われてもよい。 The time synchronization control unit 6123 conforms to the IEEE 1588 standard PTP (Precision Time Protocol) and has a function of performing processing related to time synchronization with the time server 290. Note that time synchronization may be performed using not only PTP but also other similar protocols.

画像・音声伝送処理部６１２４は、画像又は音声のデータを、データ送受信部６１１１を介して他のカメラアダプタ１２０またはフロントエンドサーバ２３０へ転送するためのメッセージを作成する機能を有している。メッセージには、画像又は音声のデータ、及び各データのメタ情報が含まれる。本実施形態の場合、メタ情報には、画像の撮影または音声のサンプリングをした時のタイムコードまたはシーケンス番号、データ種別、カメラ１１２やマイク１１１の個体を示す識別子等が含まれる。また、画像・音声伝送処理部６１２４は、他のカメラアダプタ１２０から、データ送受信部６１１１を介してメッセージを受け取る。そして、画像・音声伝送処理部６１２４は、そのメッセージに含まれるデータ種別に応じて、伝送プロトコル規定のパケットサイズにフラグメントされたデータ情報を、画像または音声のデータに復元する。なお、データを復元した際に、そのデータが圧縮されている場合、データ圧縮・伸張部６１２１は、その圧縮に対応した伸張処理を行う。 The image / audio transmission processing unit 6124 has a function of creating a message for transferring image or audio data to another camera adapter 120 or the front-end server 230 via the data transmission / reception unit 6111. The message includes image or audio data and meta information of each data. In the case of the present embodiment, the meta information includes a time code or sequence number when an image is taken or audio is sampled, a data type, an identifier indicating the individual of the camera 112 or the microphone 111, and the like. The image / sound transmission processing unit 6124 receives a message from another camera adapter 120 via the data transmission / reception unit 6111. Then, the image / sound transmission processing unit 6124 restores the data information fragmented to the packet size defined by the transmission protocol into image or sound data according to the data type included in the message. If the data is compressed when the data is restored, the data compression / decompression unit 6121 performs decompression processing corresponding to the compression.
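
The message handling described above (meta information attached to each piece of data, fragmentation to a packet size, and restoration on the receiving side) might look like the following sketch. The `Fragment` fields, the function names, and the 1400-byte packet size are illustrative assumptions; the description does not define a wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    time_code: str   # time code or sequence number of the capture
    data_type: str   # e.g. "foreground", "background", "audio"
    device_id: str   # identifier of the camera or microphone
    index: int       # position of this fragment within the payload
    total: int       # number of fragments making up the payload
    payload: bytes


def fragment(data, time_code, data_type, device_id, packet_size=1400):
    """Split data into fragments of at most packet_size bytes (assumed size)."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    chunks = [data[i:i + packet_size] for i in range(0, len(data), packet_size)] or [b""]
    return [
        Fragment(time_code, data_type, device_id, i, len(chunks), chunk)
        for i, chunk in enumerate(chunks)
    ]


def reassemble(fragments):
    """Restore the original payload from fragments received in any order."""
    ordered = sorted(fragments, key=lambda f: f.index)
    complete = (
        bool(ordered)
        and len(ordered) == ordered[0].total
        and all(f.index == i for i, f in enumerate(ordered))
    )
    if not complete:
        raise ValueError("missing or duplicate fragments")
    return b"".join(f.payload for f in ordered)
```

Decompression is deliberately left out; in the description it is a separate step performed by the data compression / decompression unit 6121 after the data has been restored.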

画像処理部６１３０は、カメラ制御部６１４１の制御によりカメラ１１２が撮影した画像データ、及び、他のカメラアダプタ１２０から受取った画像データに対して処理を行う機能を有する。画像処理部６１３０は、前景背景分離部６１３１、三次元モデル情報生成部６１３２、キャリブレーション制御部６１３３の各機能部を有して構成されている。 The image processing unit 6130 has a function of processing image data captured by the camera 112 under the control of the camera control unit 6141 and image data received from another camera adapter 120. The image processing unit 6130 includes functional units such as a foreground / background separation unit 6131, a 3D model information generation unit 6132, and a calibration control unit 6133.

前景背景分離部６１３１は、カメラ１１２が撮影した画像データを、前景画像と背景画像に分離する機能を有している。すなわち、複数のカメラアダプタ１２０それぞれの前景背景分離部６１３１は、それぞれ対応したカメラ１１２による撮影画像から所定領域を抽出する。本実施形態における所定領域は、例えば撮影画像に対するオブジェクト検出の結果得られる前景画像であり、前景背景分離部６１３１は、この所定領域の抽出を行うことにより、撮影画像を前景画像と背景画像に分離する。なお、本実施形態のように、例えばスタジアムに画像処理システム１００が設置される場合、撮影画像に対するオブジェクトとしては、例えば人物を例に挙げることができる。この場合の人物のオブジェクトは、特定人物（例えば選手、監督、審判等）であってもよいし、ボールやゴール等のように画像パターンが予め定められている物体であってもよい。また、オブジェクトは、人物等のような動体だけでなく、静止した物体であってもよい。本実施形態の画像処理システム１００では、人物等の重要なオブジェクトを含む前景画像と、それらオブジェクトを含まない背景領域とを分離して処理することで、生成される仮想視点画像のオブジェクトに該当する部分の画像の品質を向上させることができる。また、本実施形態によれば、前景画像と背景画像の分離を複数のカメラアダプタ１２０それぞれが行うことで、複数のカメラ１１２を備えた画像処理システム１００における負荷を分散させることができる。なお、所定領域は、前景画像に限らず、例えば背景画像であってもよい。 The foreground / background separation unit 6131 has a function of separating image data captured by the camera 112 into a foreground image and a background image. That is, the foreground / background separation unit 6131 of each of the plurality of camera adapters 120 extracts a predetermined area from the image captured by the corresponding camera 112. The predetermined area in the present embodiment is, for example, a foreground image obtained as a result of object detection on the captured image, and the foreground / background separation unit 6131 separates the captured image into a foreground image and a background image by extracting this predetermined area. When the image processing system 100 is installed in a stadium, as in the present embodiment, a person is one example of an object in a captured image. The object in this case may be a specific person (for example, a player, a manager, or a referee), or may be an object with a predetermined image pattern, such as a ball or a goal. The object may be not only a moving object such as a person but also a stationary object.
In the image processing system 100 according to the present embodiment, the foreground image, which includes important objects such as persons, is processed separately from the background area, which does not include those objects. This improves the image quality of the portion of the generated virtual viewpoint image that corresponds to those objects. Further, according to the present embodiment, each of the plurality of camera adapters 120 separates the foreground image from the background image, so the load on the image processing system 100 including the plurality of cameras 112 can be distributed. The predetermined area is not limited to the foreground image, and may be a background image, for example.

三次元モデル情報生成部６１３２は、前景背景分離部６１３１で分離された前景画像及び他のカメラアダプタ１２０から受取った前景画像を利用し、例えばステレオカメラの原理を用いて三次元モデルに係わる画像情報を生成する機能を有している。また、三次元モデル生成には、例えばＶｉｓｕａｌＨｕｌｌを用いる方法が使用されてもよい。 The 3D model information generation unit 6132 has a function of generating image information related to a 3D model, for example on the principle of a stereo camera, using the foreground image separated by the foreground / background separation unit 6131 and the foreground images received from other camera adapters 120. A method using a visual hull, for example, may also be used to generate the three-dimensional model.

キャリブレーション制御部６１３３は、後述するカメラ制御部６１４１を介してカメラ１１２から、キャリブレーションに必要な画像データを取得して、キャリブレーションに係わる演算処理を行うフロントエンドサーバ２３０に送信する機能を有している。 The calibration control unit 6133 has a function of acquiring the image data necessary for calibration from the camera 112 via the camera control unit 6141 described later, and transmitting it to the front-end server 230, which performs the arithmetic processing related to calibration.

外部機器制御部６１４０は、カメラ制御部６１４１、マイク制御部６１４２、雲台制御部６１４３、センサ制御部６１４４の各機能部を有して構成されている。カメラ制御部６１４１はこのカメラアダプタ１２０に備えられているカメラ１１２と接続され、同様に、マイク制御部６１４２はマイク１１１と、雲台制御部６１４３は雲台１１３と、センサ制御部６１４４は外部センサ１１４と接続されている。 The external device control unit 6140 includes the following functional units: a camera control unit 6141, a microphone control unit 6142, a pan head control unit 6143, and a sensor control unit 6144. The camera control unit 6141 is connected to the camera 112 provided for this camera adapter 120. Similarly, the microphone control unit 6142 is connected to the microphone 111, the pan head control unit 6143 to the pan head 113, and the sensor control unit 6144 to the external sensor 114.

カメラ制御部６１４１は、接続されているカメラ１１２の制御、撮影画像取得、同期信号提供、及び時刻設定等を行う機能を有している。カメラ１１２の制御には、例えば撮影パラメータ（画素数、色深度、フレームレート、ホワイトバランス等）の設定及び参照、カメラ１１２の状態（撮影中、停止中、同期中、エラー等）の取得、撮影の開始と停止、ピント調整等がある。同期信号提供は、時刻同期制御部６１２３がタイムサーバ２９０と同期した時刻を利用し、撮影タイミング（制御クロック）をカメラ１１２に送ることで行われる。時刻設定は、時刻同期制御部６１２３がタイムサーバ２９０と同期した時刻を例えばＳＭＰＴＥ１２Ｍのフォーマットに準拠したタイムコードで送ることにより行われる。これにより、カメラ１１２から受取る画像データには、そのタイムコードが付与されることになる。なお、タイムコードのフォーマットは、ＳＭＰＴＥ１２Ｍに限定されるわけではなく、他のフォーマットであってもよい。また、カメラ制御部６１４１は、カメラ１１２に対するタイムコードの送信を行わず、カメラ１１２から受取った画像データに自身がタイムコードを付与してもよい。 The camera control unit 6141 has a function of controlling the connected camera 112, acquiring a captured image, providing a synchronization signal, setting a time, and the like. For the control of the camera 112, for example, setting and referring to shooting parameters (number of pixels, color depth, frame rate, white balance, etc.), acquisition of the state of the camera 112 (shooting, stopping, synchronizing, error, etc.), shooting Start and stop, focus adjustment, etc. The synchronization signal is provided by sending the shooting timing (control clock) to the camera 112 using the time synchronized with the time server 290 by the time synchronization control unit 6123. The time setting is performed by the time synchronization control unit 6123 sending the time synchronized with the time server 290 using, for example, a time code conforming to the SMPTE12M format. As a result, the time code is assigned to the image data received from the camera 112. Note that the format of the time code is not limited to SMPTE12M, and may be another format. In addition, the camera control unit 6141 may add time code to the image data received from the camera 112 without transmitting the time code to the camera 112.
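
As an illustration of the time code attached to each image, the sketch below converts a running frame count into an HH:MM:SS:FF string of the kind used by SMPTE 12M. It assumes non-drop-frame counting at an integer frame rate (30 by default); the function name and both assumptions are for illustration only and are not taken from the description.

```python
def frames_to_timecode(frame_count, fps=30):
    """Format a frame count as a non-drop-frame HH:MM:SS:FF time code.

    The integer fps and non-drop-frame counting are simplifying assumptions.
    """
    if frame_count < 0 or fps <= 0:
        raise ValueError("frame_count must be >= 0 and fps must be > 0")
    frames = frame_count % fps            # frame number within the current second
    total_seconds = frame_count // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = (total_seconds // 3600) % 24  # the time code wraps after 24 hours
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"
```

Because every camera adapter derives its time from the same time server, frames captured at the same instant by different cameras would carry the same time code, which is what later stages rely on when matching data.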

マイク制御部６１４２は、接続されているマイク１１１の制御、集音の開始や停止、集音された音声データの取得等を行う機能を有している。マイク１１１の制御には、例えば、ゲイン調整、状態取得等がある。また、カメラ制御部６１４１の場合と同様、マイク１１１には、音声サンプリングのタイミングとタイムコードが送信される。音声サンプリングのタイミングとなるクロック情報としては、タイムサーバ２９０からの時刻情報を例えば４８ＫＨｚのワードクロックに変換した情報等が用いられる。 The microphone control unit 6142 has functions of controlling the connected microphone 111, starting and stopping sound collection, obtaining collected sound data, and the like. Examples of the control of the microphone 111 include gain adjustment and state acquisition. Similarly to the case of the camera control unit 6141, the audio sampling timing and time code are transmitted to the microphone 111. As clock information used as the timing of audio sampling, information obtained by converting time information from the time server 290 into, for example, a 48 KHz word clock is used.

雲台制御部６１４３は、接続されている雲台１１３の制御を行う機能を有している。雲台１１３の制御は、例えばいわゆるパン・チルト制御や、状態取得等がある。
センサ制御部６１４４は、接続されている外部センサ１１４がセンシングしたセンサ情報を取得する機能を有する。例えば、外部センサ１１４としてジャイロセンサが利用される場合は、センサ制御部６１４４は、振動を表す情報を取得することができる。このセンサ制御部６１４４が取得した振動情報は、画像処理部６１３０に送られる。これにより、画像処理部６１３０は、センサ制御部６１４４が取得した振動情報を用いて、前景背景分離部６１３１での処理に先立って振動を抑えた画像を生成することができる。 The pan head control unit 6143 has a function of controlling the pan head 113 connected thereto. Control of the pan head 113 includes, for example, so-called pan / tilt control and state acquisition.
The sensor control unit 6144 has a function of acquiring sensor information sensed by the connected external sensor 114. For example, when a gyro sensor is used as the external sensor 114, the sensor control unit 6144 can acquire information representing vibration. The vibration information acquired by the sensor control unit 6144 is sent to the image processing unit 6130. Accordingly, the image processing unit 6130 can generate an image in which vibration is suppressed prior to the processing in the foreground / background separation unit 6131 using the vibration information acquired by the sensor control unit 6144.

図３は、本実施形態のフロントエンドサーバ２３０の機能ブロックを示した図である。
図３に示すように、フロントエンドサーバ２３０は、データ入力制御部２１２０、データ同期部２１３０、画像処理部２１５０、三次元モデル結合部２１６０、画像結合部２１７０、撮影データファイル生成部２１８０を有する。さらに、フロントエンドサーバ２３０は、制御部２１１０、ＣＡＤデータ記憶部２１３５、キャリブレーション部２１４０、非撮影データファイル生成部２１８５、ＤＢアクセス制御部２１９０を有して構成されている。 FIG. 3 is a diagram showing functional blocks of the front-end server 230 of the present embodiment.
As shown in FIG. 3, the front-end server 230 includes a data input control unit 2120, a data synchronization unit 2130, an image processing unit 2150, a 3D model combination unit 2160, an image combination unit 2170, and a captured data file generation unit 2180. Further, the front end server 230 includes a control unit 2110, a CAD data storage unit 2135, a calibration unit 2140, a non-photographed data file generation unit 2185, and a DB access control unit 2190.

制御部２１１０は、ＣＰＵやＤＲＡＭ、プログラムデータや各種データを記憶したＨＤＤやＮＡＮＤメモリ等の記憶媒体、Ｅｔｈｅｒｎｅｔ（登録商標）等のハードウェアで構成される。そして、制御部２１１０は、フロントエンドサーバ２３０の各機能ブロック及びフロントエンドサーバ２３０のシステム全体の制御を行う。また、制御部２１１０は、モード制御を行って、キャリブレーション動作や撮影前の準備動作、及び撮影中動作等の動作モードを切り替える。また、制御部２１１０は、ネットワークを介して制御ステーション３１０からの制御指示を受信し、各モードの切り替えやデータの入出力等を行う。また制御部２１１０は、ネットワークを通じて制御ステーション３１０からスタジアムＣＡＤデータを取得し、そのスタジアムＣＡＤデータをＣＡＤデータ記憶部２１３５と非撮影データファイル生成部２１８５に送信する。 The control unit 2110 includes a CPU, DRAM, a storage medium such as an HDD or NAND memory that stores program data and various data, and hardware such as Ethernet (registered trademark). The control unit 2110 controls each functional block of the front end server 230 and the entire system of the front end server 230. In addition, the control unit 2110 performs mode control to switch operation modes such as a calibration operation, a preparatory operation before photographing, and an operation during photographing. Further, the control unit 2110 receives a control instruction from the control station 310 via the network, and performs switching of each mode, input / output of data, and the like. The control unit 2110 acquires stadium CAD data from the control station 310 via the network, and transmits the stadium CAD data to the CAD data storage unit 2135 and the non-photographed data file generation unit 2185.

データ入力制御部２１２０は、ネットワークとスイッチングハブ１８０を介して、カメラアダプタ１２０と接続されている。そして、データ入力制御部２１２０は、カメラアダプタ１２０から前述した前景画像、背景画像、前述したオブジェクト等の被写体の三次元モデル、音声データ、及びカメラ露出情報を取得する。また、データ入力制御部２１２０は、取得した前景画像及び背景画像をデータ同期部２１３０に送信し、カメラキャリブレーション撮影画像データをキャリブレーション部２１４０に送信する。また、データ入力制御部２１２０は、受信したデータの圧縮伸張やデータルーティング処理等を行う機能を有する。ここで、制御部２１１０とデータ入力制御部２１２０は共に、Ｅｔｈｅｒｎｅｔ（登録商標）等のネットワークによる通信機能を有しているが、通信機能はこれらで共有されていてもよい。その場合、制御ステーション３１０からの制御コマンドによる指示やスタジアムＣＡＤデータをデータ入力制御部２１２０が受けて、制御部２１１０に対して送る方法を用いてもよい。 The data input control unit 2120 is connected to the camera adapter 120 via the network and the switching hub 180. Then, the data input control unit 2120 acquires the above-described foreground image, background image, three-dimensional model of a subject such as the above-described object, audio data, and camera exposure information from the camera adapter 120. In addition, the data input control unit 2120 transmits the acquired foreground image and background image to the data synchronization unit 2130, and transmits camera calibration captured image data to the calibration unit 2140. The data input control unit 2120 has a function of performing compression / decompression of received data, data routing processing, and the like. Here, both the control unit 2110 and the data input control unit 2120 have a communication function via a network such as Ethernet (registered trademark), but the communication function may be shared by them. In that case, a method may be used in which the data input control unit 2120 receives an instruction based on a control command from the control station 310 or stadium CAD data and sends it to the control unit 2110.

データ同期部２１３０は、カメラアダプタ１２０から取得されたデータを不図示のＤＲＡＭ上に一次的に記憶し、前景画像、背景画像、音声データ、三次元モデルデータ、及びカメラ露出情報が揃うまでバッファリングする。なお、前景画像、背景画像、音声データ、三次元モデルデータ、及びカメラ露出情報をまとめて、以下の説明では撮影データと称する。撮影データには、ルーティング情報やタイムコード情報（時間情報）、カメラ識別子等のメタ情報が付与されており、データ同期部２１３０は、このメタ情報を元にデータの属性を確認する。これにより、データ同期部２１３０は、同一時刻のデータであることなどを判断してデータが揃ったことを確認する。データが揃ったら、データ同期部２１３０は、前景画像及び背景画像を画像処理部２１５０に、三次元モデルデータを三次元モデル結合部２１６０に、音声データを撮影データファイル生成部２１８０にそれぞれ送信する。なお、ここで揃えるデータは、後述する撮影データファイル生成部２１８０でファイル生成を行うために必要なデータである。 The data synchronization unit 2130 temporarily stores the data acquired from the camera adapter 120 in a DRAM (not shown) and buffers it until the foreground image, background image, audio data, 3D model data, and camera exposure information are all available. In the following description, the foreground image, the background image, the audio data, the 3D model data, and the camera exposure information are collectively referred to as shooting data. Meta information such as routing information, time code information (time information), and a camera identifier is attached to the shooting data, and the data synchronization unit 2130 checks the attributes of the data based on this meta information. In this way, the data synchronization unit 2130 confirms that the data is complete by determining, for example, that the items belong to the same time. Once the data is complete, the data synchronization unit 2130 transmits the foreground image and the background image to the image processing unit 2150, the 3D model data to the 3D model combining unit 2160, and the audio data to the shooting data file generation unit 2180. Note that the data gathered here is the data that the shooting data file generation unit 2180, described later, needs in order to generate files.
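
The buffering behavior of the data synchronization unit 2130 can be sketched as a small class that holds items per time code and releases them once every required type has arrived. The class name, the type labels, and the choice to key only on the time code (ignoring the camera identifier and routing information) are simplifying assumptions.

```python
REQUIRED_TYPES = frozenset(
    {"foreground", "background", "audio", "model_3d", "exposure"}
)


class DataSynchronizer:
    """Hypothetical buffer that groups shooting data by time code."""

    def __init__(self, required=REQUIRED_TYPES):
        self.required = frozenset(required)
        self.pending = {}  # time_code -> {data_type: payload}

    def push(self, time_code, data_type, payload):
        """Buffer one item; return the full set for time_code once complete."""
        if data_type not in self.required:
            raise ValueError(f"unexpected data type: {data_type}")
        bucket = self.pending.setdefault(time_code, {})
        bucket[data_type] = payload
        if self.required.issubset(bucket):      # every required type has arrived
            return self.pending.pop(time_code)  # release and stop buffering
        return None
```

A caller would forward the returned dictionary to the downstream units and keep pushing data while `push` returns `None`.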

ＣＡＤデータ記憶部２１３５は、制御部２１１０から受け取ったスタジアム形状を示す三次元データ（以下、スタジアム形状データと称する。）を不図示のＤＲＡＭまたはＨＤＤやＮＡＮＤメモリ等の記憶媒体に保存する。そして、ＣＡＤデータ記憶部２１３５は、画像結合部２１７０に対して、スタジアム形状データの要求を受け取った際に保存されたスタジアム形状データを送信する。 The CAD data storage unit 2135 stores the three-dimensional data indicating the stadium shape received from the control unit 2110 (hereinafter referred to as stadium shape data) in a storage medium (not shown) such as a DRAM, an HDD, or a NAND memory. Upon receiving a request for stadium shape data, the CAD data storage unit 2135 transmits the stored stadium shape data to the image combining unit 2170.

キャリブレーション部２１４０は、カメラ１１２のキャリブレーション動作を行い、キャリブレーションによって得られたカメラパラメータを、後述する非撮影データファイル生成部２１８５に送る。また同時に、キャリブレーション部２１４０は、自身の記憶領域にもカメラパラメータを保持し、その保持したカメラパラメータの情報を、後述する三次元モデル結合部２１６０に送信する。 The calibration unit 2140 performs a calibration operation of the camera 112 and sends camera parameters obtained by the calibration to a non-photographed data file generation unit 2185 described later. At the same time, the calibration unit 2140 stores camera parameters in its own storage area, and transmits information about the stored camera parameters to a 3D model combining unit 2160 described later.

画像処理部２１５０は、データ同期部２１３０を介して供給された前景画像と背景画像に対し、カメラ間の色や輝度値の合わせこみ、ＲＡＷ画像データが入力される場合にはその現像処理、カメラのレンズ歪みの補正等の処理を行う。そして、画像処理部２１５０は、画像処理を行った前景画像を撮影データファイル生成部２１８０に、背景画像を画像結合部２１７０にそれぞれ送信する。 The image processing unit 2150 processes the foreground image and the background image supplied via the data synchronization unit 2130. This processing includes matching colors and luminance values between cameras, developing RAW image data when such data is input, and correcting camera lens distortion. The image processing unit 2150 then transmits the processed foreground image to the shooting data file generation unit 2180 and the processed background image to the image combining unit 2170.

三次元モデル結合部２１６０は、データ同期部２１３０を介して供給された同一時刻の三次元モデルデータを、キャリブレーション部２１４０が生成したカメラパラメータを用いて結合する。そして、三次元モデル結合部２１６０は、いわゆるＶｉｓｕａｌＨｕｌｌと呼ばれる方法を用いて、スタジアム全体における前景画像の三次元モデルデータを生成する。三次元モデル結合部２１６０にて生成された三次元モデルは、撮影データファイル生成部２１８０に送られる。 The 3D model combining unit 2160 combines the 3D model data at the same time supplied via the data synchronization unit 2130 using the camera parameters generated by the calibration unit 2140. Then, the three-dimensional model combining unit 2160 generates three-dimensional model data of the foreground image in the entire stadium using a method called “VisualHull”. The 3D model generated by the 3D model combining unit 2160 is sent to the imaging data file generating unit 2180.

画像結合部２１７０は、画像処理部２１５０から背景画像を取得し、ＣＡＤデータ記憶部２１３５からスタジアム形状データを取得し、そのスタジアム形状データの座標に対する背景画像の位置を特定する。画像結合部２１７０は、背景画像の各々についてスタジアム形状データの座標に対する位置が特定できると、それら背景画像を結合して一つの背景画像とする。なお、背景画像の三次元形状データの作成については、バックエンドサーバ２７０が実施してもよい。 The image combining unit 2170 acquires a background image from the image processing unit 2150, acquires stadium shape data from the CAD data storage unit 2135, and specifies the position of the background image with respect to the coordinates of the stadium shape data. When the position with respect to the coordinates of the stadium shape data can be specified for each of the background images, the image combining unit 2170 combines the background images into one background image. Note that the back-end server 270 may perform the creation of the three-dimensional shape data of the background image.

撮影データファイル生成部２１８０は、データ同期部２１３０から音声とカメラ露出情報を、画像処理部２１５０から前景画像を、三次元モデル結合部２１６０から三次元モデルデータを、画像結合部２１７０から三次元形状に結合された背景画像を、取得する。そして、撮影データファイル生成部２１８０は、それら取得したデータをＤＢアクセス制御部２１９０に対して出力する。ここで、撮影データファイル生成部２１８０は、これらのデータをそれぞれの時間情報に基づいて対応付けて出力する。なお、撮影データファイル生成部２１８０は、これらのデータの一部を対応付けて出力してもよい。例えば、撮影データファイル生成部２１８０は、前景画像と背景画像とを、前景画像の時間情報及び背景画像の時間情報に基づいて対応付けて出力する。なお、カメラアダプタ１２０が出力した背景画像のフレームレートは前景画像のフレームレートより低いため、ある時刻においては背景画像が存在しない場合がある。この場合、撮影データファイル生成部２１８０は、背景画像の対応付けは行わない。また、撮影データファイル生成部２１８０は、前景画像、背景画像、及び三次元モデルデータを、前景画像の時間情報、背景画像の時間情報、及び三次元モデルデータの時間情報に基づいて対応付けて出力する。なお、撮影データファイル生成部２１８０は、対応付けられたデータをデータの種類別にファイル化して出力してもよいし、複数種類のデータを時間情報が示す時刻ごとにまとめてファイル化して出力してもよい。そして、このようにして対応付けられた撮影データが、ＤＢアクセス制御部２１９０を介してデータベース２５０に出力される。これにより、バックエンドサーバ２７０は、時刻情報が対応した前景画像と背景画像とから仮想視点画像を生成可能となる。 The shooting data file generation unit 2180 acquires audio and camera exposure information from the data synchronization unit 2130, the foreground image from the image processing unit 2150, 3D model data from the 3D model combining unit 2160, and the background image combined into a three-dimensional shape from the image combining unit 2170. It then outputs the acquired data to the DB access control unit 2190. Here, the shooting data file generation unit 2180 outputs these data in association with each other based on their respective time information. Note that it may associate and output only some of these data. For example, the shooting data file generation unit 2180 outputs the foreground image and the background image in association with each other based on the time information of the foreground image and the time information of the background image. Because the frame rate of the background image output from the camera adapter 120 is lower than that of the foreground image, no background image may exist at a given time. In that case, the shooting data file generation unit 2180 does not associate a background image.
Further, the shooting data file generation unit 2180 outputs the foreground image, the background image, and the 3D model data in association with each other based on the time information of the foreground image, the time information of the background image, and the time information of the 3D model data. Note that the shooting data file generation unit 2180 may output the associated data as a separate file for each type of data, or may combine a plurality of types of data into one file for each time indicated by the time information. The shooting data associated in this way is then output to the database 250 via the DB access control unit 2190. As a result, the back-end server 270 can generate a virtual viewpoint image from a foreground image and a background image whose time information corresponds.
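
The association by time information, including the case where no background image exists for a given time, can be sketched as follows. The dictionary-based inputs and the record layout are assumptions made for illustration.

```python
def associate_by_time(foregrounds, backgrounds):
    """Pair each foreground image with the background that shares its time code.

    foregrounds and backgrounds map a time code to an image. A record keeps
    background=None when no background exists for that time code, which
    corresponds to leaving the background unassociated.
    """
    records = []
    for time_code in sorted(foregrounds):
        records.append(
            {
                "time_code": time_code,
                "foreground": foregrounds[time_code],
                "background": backgrounds.get(time_code),  # None if absent
            }
        )
    return records
```

For example, with foreground images at time codes 0, 1, and 2 and a single background image at time code 0, only the first record carries a background.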

非撮影データファイル生成部２１８５は、キャリブレーション部２１４０からカメラパラメータ、制御部２１１０からスタジアム形状データを取得し、ファイル形式に応じて成形した後にＤＢアクセス制御部２１９０に送信する。なお、非撮影データファイル生成部２１８５は、入力されるデータであるカメラパラメータ又はスタジアム形状データを、個別にファイル形式に応じて成形する。すなわち、非撮影データファイル生成部２１８５は、カメラパラメータとスタジアム形状データのどちらか一方のデータを受信した場合、それらを個別にＤＢアクセス制御部２１９０に送信する。 The non-photographed data file generation unit 2185 acquires camera parameters from the calibration unit 2140 and stadium shape data from the control unit 2110, shapes them according to the file format, and transmits them to the DB access control unit 2190. Note that the non-photographed data file generation unit 2185 individually shapes camera parameters or stadium shape data, which are input data, according to the file format. That is, the non-photographed data file generation unit 2185, when receiving either one of the camera parameters and the stadium shape data, individually transmits them to the DB access control unit 2190.

ＤＢアクセス制御部２１９０は、いわゆるＩｎｆｉｎｉＢａｎｄ等により高速な通信が可能となるようにデータベース２５０と接続される。そして、ＤＢアクセス制御部２１９０は、撮影データファイル生成部２１８０と非撮影データファイル生成部２１８５から受信したデータを、データベース２５０に対して送信する。本実施形態の場合、撮影データファイル生成部２１８０が時刻情報に基づいて対応付けた撮影データは、ＤＢアクセス制御部２１９０を介して、データベース２５０へ出力される。 The DB access control unit 2190 is connected to the database 250 by InfiniBand or the like so that high-speed communication is possible. The DB access control unit 2190 transmits the data received from the shooting data file generation unit 2180 and the non-shooting data file generation unit 2185 to the database 250. In the present embodiment, the shooting data that the shooting data file generation unit 2180 has associated based on the time information is output to the database 250 via the DB access control unit 2190.

なお、本実施形態では、フロントエンドサーバ２３０が前景画像と背景画像の対応付けを行うものとするが、これに限らず、データベース２５０が対応付けを行ってもよい。この場合、データベース２５０は、フロントエンドサーバ２３０から、時刻情報を有する前景画像及び背景画像を取得する。そして、データベース２５０は、前景画像と背景画像とを前景画像の時刻情報及び背景画像の時刻情報に基づいて対応付けて、データベース２５０が備える記憶部に出力してもよい。 In the present embodiment, the front-end server 230 associates the foreground image with the background image. However, the present invention is not limited to this, and the database 250 may perform the association. In this case, the database 250 acquires a foreground image and a background image having time information from the front end server 230. Then, the database 250 may associate the foreground image and the background image with each other based on the time information of the foreground image and the time information of the background image, and output them to the storage unit included in the database 250.

図４は、本実施形態のバックエンドサーバ２７０の機能ブロックを示した図である。
図４に示すように、バックエンドサーバ２７０は、データ受信部３００１、前景の処理のための、前景テクスチャ決定部３００３、テクスチャ境界色合わせ部３００４、仮想視点前景画像生成部３００５を有する。また、バックエンドサーバ２７０は、背景テクスチャ貼り付け部３００２、レンダリングモード管理部３０１４、レンダリング部３００６を有する。さらに、バックエンドサーバ２７０は、仮想視点音声生成部３００７、合成部３００８、画像出力部３００９を有する。また、バックエンドサーバ２７０は、前景のデータの読出しのために、前景オブジェクト決定部３０１０、要求リスト生成部３０１１、要求データ出力部３０１２を有する。加えて、バックエンドサーバ２７０は、前景画像カメラ露出情報取得部３０１５、背景画像カメラ露出情報取得部３０１６、カメラ露出情報比較部３０１７、背景画像取得判定部３０１８、背景画像要求部３０１９を有する。 FIG. 4 is a diagram illustrating functional blocks of the back-end server 270 according to the present embodiment.
As shown in FIG. 4, the back-end server 270 includes a data receiving unit 3001, a foreground texture determining unit 3003, a texture boundary color matching unit 3004, and a virtual viewpoint foreground image generating unit 3005 for foreground processing. Further, the back-end server 270 includes a background texture pasting unit 3002, a rendering mode management unit 3014, and a rendering unit 3006. Further, the back-end server 270 includes a virtual viewpoint audio generation unit 3007, a synthesis unit 3008, and an image output unit 3009. The back-end server 270 also includes a foreground object determination unit 3010, a request list generation unit 3011, and a request data output unit 3012 for reading out foreground data. In addition, the back-end server 270 includes a foreground image camera exposure information acquisition unit 3015, a background image camera exposure information acquisition unit 3016, a camera exposure information comparison unit 3017, a background image acquisition determination unit 3018, and a background image request unit 3019.

データ受信部３００１は、データベース２５０及びコントローラ３００から送信されるデータを受信する。また、データ受信部３００１は、データベース２５０から、スタジアム形状データ、前景画像、背景画像、前景画像の三次元モデル（以降、前景三次元モデルと称する）、及び音声を受信する。さらに、データ受信部３００１は、受信した前景画像及び背景画像に関連づけられたカメラ露出情報をもデータベース２５０から受信する。 The data receiving unit 3001 receives data transmitted from the database 250 and the controller 300. The data receiving unit 3001 receives stadium shape data, foreground image, background image, three-dimensional model of foreground image (hereinafter referred to as foreground three-dimensional model), and audio from the database 250. Furthermore, the data receiving unit 3001 also receives camera exposure information associated with the received foreground image and background image from the database 250.

The data receiving unit 3001 also receives, from the controller 300, virtual camera parameters that specify the viewpoint for generating a virtual viewpoint image. The virtual camera parameters are data representing the position, orientation, and the like of the virtual viewpoint; for example, an extrinsic parameter matrix and an intrinsic parameter matrix are used. Note that the data that the data receiving unit 3001 acquires from the controller 300 is not limited to virtual camera parameters. The information output from the controller 300 may include at least one of the viewpoint designation method, information identifying the application running on the controller 300, identification information of the controller 300, and identification information of the user using the controller 300. The data receiving unit 3001 may also acquire information similar to the above from the end user terminal 190. Furthermore, the data receiving unit 3001 may acquire information on the plurality of cameras 112 from devices such as the database 250 and the controller 300. The information on the plurality of cameras 112 is, for example, information on the number of cameras 112 and information on their operating states. The operating state of a camera 112 includes, for example, at least one of a normal state, a failure state, a standby state, a startup state, and a restart state.
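As an illustration of how virtual camera parameters expressed as an extrinsic and an intrinsic matrix can be used, the following is a minimal sketch, not the implementation of the embodiment. The class name `VirtualCameraParams`, the function `project_point`, and the pinhole-camera convention (world-to-camera rotation and translation, 3x3 intrinsic matrix) are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class VirtualCameraParams:
    """Hypothetical container for virtual camera parameters."""
    rotation: Sequence[Sequence[float]]   # 3x3 extrinsic rotation (world -> camera)
    translation: Sequence[float]          # 3-vector extrinsic translation
    intrinsic: Sequence[Sequence[float]]  # 3x3 matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]


def project_point(params: VirtualCameraParams,
                  point: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates of the virtual camera."""
    # Apply the extrinsic parameters: world coordinates -> camera coordinates.
    cam = [
        sum(params.rotation[i][j] * point[j] for j in range(3)) + params.translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        raise ValueError("point is behind the virtual camera")
    # Apply the intrinsic parameters, then divide by depth.
    uvw = [sum(params.intrinsic[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return (uvw[0] / uvw[2], uvw[1] / uvw[2])
```

In this convention, a change of zoom magnification would appear as a change of the focal-length entries `fx` and `fy` of the intrinsic matrix, while moving or turning the virtual camera changes only the extrinsic part.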

The background texture pasting unit 3002 acquires a background mesh model (stadium shape data) from the background mesh model management unit 3013. The background texture pasting unit 3002 generates a textured background mesh model by pasting the background image acquired from the database 250, as a texture, onto the three-dimensional shape represented by the background mesh model. A mesh model is data that represents a three-dimensional shape as a set of surfaces, such as CAD data. A texture is an image pasted onto an object to express the appearance of its surface. Note that the data that the data receiving unit 3001 receives from the database 250 may not contain a background mesh model and a background image. In that case, the background texture pasting unit 3002 generates the textured background mesh model using the background mesh model and the background image most recently received from the database 250.
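The fallback to the most recently received background data can be sketched as follows. This is only an illustrative sketch: the class `BackgroundTextureSource` and its `select` method are hypothetical names, and the mesh model and image are treated as opaque objects.

```python
from typing import Any, Optional, Tuple


class BackgroundTextureSource:
    """Hypothetical helper that remembers the last received background data."""

    def __init__(self) -> None:
        self._last_mesh: Optional[Any] = None
        self._last_image: Optional[Any] = None

    def select(self, mesh: Optional[Any], image: Optional[Any]) -> Tuple[Any, Any]:
        """Return the mesh model and background image to use for texturing.

        Newly received data replaces the stored data; when an item is absent
        (None), the most recently received one is reused.
        """
        if mesh is not None:
            self._last_mesh = mesh
        if image is not None:
            self._last_image = image
        if self._last_mesh is None or self._last_image is None:
            raise LookupError("no background mesh model or background image received yet")
        return self._last_mesh, self._last_image
```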

The foreground texture determination unit 3003 determines texture information of the foreground three-dimensional models from the foreground images and the foreground three-dimensional model group.
The texture boundary color matching unit 3004 performs color matching at texture boundaries based on the texture information of each foreground three-dimensional model and each three-dimensional model group, and generates a colored foreground three-dimensional model group for each foreground object.
The virtual viewpoint foreground image generation unit 3005 applies a perspective transformation to the foreground image group, based on the virtual camera parameters, so that it appears as seen from the virtual viewpoint.

The rendering unit 3006 renders the background image and the foreground image, according to the virtual viewpoint image generation method determined by the rendering mode management unit 3014, to generate a virtual viewpoint image of the entire view. Examples of virtual viewpoint image generation methods include model-based rendering (MBR) and image-based rendering (IBR).

The rendering mode management unit 3014 determines the rendering mode to be used from among a plurality of rendering modes. This determination is made based on information acquired by the data receiving unit 3001. In the present embodiment, the rendering modes can be switched on request, which allows the system to be configured flexibly and makes the image processing system 100 of the present embodiment applicable to subjects other than stadiums. Note that the rendering mode held by the rendering mode management unit 3014 may be a method preset in the system.

The virtual viewpoint audio generation unit 3007 generates the audio (audio group) heard at the virtual viewpoint based on the virtual camera parameters.
The synthesis unit 3008 combines the image group generated by the rendering unit 3006 with the audio generated by the virtual viewpoint audio generation unit 3007 to generate virtual viewpoint content. Note that the back-end server 270 may output the virtual viewpoint image generated by the rendering unit 3006 without audio.
The image output unit 3009 outputs the virtual viewpoint content generated by the synthesis unit 3008 to the controller 300 and the end user terminal 190 via a network. However, transmission to the outside is not limited to a network such as Ethernet (registered trademark); a signal transmission path such as SDI, DisplayPort, or HDMI (registered trademark) may be used.

The foreground object determination unit 3010 determines the foreground object group to be displayed from the virtual camera parameters and from the position information of the foreground objects, which is included in the foreground three-dimensional models and indicates the position of each foreground object in space, and outputs a foreground object list. That is, the foreground object determination unit 3010 performs processing that maps the image information of the virtual viewpoint onto the physical cameras 112. The mapping result for the virtual viewpoint differs depending on the rendering mode determined by the rendering mode management unit 3014. For this reason, the foreground object determination unit 3010 includes a control unit that determines the plurality of foreground objects, and this control unit operates in conjunction with the rendering mode.

The request list generation unit 3011 generates a request list for requesting, from the database 250, the foreground image group and the foreground three-dimensional model group corresponding to the foreground object list at the specified time, as well as the background image and the audio data. For foreground objects, data selected in consideration of the virtual viewpoint is requested from the database 250, whereas for the background image and the audio data, all data relating to the frame is requested. After the back-end server 270 starts up, a request list for the background mesh model is generated until the background mesh model is acquired. The request list also includes the camera exposure information associated with the requested foreground image group and background image.

The request data output unit 3012 outputs a data request command to the database 250 based on the input request list.
The background mesh model management unit 3013 stores the background mesh model received from the database 250.
The foreground image camera exposure information acquisition unit 3015 acquires all the camera exposure information associated with the foreground image group acquired from the database 250.
The background image camera exposure information acquisition unit 3016 acquires the camera exposure information associated with the background image acquired from the database 250.

The camera exposure information comparison unit 3017 compares all the camera exposure information acquired by the foreground image camera exposure information acquisition unit 3015 with the camera exposure information acquired by the background image camera exposure information acquisition unit 3016. In the present embodiment, the comparison checks whether the camera exposure information of the background image matches that of the foreground image group. The camera exposure information comparison unit 3017 first checks whether the EV values in the camera exposure information of the individual foreground images are all the same. If the camera exposure information of the foreground image group contains a plurality of different values, the camera exposure information comparison unit 3017 takes the EV value that occurs most frequently as the camera exposure information of the foreground image. The camera exposure information comparison unit 3017 then calculates the difference between this foreground EV value and the EV value of the camera exposure information of the background image acquired by the background image camera exposure information acquisition unit 3016, and outputs the difference to the background image acquisition determination unit 3018. In the present embodiment, the difference between the EV values of the foreground image and the background image is used as the comparison result, but a flag indicating match or mismatch may be used instead.
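The comparison described above (majority EV value of the foreground images, then the difference from the background EV value) can be sketched as follows. The function names are hypothetical. The description does not say how a tie between equally frequent EV values is resolved; this sketch simply keeps the value encountered first, which is an assumption.

```python
from collections import Counter
from typing import Sequence


def representative_foreground_ev(foreground_evs: Sequence[float]) -> float:
    """Return the EV value that occurs most often among the foreground images."""
    if not foreground_evs:
        raise ValueError("at least one foreground EV value is required")
    # most_common(1) returns the most frequent value; on ties, the value
    # encountered first is returned (assumption: the source defines no tie rule).
    ev, _count = Counter(foreground_evs).most_common(1)[0]
    return ev


def ev_difference(foreground_evs: Sequence[float], background_ev: float) -> float:
    """Difference between the representative foreground EV and the background EV."""
    return representative_foreground_ev(foreground_evs) - background_ev
```

A difference of zero corresponds to matching exposure information; any other value indicates a mismatch.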

The background image acquisition determination unit 3018 generates and outputs an instruction regarding reacquisition of the background image according to the comparison result from the camera exposure information comparison unit 3017. For example, when the comparison result indicates that the camera exposure information of the foreground image and the background image match, the background image acquisition determination unit 3018 instructs the background texture pasting unit 3002 to output the generated textured background mesh model to the rendering unit 3006. On the other hand, when the comparison result indicates that the camera exposure information of the foreground image and the background image do not match, the background image acquisition determination unit 3018 instructs the background image request unit 3019 to reacquire the background image. At that time, the background image acquisition determination unit 3018 also acquires the camera exposure information of the foreground image from the camera exposure information comparison unit 3017 and outputs it to the background image request unit 3019.
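The branching performed on the comparison result can be sketched as follows. The enum `BackgroundAction`, the dataclass `BackgroundInstruction`, and the function `decide_background_action` are hypothetical names, and treating an EV difference of exactly zero as a match is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BackgroundAction(Enum):
    USE_CURRENT = "use_current"  # pass the textured background mesh model to rendering
    REACQUIRE = "reacquire"      # request a background image with matching exposure


@dataclass(frozen=True)
class BackgroundInstruction:
    action: BackgroundAction
    requested_ev: Optional[float] = None  # foreground EV, set only when reacquiring


def decide_background_action(ev_diff: float, foreground_ev: float) -> BackgroundInstruction:
    """Decide whether the current background image can be used as is."""
    if ev_diff == 0:
        return BackgroundInstruction(BackgroundAction.USE_CURRENT)
    # On mismatch, forward the foreground EV so a matching background can be requested.
    return BackgroundInstruction(BackgroundAction.REACQUIRE, requested_ev=foreground_ev)
```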

In response to the instruction from the background image acquisition determination unit 3018, the background image request unit 3019 outputs, to the database 250, a command requesting a background image whose camera exposure information matches that of the foreground image. In the present embodiment, the background image requested from the database 250 is, among the background images usable for generating the virtual viewpoint image, one whose camera exposure information matches that of the foreground image. Specifically, the requested background image is the one whose shooting time is later than, and closest to, the shooting time of the background image most recently acquired by the data receiving unit 3001. As a result, background image data having the same camera exposure information as the foreground image is input to the background texture pasting unit 3002.
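The selection rule for the requested background image can be sketched as follows. The record type `BackgroundRecord`, its fields, and the function `select_background` are hypothetical, and the sketch assumes the candidate list is available in memory rather than being queried from the database 250.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BackgroundRecord:
    shooting_time: float  # shooting time of the background image
    ev: float             # EV value from its camera exposure information


def select_background(candidates: Iterable[BackgroundRecord],
                      foreground_ev: float,
                      last_shooting_time: float) -> Optional[BackgroundRecord]:
    """Pick the background image with matching EV shot soonest after the last one.

    Returns None when no candidate satisfies both conditions.
    """
    matching = [
        c for c in candidates
        if c.ev == foreground_ev and c.shooting_time > last_shooting_time
    ]
    return min(matching, key=lambda c: c.shooting_time, default=None)
```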

The present embodiment mainly describes the case where the back-end server 270 both determines the virtual viewpoint image generation method and generates the virtual viewpoint image, but the configuration is not limited to this. That is, the apparatus that determines the generation method need only output data corresponding to the determination result. For example, the front-end server 230 may determine the generation method to be used for generating the virtual viewpoint image based on information on the plurality of cameras 112, information output from the apparatus that specifies the viewpoint for generating the virtual viewpoint image, and the like. The front-end server 230 may then output the image data based on shooting by the cameras 112, together with information indicating the determined generation method, to at least one of a storage device such as the database 250 and an image generation device such as the back-end server 270. In this case, the back-end server 270 generates the virtual viewpoint image based on, for example, the information indicating the generation method output by the front-end server 230. Having the front-end server 230 determine the generation method in this way reduces the processing load that would otherwise arise from the database 250 and the back-end server 270 processing data for image generation by a method other than the determined one. On the other hand, when the back-end server 270 determines the generation method as in the present embodiment, the database 250 holds data that can support a plurality of generation methods, so that a plurality of virtual viewpoint images, each corresponding to one of the generation methods, can be generated.

Next, a virtual viewpoint image generation method according to the present embodiment will be described with reference to FIG. 5.
FIG. 5 shows the processing flow of the virtual camera operation UI 330, the back-end server 270, and the database 250 from when the operator (user) operates the virtual camera operation UI 330 until the virtual viewpoint image is generated and displayed on the end user terminal 190.

First, in S3300 of FIG. 5, the virtual camera operation UI 330 acquires input for operating the virtual camera from the operator (user). It is assumed that a joystick, a jog dial, a touch panel, a keyboard, a mouse, and the like are used as input devices for the virtual camera operation UI 330. Here, it is assumed that the operator inputs operation instructions for the virtual camera, such as its position, orientation, and zoom magnification. When an operation instruction for the virtual camera is input by the operator, the virtual camera operation UI 330 derives (calculates) the virtual camera parameters corresponding to the operation instruction in S3301. The virtual camera parameters in this case include extrinsic parameters indicating the position, orientation, and the like of the virtual camera, and intrinsic parameters indicating the zoom magnification and the like of the virtual camera. Then, in S3302, the virtual camera operation UI 330 transmits the derived virtual camera parameters to the back-end server 270.

Upon receiving the virtual camera parameters, the back-end server 270 requests the foreground three-dimensional model group from the database 250 in S3303. In S3304, the database 250 transmits the foreground three-dimensional model group, including the position information of the foreground objects, to the back-end server 270 in response to the request. In S3305, the back-end server 270 geometrically derives (calculates) the foreground object group that falls within the field of view of the virtual camera, based on the virtual camera parameters and the position information of the foreground objects included in the foreground three-dimensional models. Then, in S3306, the back-end server 270 transmits a request for the foreground object group to the database 250.
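The geometric derivation in S3305 is not detailed in the description. As one possible illustration, the following sketch tests whether each foreground object position lies within a circular view cone around the viewing direction of the virtual camera. The cone model, the function names, and the parameter `half_fov_deg` are simplifying assumptions; an actual implementation would more likely test against the full view frustum derived from the intrinsic parameters.

```python
import math
from typing import Dict, List, Sequence


def in_field_of_view(camera_position: Sequence[float],
                     forward: Sequence[float],
                     half_fov_deg: float,
                     point: Sequence[float]) -> bool:
    """Return True if the point lies within the view cone of the virtual camera."""
    direction = [p - c for p, c in zip(point, camera_position)]
    dist = math.sqrt(sum(d * d for d in direction))
    fnorm = math.sqrt(sum(f * f for f in forward))
    if dist == 0 or fnorm == 0:
        return False
    # Cosine of the angle between the viewing direction and the direction to the point.
    cos_angle = sum(d * f for d, f in zip(direction, forward)) / (dist * fnorm)
    return cos_angle >= math.cos(math.radians(half_fov_deg))


def visible_objects(camera_position: Sequence[float],
                    forward: Sequence[float],
                    half_fov_deg: float,
                    object_positions: Dict[str, Sequence[float]]) -> List[str]:
    """Return the IDs of the foreground objects inside the field of view."""
    return [
        object_id
        for object_id, position in object_positions.items()
        if in_field_of_view(camera_position, forward, half_fov_deg, position)
    ]
```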

Upon receiving the request from the back-end server 270, the database 250 reads out the data corresponding to the request and transmits it to the back-end server 270 in S3307. Upon receiving the data from the database 250, the back-end server 270, in S3308, calculates the EV value of the camera exposure information of the foreground image from all the camera exposure information associated with the foreground images of the foreground object group. The back-end server 270 then compares the calculated EV value with the EV value of the camera exposure information of the received background image to check whether they match. Note that the back-end server 270 may use the average EV value of the plurality of foreground images used for generating the virtual viewpoint image as the EV value of the foreground image, and the average EV value of the plurality of background images used for generating the virtual viewpoint image as the EV value of the background image. The back-end server 270 may also determine whether the difference between the EV value of the foreground image and the EV value of the background image is smaller than a predetermined threshold. If the camera exposure information (EV values) of the foreground image and the background image do not match, the back-end server 270, in S3309, requests the database 250 to transmit a background image associated with the same camera exposure information as the foreground image. Note that the back-end server 270 may acquire the plurality of foreground images used for generating the virtual viewpoint image and request, from the database 250, a background image having an EV value determined based on the acquired foreground images.
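The two variations mentioned above, average EV values and a threshold on the difference, can be combined as in the following sketch. Combining them in one function is an assumption made here for brevity; the description presents them as separate options. The function name `exposures_match` and the parameter `threshold` are hypothetical.

```python
from statistics import mean
from typing import Sequence


def exposures_match(foreground_evs: Sequence[float],
                    background_evs: Sequence[float],
                    threshold: float) -> bool:
    """Return True if the average foreground and background EV values are close.

    The averages stand in for the EV values of the foreground and background
    images, and the exposures are treated as matching when the absolute
    difference is smaller than the given threshold.
    """
    foreground_ev = mean(foreground_evs)
    background_ev = mean(background_evs)
    return abs(foreground_ev - background_ev) < threshold
```

With such a check, a background image would be requested again only when the function returns False.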

Upon receiving the request from the back-end server 270, the database 250, in S3310, reads out the data of the background image associated with the same camera exposure information as the foreground image and transmits it to the back-end server 270. In S3311, the back-end server 270 generates a foreground image and a background image for the virtual viewpoint based on the foreground images, the foreground three-dimensional models, and the background image received from the database 250, and combines them to generate an entire-view image for the virtual viewpoint. Also in S3311, the back-end server 270 synthesizes audio data corresponding to the position of the virtual camera based on the audio data group, and integrates it with the entire-view image to generate the image and audio of the virtual viewpoint. Then, in S3312, the back-end server 270 transmits the generated image and audio of the virtual viewpoint to the virtual camera operation UI 330. The virtual camera operation UI 330 displays the received image on, for example, the end user terminal 190, thereby realizing display of the image captured by the virtual camera.

The above has described how, in the image processing system 100 of the first embodiment, which outputs the foreground image and the background image at different frame rates, a virtual viewpoint image is generated when the camera exposure information differs between the foreground image and the background image because of, for example, a change in shooting parameters during shooting.
In the image processing system 100 of the present embodiment, the camera exposure information of the foreground image and the background image used for generating the virtual viewpoint image can be made to match. Thus, when a virtual viewpoint image is generated from image data captured by the plurality of cameras 112, it can be generated using a foreground image and a background image whose camera exposure information matches. That is, according to the present embodiment, a natural virtual viewpoint image free of unnatural luminance discontinuities can be generated. Note that when the camera exposure information differs between the foreground image and the background image, the back-end server 270 may instead give priority to the background image and use, for generating the virtual viewpoint image, a foreground image captured with the exposure at which the background image was captured.

<Second Embodiment>
As a second embodiment, the following describes a method of generating a virtual viewpoint image in which, when the exposure information of the foreground image and the background image used for generating the virtual viewpoint image differs because of, for example, a change in shooting parameters during shooting, the image quality of the background image is adjusted to suit the foreground image.

The schematic configuration of the image processing system 100 in the second embodiment is the same as that of the first embodiment described above. The sensor system 110, the switching hub 180, the end user terminal 190, the front-end server 230, the database 250, the time server 290, the control station 310, and the virtual camera operation UI 330 are the same as described above, and their description is omitted.

FIG. 6 is a diagram illustrating functional blocks of the back-end server 270 in the second embodiment. The back-end server 270 of the second embodiment does not include the camera exposure information comparison unit 3017, the background image acquisition determination unit 3018, or the background image request unit 3019 of the back-end server 270 shown in FIG. 4, and instead includes a background image quality correction unit 3020. The data receiving unit 3001 through the background image camera exposure information acquisition unit 3016 in FIG. 6 have substantially the same configuration as the corresponding units shown in FIG. 4, and their description is omitted.

The background image quality correction unit 3020 corrects the image quality of the background image based on the camera exposure information of the foreground image and the background image. To do so, the background image quality correction unit 3020 first acquires all the camera exposure information acquired by the foreground image camera exposure information acquisition unit 3015 and the camera exposure information acquired by the background image camera exposure information acquisition unit 3016. As in the first embodiment, when the camera exposure information of the foreground image group contains different values, the EV value that occurs most frequently among them is used as the camera exposure information of the foreground image. The background image quality correction unit 3020 then corrects the image quality of the background image supplied from the database 250 via the data receiving unit 3001, based on the camera exposure information of the foreground image and the background image. Specifically, when the camera exposure information of the foreground image is denoted EVf, the camera exposure information of the background image is denoted EVb, and the background image data is denoted BP, the background image quality correction unit 3020 calculates the corrected background image data BPc by Equation (2) below.
BPc = BP * 2^(EVf - EVb)   Equation (2)
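Equation (2) scales every background pixel by the gain 2^(EVf - EVb). A minimal sketch of this correction follows. The function name `correct_background`, the nested-list image representation, and the rounding and clipping to an 8-bit range are assumptions of this sketch; the description specifies only the multiplication itself. The sketch also assumes pixel values proportional to scene luminance, which is not stated in the description.

```python
from typing import List


def correct_background(pixels: List[List[float]],
                       ev_f: float,
                       ev_b: float,
                       max_value: int = 255) -> List[List[int]]:
    """Apply BPc = BP * 2^(EVf - EVb) to every pixel of a background image.

    pixels:    background image data BP as rows of pixel values
    ev_f:      EV value of the foreground image (EVf)
    ev_b:      EV value of the background image (EVb)
    max_value: upper clipping bound (assumption: 8-bit pixel values)
    """
    gain = 2.0 ** (ev_f - ev_b)
    return [
        [min(max_value, max(0, round(p * gain))) for p in row]
        for row in pixels
    ]
```

When EVf equals EVb the gain is 1 and the background image is returned unchanged, so applying the correction on every acquisition is harmless when the exposures already match.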

In the second embodiment, the background image (BPc) whose image quality has been corrected by the background image quality correction unit 3020 is output to the background texture pasting unit 3002. The background texture pasting unit 3002 then pastes the corrected background image onto the three-dimensional shape to generate a textured background mesh model.

When the camera exposure information of the foreground image differs from that of the background image, the back-end server 270 of the second embodiment corrects the image quality of the background image based on the camera exposure information of the foreground image, and can thereby obtain a virtual viewpoint image free of unnaturalness. By processing the background image based on the camera exposure information of the foreground image, the back-end server 270 can make the brightness and color tone of the background image correspond to those of the foreground image. Because the virtual viewpoint image is generated using such a foreground image and background image, a virtual viewpoint image free of unnaturalness can be obtained.

In the second embodiment, the background image quality correction unit 3020 performs the correction every time the back-end server 270 acquires a foreground image and a background image from the database 250, but the embodiment is not limited to this example. For example, the background image quality correction unit 3020 may compare the camera exposure information of the foreground image with that of the background image and correct the background image data only when the two do not match.

In addition, when the camera exposure information of the foreground image group contains differing values, the back-end server 270 of this embodiment may use the EV value held by the largest number of foreground images as the camera exposure information of the foreground image, and correct any foreground image having a different EV value in the same manner as the background image. This makes it easy to reconcile the luminance mismatch between the foreground image and the background image even when the exposure differs between cameras. Alternatively, when the camera exposure information differs between the foreground image and the background image, the back-end server 270 may process the foreground image so that it corresponds to the background image, correcting the color tone and brightness of the foreground image.
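As a non-limiting illustration, selecting the EV value held by the largest number of foreground images, and identifying the foreground images that would then be corrected, may be sketched as follows. The function names and the list-of-EV-values input are hypothetical. This specification does not state how ties are resolved; in this sketch the value encountered first wins, which is an assumption.

```python
from collections import Counter


def representative_ev(foreground_evs):
    """Return the EV value shared by the largest number of foreground images."""
    # most_common(1) yields the (value, count) pair with the highest count;
    # equal counts keep first-encountered order (tie rule is an assumption).
    ev, _count = Counter(foreground_evs).most_common(1)[0]
    return ev


def foregrounds_to_correct(foreground_evs):
    """Return indices of foreground images whose EV differs from the representative EV."""
    target = representative_ev(foreground_evs)
    return [i for i, ev in enumerate(foreground_evs) if ev != target]


if __name__ == "__main__":
    evs = [10, 10, 11, 10, 12]
    print(representative_ev(evs), foregrounds_to_correct(evs))
```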

<Third Embodiment>
In the embodiments described above, once the camera exposure information of the foreground image and that of the background image no longer match, the luminance mismatch between the two is eliminated either by reacquiring an appropriate background image (first embodiment) or by correcting the image quality of the background image (second embodiment). However, if, for example, the camera exposure information of the foreground image changes abruptly around a certain time, the generated virtual viewpoint image may become an unnatural video whose luminance is discontinuous in the time direction.

The third embodiment therefore describes processing that makes it possible to generate a natural virtual viewpoint image whose luminance does not become discontinuous in the time direction, even when the camera exposure information of the foreground image changes abruptly around a certain time.
The schematic configuration of the image processing system 100 in the third embodiment is the same as in the first embodiment described above. The sensor system 110, the switching hub 180, the end user terminal 190, the front-end server 230, the database 250, the time server 290, the control station 310, and the virtual camera operation UI 330 are the same as described above, and their description is omitted.

FIG. 7 is a diagram showing the functional blocks of the back-end server 270 in the third embodiment. The back-end server 270 of the third embodiment does not include the camera exposure information comparison unit 3017, the background image acquisition determination unit 3018, or the background image request unit 3019 of the back-end server 270 of FIG. 4; instead, it includes a background image quality correction unit 3020 and a time information request unit 3021. The data reception unit 3001 through the background image camera exposure information acquisition unit 3016 in FIG. 7 have substantially the same configuration as the corresponding units shown in FIG. 4, and their description is omitted.

The time information request unit 3021 requests from the database 250 the time information of the foreground image and the background image acquired by the back-end server 270, as well as the time information of a background image associated with the same value as the camera exposure information of the foreground image.

The background image quality correction unit 3020 uses the camera exposure information of the foreground image and the background image, together with the time information acquired from the database 250, to calculate camera exposure information for the background image suited to the time of the frame for which the virtual viewpoint image is to be generated, and corrects the image quality of the background image accordingly.

The image quality correction processing of the background image in the third embodiment is described in detail below with reference to FIG. 8.
In FIG. 8, tx denotes the time of the frame for which the virtual viewpoint image is to be generated, and t1 and t2 (t1 < t2) denote the times at which background images exist in the database 250. The exposure is changed immediately before time tx; the camera exposure information before the change is EVα, and the camera exposure information after the change is EVβ.

First, the foreground image camera exposure information acquisition unit 3015 acquires the camera exposure information of the foreground image, and the background image camera exposure information acquisition unit 3016 acquires the camera exposure information of the background image. Because the exposure was changed immediately before time tx, the camera exposure information of the foreground image acquired by the foreground image camera exposure information acquisition unit 3015 is EVβ. On the other hand, no background image data exists for time tx, so the camera exposure information of the background image acquired by the background image camera exposure information acquisition unit 3016 is EVα, which is the camera exposure information of the background image most recently acquired by the back-end server 270.

Next, the time information request unit 3021 requests from the database 250 the time information of the background image most recently acquired by the back-end server 270, and the time information at which a background image having the same value as the camera exposure information EVβ of the foreground image exists.

The database 250 outputs the requested time information to the back-end server 270 in response to the request from the time information request unit 3021. Here, the time information of the background image most recently acquired by the back-end server 270 is t1, and the time information at which a background image having the same value as the camera exposure information EVβ of the foreground image exists is t2.
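As a non-limiting illustration, the lookup of t1 and t2 may be sketched as follows. The function name `lookup_times` and the representation of the stored background images as (time, EV) pairs are hypothetical. The sketch also assumes that t1 is the latest background time at or before tx and that t2 is the earliest background time after tx whose EV value equals EVβ; this specification states only that t1 < t2.

```python
def lookup_times(background_records, tx, ev_beta):
    """Return (t1, t2) for frame time tx.

    background_records : list of (time, ev) pairs for stored background images
                         (hypothetical stand-in for the database 250)
    t1 : latest background time at or before tx (assumed search rule)
    t2 : earliest background time after tx whose EV equals ev_beta (assumed search rule)
    Either value is None when no matching record exists.
    """
    earlier = [t for t, _ev in background_records if t <= tx]
    later = [t for t, ev in background_records if t > tx and ev == ev_beta]
    t1 = max(earlier) if earlier else None
    t2 = min(later) if later else None
    return t1, t2


if __name__ == "__main__":
    records = [(0, 10), (10, 11), (20, 11)]
    print(lookup_times(records, tx=3, ev_beta=11))
```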

When the back-end server 270 has acquired the time information from the database 250, the background image quality correction unit 3020 acquires the camera exposure information from the foreground image camera exposure information acquisition unit 3015 and from the background image camera exposure information acquisition unit 3016. In the third embodiment, as in the first and second embodiments, when the camera exposure information of the foreground image group contains differing values, the EV value held by the largest number of foreground images is used as the camera exposure information of the foreground image.

In the third embodiment, the background image quality correction unit 3020 calculates the camera exposure information EVx at the frame time tx for which the virtual viewpoint image is generated, using equation (3) below.
EVx = EVβ * (tx - t1) / (t2 - t1) + EVα * (t2 - tx) / (t2 - t1)   Equation (3)
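As a non-limiting illustration, equation (3) is a linear interpolation between EVα at time t1 and EVβ at time t2, and may be sketched as follows. The function name `interpolate_ev` and the check for t1 equal to t2 are hypothetical additions.

```python
def interpolate_ev(tx, t1, t2, ev_alpha, ev_beta):
    """Compute EVx = EVβ*(tx - t1)/(t2 - t1) + EVα*(t2 - tx)/(t2 - t1)."""
    if t2 == t1:
        # Guard added for this sketch; equation (3) presumes t1 < t2.
        raise ValueError("t1 and t2 must differ")
    span = t2 - t1
    return ev_beta * (tx - t1) / span + ev_alpha * (t2 - tx) / span


if __name__ == "__main__":
    # Halfway between t1 and t2, EVx lies halfway between EVα and EVβ.
    print(interpolate_ev(tx=5, t1=0, t2=10, ev_alpha=10.0, ev_beta=12.0))
```

At tx = t1 the result equals EVα, and at tx = t2 it equals EVβ, so the exposure information assigned to the background image changes gradually rather than in a single step.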

The background image quality correction unit 3020 then corrects the background image data by equation (4) below, where BP denotes the background image data before correction and BPc denotes the background image data after correction.
BPc = BP * 2^(EVx - EVα)   Equation (4)
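As a non-limiting illustration, the multiplier applied to the background image data by equations (3) and (4) over successive frame times may be sketched as follows. The function name `background_gain` and the sample values (t1 = 0, t2 = 4, EVα = 10, EVβ = 11) are hypothetical.

```python
def background_gain(tx, t1, t2, ev_alpha, ev_beta):
    """Return the factor 2^(EVx - EVα) of equation (4), with EVx from equation (3)."""
    ev_x = ev_beta * (tx - t1) / (t2 - t1) + ev_alpha * (t2 - tx) / (t2 - t1)
    return 2.0 ** (ev_x - ev_alpha)


if __name__ == "__main__":
    # The factor starts at 1 at t1 and reaches 2^(EVβ - EVα) at t2.
    for t in range(5):
        print(t, background_gain(t, t1=0, t2=4, ev_alpha=10.0, ev_beta=11.0))
```

With these sample values the factor rises monotonically from 1 at t1 to 2 at t2, which is the gradual luminance transition that the third embodiment aims for.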

The background image quality correction unit 3020 then outputs the corrected background image to the background texture pasting unit 3002. The background texture pasting unit 3002 pastes the corrected background image onto the three-dimensional space shape to generate a textured background mesh model.

As described above, when the camera exposure information of the foreground image differs from that of the background image, the back-end server 270 of the third embodiment corrects the image quality of the background image based on the camera exposure information and the time information of the foreground image and the background image. According to the third embodiment, a natural virtual viewpoint image whose luminance does not become discontinuous in the time direction can thus be generated.

<Fourth Embodiment>
In the embodiments described above, when the camera exposure information of the foreground image and the background image does not match at the time the virtual viewpoint image is generated, the luminance mismatch between the two is eliminated either by reacquiring an appropriate background image (first embodiment) or by correcting the image quality of the background image (second and third embodiments).
The fourth embodiment describes processing that detects a mismatch between the camera exposure information of the foreground image and the background image before the shooting data is stored in the database 250, and generates a background image in which the luminance mismatch has been eliminated.
The schematic configuration of the image processing system 100 in the fourth embodiment is the same as in the first embodiment described above. The sensor system 110, the switching hub 180, the end user terminal 190, the database 250, the time server 290, the control station 310, and the virtual camera operation UI 330 are the same as described above, and their description is omitted.

FIG. 9 is a diagram showing the functional blocks of the front-end server 230 in the fourth embodiment. The front-end server 230 of the fourth embodiment has a configuration in which a camera exposure information comparison unit 2200 is added to the front-end server 230 shown in FIG. 3. The data input control unit 2120 through the DB access control unit 2190 in FIG. 9 have substantially the same configuration as the corresponding units shown in FIG. 3, and their description is omitted.

As in the first embodiment, the data synchronization unit 2130 of the front-end server 230 buffers data until the shooting data for the same time is complete. Once the shooting data is complete, the data synchronization unit 2130 transmits the foreground image and the background image to the image processing unit 2150, the three-dimensional model data to the three-dimensional model combining unit 2160, and the audio data to the shooting data file generation unit 2180. The data synchronization unit 2130 also outputs the camera exposure information to the camera exposure information comparison unit 2200.

The camera exposure information comparison unit 2200 acquires, from the data synchronization unit 2130, the camera exposure information of the foreground images and background images of all the sensor systems 110. First, when the camera exposure information of the plurality of foreground images contains differing values, the camera exposure information comparison unit 2200 uses the EV value held by the largest number of foreground images as the camera exposure information of the foreground image. Next, the camera exposure information comparison unit 2200 compares the camera exposure information of every background image with the camera exposure information of the foreground image. In the fourth embodiment, the camera exposure information comparison unit 2200 calculates the difference between the EV value of the camera exposure information of the foreground image and the EV value of the background image of each sensor system 110, and outputs each difference value together with the camera identifier to the image processing unit 2150.
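As a non-limiting illustration, the comparison performed by the camera exposure information comparison unit 2200 may be sketched as follows. The function name `ev_differences` and the dictionaries keyed by camera identifier are hypothetical. The sign of the difference (foreground EV minus background EV) is an assumption chosen to match the exponent of equation (2).

```python
from collections import Counter


def ev_differences(foreground_evs, background_evs):
    """Return {camera_id: EVf - EVb} for every background image.

    foreground_evs : dict mapping camera identifier -> foreground EV value
    background_evs : dict mapping camera identifier -> background EV value
    EVf is the EV value held by the largest number of foreground images.
    """
    ev_f = Counter(foreground_evs.values()).most_common(1)[0][0]
    return {cam: ev_f - ev_b for cam, ev_b in background_evs.items()}


if __name__ == "__main__":
    fg = {"cam1": 10, "cam2": 10, "cam3": 11}
    bg = {"cam1": 10, "cam2": 9, "cam3": 11}
    print(ev_differences(fg, bg))
```

A difference of zero indicates a background image that needs no correction; a non-zero difference can be used directly as the exponent of the correction factor.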

As in the first embodiment, when RAW image data of the foreground image and the background image is input, the image processing unit 2150 of the fourth embodiment performs development processing and processing such as correction of camera lens distortion. The image processing unit 2150 then corrects the image quality of any background image whose camera exposure information differs from that of the foreground image, based on the EV difference values and camera identifiers input from the camera exposure information comparison unit 2200. The image quality of the background image is corrected using, for example, equation (2) of the second embodiment, or equations (3) and (4) of the third embodiment. The image processing unit 2150 then outputs the processed foreground image to the shooting data file generation unit 2180 and the processed background image to the image combining unit 2170.

The image combining unit 2170 acquires the background images from the image processing unit 2150, acquires the three-dimensional shape data of the stadium (stadium shape data) from the CAD data storage unit 2135, and identifies the position of each background image with respect to the coordinates of the acquired stadium shape data. Once the position with respect to the coordinates of the stadium shape data has been identified for each background image, the image combining unit 2170 combines the background images into a single background image.

FIG. 10 is a diagram showing the functional blocks of the back-end server 270 in the fourth embodiment. The back-end server 270 of the fourth embodiment has the configuration of FIG. 4 with the foreground image camera exposure information acquisition unit 3015, the background image camera exposure information acquisition unit 3016, the camera exposure information comparison unit 3017, the background image acquisition determination unit 3018, and the background image request unit 3019 removed. The data reception unit 3001 through the rendering mode management unit 3014 in FIG. 10 have substantially the same configuration as the corresponding units shown in FIG. 4, and their description is omitted.

The back-end server 270 of the fourth embodiment does not need to acquire the camera exposure information of the foreground image and the background image or to compare them. The back-end server 270 therefore reads the corresponding foreground image, background image, and audio data from the database 250 based on the viewpoint designated via the virtual camera operation UI 330, and performs rendering processing to generate a virtual viewpoint image.

That is, when the camera exposure information of the foreground image differs from that of the background image, the image processing system 100 of the fourth embodiment corrects the image quality of the background image in the front-end server 230, and can thereby generate a natural virtual viewpoint image whose luminance does not become discontinuous.

<Fifth Embodiment>
The fifth embodiment describes processing that eliminates the mismatch between the camera exposure information of the foreground image and the background image by changing the transmission timing of the background image in the sensor system 110.
The schematic configuration of the image processing system 100 in the fifth embodiment is the same as in the first embodiment described above. The sensor system 110, the switching hub 180, the end user terminal 190, the front-end server 230, the database 250, the time server 290, the control station 310, and the virtual camera operation UI 330 are the same as described above, and their description is omitted. The back-end server 270 of the fifth embodiment has the same functions as in the fourth embodiment described above, and its description is also omitted.

The processing for changing the transmission timing of the background image, performed by the camera adapter 120 in the sensor system 110 in the fifth embodiment, is described below with reference to FIG. 2.
The camera control unit 6141 of the camera adapter 120 of the fifth embodiment is connected to the camera 112 and controls the camera in response to an instruction to change the shooting parameters (number of pixels, color depth, frame rate, white balance settings, and the like). At the same time, the camera control unit 6141 notifies the data transmission/reception unit 6111 that the shooting parameters have been changed.

As in the first embodiment described above, the data transmission/reception unit 6111 outputs to another camera adapter 120 the foreground image and the background image that the foreground/background separation unit 6131 has separated from the image captured by the camera 112. In doing so, the data transmission/reception unit 6111 thins out some frames of the background image separated from the captured image, and outputs the background image at a lower frame rate than the foreground image. However, when the data transmission/reception unit 6111 receives a notification of a shooting parameter change from the camera control unit 6141, it cancels the thinning of the background image immediately after the shooting parameter change and outputs that background image to another camera adapter 120.
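As a non-limiting illustration, the thinning of background frames and its cancellation immediately after a shooting parameter change may be sketched as follows. The function name `frames_to_send`, the use of integer frame indices, and the rule that a background frame is normally sent once every `background_interval` frames are hypothetical; this specification states only that the background image is output at a lower frame rate than the foreground image.

```python
def frames_to_send(num_frames, background_interval, param_change_frames):
    """Return a list of (frame_index, send_foreground, send_background).

    background_interval : a background frame is normally sent when
                          frame_index % background_interval == 0 (assumed thinning rule)
    param_change_frames : indices of the first frame captured after each
                          shooting parameter change; thinning is cancelled there
    """
    changed = set(param_change_frames)
    schedule = []
    for i in range(num_frames):
        regular = (i % background_interval == 0)
        send_background = regular or (i in changed)
        schedule.append((i, True, send_background))
    return schedule


if __name__ == "__main__":
    for row in frames_to_send(6, background_interval=3, param_change_frames=[4]):
        print(row)
```

In this example frame 4 would normally carry only a foreground image, but because it is the first frame after a parameter change its background image is sent as well.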

As described above, in the image processing system 100 of the fifth embodiment, both the foreground image and the background image captured immediately after a shooting parameter change are stored in the database 250, so the back-end server 270 can generate a virtual viewpoint image whose luminance does not become discontinuous.

<Sixth Embodiment>
The fifth embodiment described, as an example, processing that eliminates the mismatch between the camera exposure information of the foreground image and the background image by changing the transmission timing of the background image in the sensor system 110. The sixth embodiment describes processing that eliminates the mismatch by controlling the timing at which the shooting parameters are set in the camera 112 when a shooting parameter change instruction is input from outside the camera adapter 120. The configuration of the image processing system 100 of the sixth embodiment is the same as in the fifth embodiment.

In the sixth embodiment, the camera control unit 6141 first acquires the timing at which the data transmission/reception unit 6111 thins out the background image. Then, even when a shooting parameter change instruction is input, the camera control unit 6141 controls the camera 112 so that the changed shooting parameters are applied starting from a captured image for which it is certain that both the foreground image and the background image will be output to another camera adapter 120.
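As a non-limiting illustration, deferring a shooting parameter change until the next frame whose background image will be transmitted may be sketched as follows. The function name `frame_to_apply_change` and the rule that a background image is transmitted when the frame index is a multiple of `background_interval` are hypothetical stand-ins for the thinning timing that the camera control unit 6141 acquires.

```python
def frame_to_apply_change(request_frame, background_interval):
    """Return the first frame index >= request_frame at which a background
    image is transmitted, under the assumed rule that backgrounds are sent
    when frame_index % background_interval == 0."""
    remainder = request_frame % background_interval
    if remainder == 0:
        return request_frame
    return request_frame + (background_interval - remainder)


if __name__ == "__main__":
    # A change requested at frame 4 is held until frame 6 when backgrounds
    # are sent every third frame.
    print(frame_to_apply_change(4, 3))
```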

According to the sixth embodiment, both the foreground image and the background image captured immediately after a shooting parameter change are stored in the database 250, so the back-end server 270 can generate a virtual viewpoint image whose luminance does not become discontinuous.

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The above-described embodiments are merely specific examples of carrying out the present invention, and the technical scope of the present invention should not be construed as limited by them. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

100: image processing system, 110: sensor system, 112: camera, 120: camera adapter, 180: switching hub, 200: image computing server, 230: front-end server, 250: database, 270: back-end server, 290: time server, 300: controller, 310: control station, 330: virtual camera operation UI, 190: end user terminal

Claims

An image processing apparatus that generates a virtual viewpoint image using a plurality of images photographed by a plurality of imaging devices, the image processing apparatus comprising:
first acquisition means for acquiring a foreground image that is based on any of the plurality of images and corresponds to a predetermined object;
second acquisition means for acquiring exposure information indicating the exposure at which the foreground image acquired by the first acquisition means was captured;
third acquisition means for acquiring, based on the exposure information acquired by the second acquisition means, a background image that is based on any of the plurality of images, is photographed with an exposure corresponding to that exposure information, and does not include the predetermined object; and
generating means for generating a virtual viewpoint image using the foreground image acquired by the first acquisition means and the background image acquired by the third acquisition means.

The image processing apparatus according to claim 1, wherein the third acquisition unit acquires the background image captured with the same exposure as the exposure indicated by the exposure information acquired by the second acquisition unit.

The image processing apparatus according to claim 1, wherein the first acquisition means acquires a plurality of foreground images corresponding to the predetermined object, and
the third acquisition means acquires the background image photographed with an exposure determined based on the exposure information of the plurality of foreground images acquired by the second acquisition means.

The image processing apparatus according to any one of claims 1 to 3, wherein the third acquisition means requests from a database the background image captured at an exposure determined based on the exposure information acquired by the second acquisition means, and acquires the background image from the database.

An image processing apparatus comprising:
determining means for determining a foreground image and a background image to be used for generating a virtual viewpoint image from among a plurality of foreground images and a plurality of background images obtained by separating each of a plurality of images captured by a plurality of imaging devices;
acquisition means for acquiring exposure information at the time of shooting of the determined foreground image and exposure information at the time of shooting of the determined background image; and
processing means for generating the virtual viewpoint image based on the foreground image and the background image,
wherein, when the exposure information of the determined foreground image and that of the determined background image differ, the processing means performs processing for matching the exposure of the background image to the exposure of the determined foreground image, and generates the virtual viewpoint image using the background image after that processing and the determined foreground image.

The image processing apparatus according to claim 5, wherein, as the processing for adjusting the exposure, the processing means performs on the determined background image, based on the exposure information of the determined foreground image and background image, predetermined image quality correction processing for matching its image quality to the foreground image.

The image processing apparatus according to claim 6, wherein, when there are a plurality of the determined foreground images and the exposure information of the plurality of foreground images differs, the processing means
sets, as the exposure information of the determined foreground image, the exposure information that is shared by the largest number of foreground images among the exposure information of the plurality of foreground images, and
performs the image quality correction processing on any foreground image having exposure information different from the set exposure information so as to match its image quality to the foreground image having the set exposure information.

The image processing apparatus according to claim 5, wherein, as the processing for adjusting the exposure, the processing means performs processing of acquiring a background image whose exposure information matches the exposure information of the foreground image from among a plurality of background images usable for generating the virtual viewpoint image.

The image processing apparatus according to claim 8, wherein, when there are a plurality of background images whose exposure information matches the exposure information of the foreground image, the processing means acquires, based on the shooting time of the determined background image, the background image whose shooting time is later than and closest to that shooting time from among the plurality of background images whose exposure information matches.

Holding means for holding a plurality of foreground images and a plurality of background images obtained by respectively separating a plurality of images taken by a plurality of imaging devices;
Processing means for, when a foreground image and a background image separated from a captured image have different exposure information at the time of shooting, performing on the background image a predetermined image quality correction process that matches its image quality to that of the foreground image by using the exposure information of the foreground image and the background image, and for causing the holding means to hold the background image after the image quality correction process;
Determining means for determining a foreground image and a background image used for generating a virtual viewpoint image from among a plurality of foreground images and a plurality of background images held in the holding means;
Generating means for generating a virtual viewpoint image using the determined foreground image and the background image;
An image processing apparatus comprising:

The image processing apparatus according to claim 5, wherein, when the exposure at the time of shooting is changed in the imaging device, the processing means performs on the background image, as the process of adjusting the exposure, the image quality correction process that matches its image quality to that of the foreground image, based on the exposure information at shooting times before and after the exposure was changed.

The image processing apparatus according to claim 11, wherein the processing means performs the image quality correction process on the background image based on exposure information calculated by a predetermined calculation from the exposure information at the shooting times before and after the exposure at the time of shooting was changed.
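
Claim 12 leaves the "predetermined calculation" open. One possible choice, shown purely as an assumption, is linear interpolation over time between the exposure value before the change and the value after it. The function name `interpolate_exposure` and the scalar exposure representation are hypothetical.

```python
def interpolate_exposure(t, t_before, ev_before, t_after, ev_after):
    """Linearly interpolate an exposure value for shooting time t.

    Assumption: linear interpolation stands in for the unspecified
    'predetermined calculation'. Times outside [t_before, t_after]
    are clamped to the nearest endpoint.
    """
    if t_after <= t_before:
        raise ValueError("t_after must be later than t_before")
    weight = (t - t_before) / (t_after - t_before)
    weight = min(1.0, max(0.0, weight))
    return ev_before + weight * (ev_after - ev_before)
```

The interpolated value could then be used as the target when correcting a background image shot between the two times.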

The image processing apparatus according to claim 8, wherein, when there are a plurality of foreground images used for generating the virtual viewpoint image and their exposure information differs, the processing means sets, as the exposure information of the determined foreground image, the exposure information shared by the largest number of foreground images among the plurality of pieces of exposure information of the plurality of foreground images.

The image processing apparatus according to claim 5, wherein a frame rate of the foreground image is higher than a frame rate of the background image.

Control means for separating a captured image into a foreground image and a background image, performing a thinning process on the separated background image, and outputting the background image at a frame rate lower than the frame rate of the foreground image;
Processing means for generating a virtual viewpoint image based on the foreground image and the background image output from the control means,
The image processing apparatus according to claim 1, wherein the control means cancels the thinning process for the background image when the exposure at the time of capturing the image is changed.
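
The thinning behaviour of claim 15 can be sketched as follows: the background is normally output only every N-th frame, but it is also output for a frame at which the exposure changed. The claim does not state how long the cancellation lasts; this sketch cancels thinning for the single frame where the change is observed. The function name `background_frames` and the parameter `keep_every` are hypothetical.

```python
def background_frames(exposures, keep_every):
    """Return the indices of frames whose background image is output.

    `exposures` holds the exposure value used for each captured frame.
    A background is output every `keep_every` frames (thinning), and
    additionally whenever the exposure differs from the previous frame.
    """
    output = []
    previous = None
    for index, exposure in enumerate(exposures):
        exposure_changed = previous is not None and exposure != previous
        if index % keep_every == 0 or exposure_changed:
            output.append(index)
        previous = exposure
    return output
```

Foreground images are output for every frame in this model, so a background image with the new exposure is available as soon as the exposure changes.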

Control means for separating a captured image into a foreground image and a background image, and outputting the foreground image and the background image at different frame rates;
Processing means for generating a virtual viewpoint image based on the foreground image and the background image output from the control means,
An image processing apparatus, wherein, when an instruction to change the exposure at the time of capturing the image is input, the control means applies the exposure change in accordance with the timing at which both the foreground image and the background image separated from the captured image are output.
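
The timing rule of claim 16 can be illustrated by deferring a requested exposure change to the next frame at which a background image is also output. The sketch assumes that foreground images are output every frame and background images every `background_interval` frames, starting at frame 0; these details and the name `next_apply_frame` are assumptions for illustration.

```python
def next_apply_frame(request_frame, background_interval):
    """Return the first frame index at or after request_frame at which
    both the foreground and the background image are output.

    Assumption: the background is output on frames that are multiples
    of background_interval, and the foreground on every frame.
    """
    remainder = request_frame % background_interval
    if remainder == 0:
        return request_frame
    return request_frame + (background_interval - remainder)
```

Applying the change at such a frame means the first foreground image with the new exposure is accompanied by a background image shot with the same exposure.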

A determination step of determining a foreground image and a background image used for generating a virtual viewpoint image from a plurality of foreground images and a plurality of background images obtained by separating a plurality of images captured by a plurality of imaging devices, respectively;
An acquisition step of acquiring exposure information at the time of shooting of the determined foreground image and exposure information at the time of shooting of the determined background image;
A processing step of generating the virtual viewpoint image based on the foreground image and the background image;
An image processing method for an image processing apparatus, wherein, in the processing step, when the exposure information of the determined foreground image differs from that of the determined background image, a process of adjusting the exposure of the background image to the exposure of the determined foreground image is performed, and the virtual viewpoint image is generated using the background image after the exposure adjustment and the determined foreground image.

A holding step of holding a plurality of foreground images and a plurality of background images obtained by separating a plurality of images taken by a plurality of imaging devices, respectively;
A processing step of, when a foreground image and a background image separated from a captured image have different exposure information at the time of shooting, performing on the background image a predetermined image quality correction process that matches its image quality to that of the foreground image by using the exposure information of the foreground image and the background image, and causing the background image after the image quality correction process to be held in the holding step;
A determining step for determining a foreground image and a background image used for generating a virtual viewpoint image from a plurality of foreground images and a plurality of background images held in the holding step;
A generating step of generating a virtual viewpoint image using the determined foreground image and the determined background image;
An image processing method for an image processing apparatus, comprising:

A control step of separating a captured image into a foreground image and a background image, performing a thinning process on the separated background image, and outputting the background image at a frame rate lower than the frame rate of the foreground image;
A processing step of generating a virtual viewpoint image based on the foreground image and the background image output from the control step,
An image processing method for an image processing apparatus, wherein, in the control step, the thinning process for the background image is canceled when the exposure at the time of capturing the image is changed.

A control step of separating the captured image into a foreground image and a background image and outputting the foreground image and the background image at different frame rates;
A processing step of generating a virtual viewpoint image based on the foreground image and the background image output from the control step,
An image processing method for an image processing apparatus, wherein, in the control step, when an instruction to change the exposure at the time of capturing the image is input, the exposure change is applied in accordance with the timing at which both the foreground image and the background image separated from the captured image are output.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 16.

An imaging device that separates a captured image into a foreground image and a background image and outputs them at different frame rates, and outputs exposure information at the time of capturing the image together with the foreground image and the background image;
The image processing apparatus according to any one of claims 5 to 14, which acquires the foreground image, the background image, and the exposure information output from the imaging device;
An image processing system comprising: