JP2019047431A

JP2019047431A - Image processing device, control method thereof, and image processing system

Info

Publication number: JP2019047431A
Application number: JP2017171373A
Authority: JP
Inventors: Yuichiro Yoshimura (吉村 雄一郎)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-09-06
Filing date: 2017-09-06
Publication date: 2019-03-22

Abstract

To make it possible to precisely detect a change in the installation state of an imaging device. SOLUTION: An image processing device includes: reception means for receiving a photographic image taken with an imaging device; separation means for separating a background image from the photographic image; first storage means for storing a reference image based on a first photographic image taken at a first time with the imaging device; second storage means for storing a current image based on a second photographic image taken at a second time; and detection means for detecting a state change of the imaging device by comparing the reference image and the current image. The current image is the background image separated from the second photographic image by the separation means. SELECTED DRAWING: Figure 24

Description

The present invention relates to a technique for detecting a change in the installation state of an imaging device.

In recent years, attention has been drawn to a technique in which a plurality of cameras are installed at different positions in a stadium or the like, synchronized shooting is performed from multiple viewpoints, and virtual viewpoint content is generated using the multi-viewpoint images obtained by that shooting. Virtual viewpoint content allows, for example, a highlight scene of a soccer or basketball game to be viewed from various angles, so the user can experience a greater sense of presence than with ordinary images.

The workflow at installation and before shooting for the plurality of cameras that capture the multi-viewpoint images described above includes installation-time calibration. By detecting a change in the camera installation state after the installation-time calibration and performing calibration again, degradation of the image quality of the virtual viewpoint content generated from the multi-viewpoint images can be prevented. Specifically, an image obtained after the installation-time calibration is used as a reference image, and a change in the camera installation state is detected by comparing the reference image with an image obtained thereafter (current image) and detecting a change between the two. Patent Document 1 discloses a technique in which a background image containing no subject is prepared in advance, and the presence or absence of a subject is determined from the difference between that background image and the current captured image.

Japanese Unexamined Patent Application Publication No. 2004-172946

However, in the conventional technique described above, when the captured image used for comparison (comparison image) contains a foreground that changes over time, such as a person, a change in the camera installation state may not be detected accurately. That is, when players or venue staff are on the stadium field, the comparison image needed for accurate detection cannot be captured. Consequently, it has been necessary, for example, to wait until no one remains on the field before capturing the comparison image.

The present invention has been made in view of this problem, and an object of the present invention is to provide a technique that enables a change in the installation state of an imaging device to be detected accurately.

To solve the problem described above, an image processing apparatus according to the present invention has the following configuration. That is, the image processing apparatus comprises: receiving means for receiving a captured image captured by an imaging device; separating means for separating a background image from the captured image received by the receiving means; first storage means for storing a reference image based on a first captured image captured by the imaging device at a first time; second storage means for storing a current image based on a second captured image captured by the imaging device at a second time different from the first time; and detecting means for detecting a state change of the imaging device by comparing the reference image with the current image, wherein the current image is a background image separated from the second captured image by the separating means.
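The configuration above can be illustrated with a minimal Python sketch. It is not the patented implementation: the grayscale list-of-lists image type, the mask-based fill in `separate_background`, the mean-absolute-difference metric, and the `threshold` value are all assumptions introduced only for illustration, since this passage does not specify how separation or comparison is performed.

```python
from dataclasses import dataclass
from typing import List, Optional

Image = List[List[int]]   # grayscale image: rows of pixel intensities (assumed format)
Mask = List[List[bool]]   # True where a pixel belongs to the foreground (assumed input)


def separate_background(frame: Image, foreground: Mask, last_background: Image) -> Image:
    """Keep background pixels of `frame`; fill foreground pixels from `last_background`.

    Assumption: this fill strategy is illustrative only.
    """
    return [
        [old if is_fg else new for new, is_fg, old in zip(f_row, m_row, b_row)]
        for f_row, m_row, b_row in zip(frame, foreground, last_background)
    ]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Average absolute per-pixel difference between two equally sized images."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / sum(len(row) for row in a)


@dataclass
class ChangeDetector:
    threshold: float                   # hypothetical tuning value
    reference: Optional[Image] = None  # first storage: reference image
    current: Optional[Image] = None    # second storage: current (background) image

    def state_changed(self) -> bool:
        """Report a state change when the two stored images differ by more than the threshold."""
        if self.reference is None or self.current is None:
            raise ValueError("both reference and current images must be stored")
        return mean_abs_diff(self.reference, self.current) > self.threshold


reference = [[10, 10, 200], [10, 10, 200]]
frame_with_player = [[10, 255, 200], [10, 255, 200]]   # a person covers column 1
player_mask = [[False, True, False], [False, True, False]]

detector = ChangeDetector(threshold=5.0, reference=reference)
detector.current = separate_background(frame_with_player, player_mask, reference)
```

In this toy setup, comparing the raw frame with the reference would exceed the threshold merely because a person is in view, whereas comparing the separated background would not; that is the motivation the text gives for using the background image as the current image.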

According to the present invention, a technique that enables a change in the installation state of an imaging device to be detected accurately can be provided.

FIG. 1 is a diagram showing the overall configuration of an image processing system according to a first embodiment.
FIG. 2 is a block diagram showing the functional configuration of a camera adapter.
FIG. 3 is a block diagram showing the functional configuration of an image processing unit.
FIG. 4 is a block diagram showing the functional configuration of a front-end server.
FIG. 5 is a block diagram showing the functional configuration of a database.
FIG. 6 is a block diagram showing the functional configuration of a back-end server.
FIG. 7 is a block diagram showing the functional configuration of a virtual camera operation UI.
FIG. 8 is a diagram explaining a virtual camera.
FIG. 9 is a block diagram showing the functional configuration of an end user terminal.
FIG. 10 is a flowchart showing the overall workflow in the first embodiment.
FIG. 11 is a detailed flowchart of installation-time processing.
FIG. 12 is a sequence diagram of installation-time calibration processing.
FIG. 13 is a detailed flowchart of camera parameter estimation processing.
FIG. 14 is a sequence diagram of processing for generating three-dimensional model information.
FIG. 15 is a flowchart showing the operation of the camera adapter when a captured image is received from a camera.
FIG. 16 is a flowchart showing the operation of the camera adapter when data is received from an adjacent camera adapter.
FIG. 17 is a diagram explaining gaze point groups.
FIG. 18 is a diagram explaining bypass transmission control.
FIG. 19 is a diagram explaining bypass control in the camera adapter.
FIG. 20 is a diagram showing an example of the flow of data between a plurality of camera adapters.
FIG. 21 is a flowchart of output processing in a transmission unit.
FIG. 22 is a diagram explaining a background image.
FIG. 23 is a flowchart of various processes in the image processing unit.
FIG. 24 is a flowchart of processing in a shift detection notification unit.
FIG. 25 is a diagram showing an example of foreground/background separation processing.
FIG. 26 is a flowchart of processing in the shift detection notification unit in a second embodiment.
FIG. 27 is a flowchart of processing in the shift detection notification unit in a third embodiment.
FIG. 28 is a flowchart of processing in the shift detection notification unit in a fourth embodiment.
FIG. 29 is a diagram showing the hardware configuration of the camera adapter.

Hereinafter, an example of an embodiment of the present invention will be described in detail with reference to the drawings. The following embodiments are merely examples and are not intended to limit the scope of the present invention.

(First Embodiment)
As a first embodiment of the image processing apparatus according to the present invention, a system in which a plurality of cameras and microphones are installed in a facility such as a stadium or a concert hall to capture images and collect sound will be described below as an example.

<Overall System Configuration>
FIG. 1 is a diagram showing the overall configuration of the image processing system according to the first embodiment. The image processing system 100 includes sensor systems 110a to 110z, an image computing server 200, a controller 300, a switching hub 180, and an end user terminal 190.

The controller 300 has a control station 310 and a virtual camera operation UI 330. The control station 310 manages operating states and controls parameter settings for each block constituting the image processing system 100 through the networks 310a to 310c, 180a, 180b, and 170a to 170y. Here, the networks may be GbE (Gigabit Ethernet) or 10GbE conforming to the IEEE Ethernet (registered trademark) standards, or may be configured by combining InfiniBand interconnects, industrial Ethernet, and the like. The networks are not limited to these and may be of other types.

First, the operation of transmitting the 26 sets of images and audio of the sensor systems 110a to 110z from the sensor system 110z to the image computing server 200 will be described. In the image processing system 100 of the present embodiment, the sensor systems 110a to 110z are connected in a daisy chain.

In the present embodiment, unless otherwise noted, the 26 sensor systems from the sensor system 110a to the sensor system 110z are not distinguished from one another and are simply referred to as the sensor system 110. Likewise, the devices within each sensor system 110 are not distinguished unless otherwise noted and are referred to as the microphone 111, the camera 112, the camera platform 113, the external sensor 114, and the camera adapter 120. Although the number of sensor systems is described as 26, this is merely an example and the number is not limited to this. In the present embodiment, unless otherwise noted, the term "image" covers both moving images and still images. That is, the image processing system 100 of the present embodiment can process both still images and moving images. The present embodiment mainly describes an example in which the virtual viewpoint content provided by the image processing system 100 includes a virtual viewpoint image and virtual viewpoint audio, but the content is not limited to this. For example, the virtual viewpoint content need not include audio. As another example, the audio included in the virtual viewpoint content may be audio collected by the microphone closest to the virtual viewpoint. Furthermore, although the description of audio is partially omitted in the present embodiment for simplicity, images and audio are basically processed together.

The sensor systems 110a to 110z each have one camera, namely the cameras 112a to 112z. That is, the image processing system 100 has a plurality of cameras for capturing a subject from a plurality of directions. The plurality of sensor systems 110 are connected to one another in a daisy chain. With this connection form, the number of connection cables can be reduced and wiring work can be simplified as the volume of image data grows with higher resolutions (such as 4K and 8K) and higher frame rates of captured images.

The connection form is not limited to this. A star network configuration may be used in which each of the sensor systems 110a to 110z is connected to the switching hub 180 and data is transmitted and received between the sensor systems 110 via the switching hub 180.

Although FIG. 1 shows a configuration in which all of the sensor systems 110a to 110z are cascaded to form a single daisy chain, the configuration is not limited to this. For example, the plurality of sensor systems 110 may be divided into several groups, and the sensor systems 110 may be daisy-chained within each group. The camera adapter 120 at the end of each group may then be connected to the switching hub to input images to the image computing server 200. Such a configuration is particularly effective in stadiums. For example, a stadium may have a plurality of floors, with sensor systems 110 deployed on each floor. In that case, input to the image computing server 200 can be performed for each floor or for each half of the stadium's circumference, which simplifies installation and makes the system more flexible even in places where wiring all the sensor systems 110 into one daisy chain is difficult.

Control of image processing in the image computing server 200 is switched depending on whether one camera adapter 120, or two or more camera adapters 120, are daisy-chained and input images to the image computing server 200. That is, control is switched depending on whether the sensor systems 110 are divided into a plurality of groups. When only one camera adapter 120 inputs images, the image of the full circumference of the stadium is generated while images are transmitted over the daisy chain, so the timing at which image data for the full circumference becomes available at the image computing server 200 is synchronized. That is, if the sensor systems 110 are not divided into groups, synchronization is maintained.

However, when a plurality of camera adapters 120 input images (that is, when the sensor systems 110 are divided into groups), the delay may differ for each daisy-chain lane (path). Therefore, the image computing server 200 needs to perform the subsequent image processing while checking that the image data has been gathered, under synchronization control that waits until the image data for the full circumference is complete.
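The synchronization control described here, waiting until data from every camera has arrived before processing a frame, can be sketched as follows. This is a minimal illustration under assumptions: the class name `FrameAssembler`, the use of frame numbers as the grouping key, and the byte payloads are hypothetical and are not taken from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


class FrameAssembler:
    """Buffer per-camera data until every expected camera has reported a frame."""

    def __init__(self, camera_ids: Iterable[str]) -> None:
        self.expected = frozenset(camera_ids)
        # frame number -> {camera id -> data received so far}
        self.pending: Dict[int, Dict[str, bytes]] = defaultdict(dict)

    def on_receive(self, frame_number: int, camera_id: str, data: bytes) -> Optional[Dict[str, bytes]]:
        """Store one arrival; return the complete frame set once all cameras are present."""
        if camera_id not in self.expected:
            raise ValueError(f"unknown camera: {camera_id}")
        self.pending[frame_number][camera_id] = data
        if set(self.pending[frame_number]) == self.expected:
            return self.pending.pop(frame_number)  # complete: hand off to later processing
        return None  # still waiting for slower lanes
```

Because arrivals are keyed by frame number, data from a faster lane for a later frame can be buffered while an earlier frame is still waiting on a slower lane.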

In the present embodiment, the sensor system 110a includes a microphone 111a, a camera 112a, a camera platform 113a, an external sensor 114a, and a camera adapter 120a. The configuration is not limited to this; it suffices for the sensor system 110a to have at least one camera adapter 120a and either one camera 112a or one microphone 111a. For example, the sensor system 110a may be composed of one camera adapter 120a and a plurality of cameras 112a, or of one camera 112a and a plurality of camera adapters 120a. That is, the plurality of cameras 112 and the plurality of camera adapters 120 in the image processing system 100 correspond N-to-M (where N and M are both integers of 1 or more). The sensor system 110 may also include devices other than the microphone 111a, the camera 112a, the camera platform 113a, and the camera adapter 120a. The camera 112 and the camera adapter 120 may be integrated into a single unit. Furthermore, the front-end server 230 may have at least part of the functions of the camera adapter 120. In the present embodiment, the sensor systems 110b to 110z have the same configuration as the sensor system 110a, so their description is omitted. However, they are not limited to the same configuration as the sensor system 110a, and the individual sensor systems 110 may have different configurations.

The audio collected by the microphone 111a and the image captured by the camera 112a undergo image processing (described later) in the camera adapter 120a and are then transmitted through the daisy chain 170a to the camera adapter 120b of the sensor system 110b. Similarly, the sensor system 110b transmits its own collected audio and captured image to the sensor system 110c together with the image and audio acquired from the sensor system 110a.

By continuing this operation, the images and audio acquired by the sensor systems 110a to 110z are conveyed from the sensor system 110z to the switching hub 180 via the network 180b and are then transmitted to the image computing server 200.
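The forwarding behavior described above, in which each sensor system passes on everything received from upstream together with its own data, can be sketched as follows. The `Packet` tuple, the `capture` callback, and the function names are hypothetical and serve only to illustrate the accumulation along the chain.

```python
from typing import Callable, List, Tuple

Packet = Tuple[str, str]  # (originating sensor system, payload); assumed representation


def forward(upstream: List[Packet], own_id: str, own_payload: str) -> List[Packet]:
    """Pass on everything received from upstream plus this sensor system's own data."""
    return upstream + [(own_id, own_payload)]


def run_chain(sensor_ids: List[str], capture: Callable[[str], str]) -> List[List[Packet]]:
    """Return the packets carried on each link, from the first hop to the last."""
    links: List[List[Packet]] = []
    carried: List[Packet] = []
    for sensor_id in sensor_ids:
        carried = forward(carried, sensor_id, capture(sensor_id))
        links.append(carried)
    return links


links = run_chain(["110a", "110b", "110c"], lambda sensor_id: "frame@" + sensor_id)
```

In this sketch the last link carries the data of every sensor system in the chain, which matches the description that the final sensor system delivers all images and audio toward the switching hub.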

In the present embodiment, the cameras 112a to 112z and the camera adapters 120a to 120z are separate devices, but they may be integrated in the same housing. In that case, the microphones 111a to 111z may be built into the integrated camera 112 or may be connected externally to the camera 112.

Next, the configuration and operation of the image computing server 200 will be described. The image computing server 200 of the present embodiment processes the data acquired from the sensor system 110z. The image computing server 200 includes a front-end server 230, a database (DB) 250, a back-end server 270, and a time server 290.

The time server 290 has a function of distributing the time and a synchronization signal, and distributes them to the sensor systems 110a to 110z via the switching hub 180. The camera adapters 120a to 120z that have received the time and synchronization signal genlock the cameras 112a to 112z based on them to achieve image frame synchronization. That is, the time server 290 synchronizes the capture timing of the plurality of cameras 112. As a result, the image processing system 100 can generate a virtual viewpoint image based on a plurality of images captured at the same timing, which suppresses degradation of virtual viewpoint image quality caused by differences in capture timing. In the present embodiment, the time server 290 manages time synchronization of the plurality of cameras 112, but the configuration is not limited to this; each camera 112 or each camera adapter 120 may perform the processing for time synchronization independently.

The front-end server 230 reconstructs the segmented transmission packets from the images and audio acquired from the sensor system 110z, converts the data format, and then writes the data to the database 250 according to the camera identifier, data type, and frame number.
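The write path described above can be sketched as a small in-memory store keyed by camera identifier, data type, and frame number. This is a sketch under assumptions: the per-segment sequence numbers, the `FrameStore` class, and the data type label used in the example are hypothetical, and the data format conversion step is omitted.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, int]    # (camera identifier, data type, frame number)
Segment = Tuple[int, bytes]   # (sequence number, chunk); sequence numbers are an assumption


def reassemble(segments: List[Segment]) -> bytes:
    """Rebuild one payload from segmented packets, ordering them by sequence number."""
    ordered = sorted(segments, key=lambda segment: segment[0])
    return b"".join(chunk for _, chunk in ordered)


class FrameStore:
    """In-memory stand-in for the database; format conversion is not modeled."""

    def __init__(self) -> None:
        self._rows: Dict[Key, bytes] = {}

    def write(self, camera_id: str, data_type: str, frame_number: int, segments: List[Segment]) -> None:
        self._rows[(camera_id, data_type, frame_number)] = reassemble(segments)

    def read(self, camera_id: str, data_type: str, frame_number: int) -> bytes:
        return self._rows[(camera_id, data_type, frame_number)]


store = FrameStore()
store.write("112a", "foreground", 42, [(1, b"lo"), (0, b"hel")])  # segments arrive out of order
```

Keying by the three fields named in the text means a reader can request exactly one camera's data of one type for one frame.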

Next, the back-end server 270 accepts a viewpoint designation from the virtual camera operation UI 330, reads the corresponding image and audio data from the database 250 based on the accepted viewpoint, and performs rendering processing to generate a virtual viewpoint image.

The configuration of the image computing server 200 is not limited to this. For example, at least two of the front-end server 230, the database 250, and the back-end server 270 may be integrated. The image computing server 200 may also include a plurality of at least one of the front-end server 230, the database 250, and the back-end server 270. Devices other than those described above may be included at any position within the image computing server 200. Furthermore, the end user terminal 190 or the virtual camera operation UI 330 may have at least part of the functions of the image computing server 200.

The rendered image is transmitted from the back-end server 270 to the end user terminal 190, and the user operating the end user terminal 190 can view images and listen to audio according to the designated viewpoint. That is, the back-end server 270 generates virtual viewpoint content based on the images captured by the plurality of cameras 112 (multi-viewpoint images) and viewpoint information. More specifically, the back-end server 270 generates virtual viewpoint content based on, for example, image data of predetermined regions extracted by the plurality of camera adapters 120 from the images captured by the plurality of cameras 112, and on a viewpoint designated by a user operation. The back-end server 270 then provides the generated virtual viewpoint content to the end user terminal 190. Details of the extraction of the predetermined regions by the camera adapter 120 will be described later.

The virtual viewpoint content in the present embodiment is content that includes a virtual viewpoint image, that is, an image that would be obtained if the subject were captured from a virtual viewpoint. In other words, the virtual viewpoint image can be said to be an image representing the view from a designated viewpoint. The virtual viewpoint may be designated by the user or may be designated automatically based on, for example, the result of image analysis. That is, virtual viewpoint images include arbitrary viewpoint images (free viewpoint images) corresponding to a viewpoint arbitrarily designated by the user. Images corresponding to a viewpoint designated by the user from among a plurality of candidates, and images corresponding to a viewpoint designated automatically by the apparatus, are also included in virtual viewpoint images. The present embodiment mainly describes an example in which the virtual viewpoint content includes audio data, but audio data need not necessarily be included. The back-end server 270 may compression-encode the virtual viewpoint image using a standard technology typified by H.264 or HEVC and then transmit it to the end user terminal 190 using the MPEG-DASH protocol. Alternatively, the virtual viewpoint image may be transmitted to the end user terminal 190 uncompressed. The former case, with compression encoding, assumes a smartphone or tablet as the end user terminal 190, and the latter assumes a display capable of displaying uncompressed images. That is, the image format can be switched according to the type of the end user terminal 190. The image transmission protocol is not limited to MPEG-DASH; for example, HLS (HTTP Live Streaming) or another transmission method may be used.
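The switching of image format by terminal type can be sketched as a small selection function. The `TerminalType` and `Delivery` types and the function signature are hypothetical; only the pairing of compressed delivery with smartphones and tablets, and of uncompressed delivery with displays that can show uncompressed images, comes from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TerminalType(Enum):
    SMARTPHONE = "smartphone"
    TABLET = "tablet"
    UNCOMPRESSED_DISPLAY = "uncompressed_display"


@dataclass(frozen=True)
class Delivery:
    codec: Optional[str]     # None means the image is sent uncompressed
    protocol: Optional[str]  # None means the text does not name a protocol for this case


def select_delivery(terminal: TerminalType, codec: str = "H.264",
                    protocol: str = "MPEG-DASH") -> Delivery:
    """Choose how to send the virtual viewpoint image for a given terminal type."""
    if terminal in (TerminalType.SMARTPHONE, TerminalType.TABLET):
        if codec not in ("H.264", "HEVC"):
            raise ValueError(f"codec not covered by this sketch: {codec}")
        return Delivery(codec=codec, protocol=protocol)
    return Delivery(codec=None, protocol=None)
```

The `protocol` parameter is left open because the text allows HLS or other transmission methods in place of MPEG-DASH.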

In this way, the image processing system 100 has three functional domains: a video collection domain, a data storage domain, and a video generation domain. The video collection domain includes the sensor systems 110a to 110z; the data storage domain includes the database 250, the front-end server 230, and the back-end server 270; and the video generation domain includes the virtual camera operation UI 330 and the end user terminal 190. The configuration is not limited to this; for example, the virtual camera operation UI 330 could acquire images directly from the sensor systems 110a to 110z. In the present embodiment, however, images are not acquired directly from the sensor systems 110a to 110z; instead, a data storage function is placed in between. Specifically, the front-end server 230 converts the image data and audio data generated by the sensor systems 110a to 110z, together with the meta-information of that data, into the common schema and data types of the database 250. As a result, even if the cameras 112 of the sensor systems 110a to 110z are replaced with cameras of another model, the front-end server 230 can absorb the resulting differences and register the data in the database 250. This reduces the risk that the virtual camera operation UI 330 will fail to operate properly when the camera 112 is replaced with a camera of another model.

The virtual camera operation UI 330 is configured to access the database 250 via the back-end server 270 rather than directly. The back-end server 270 performs the common processing related to image generation, while the virtual camera operation UI 330 handles the application-specific portion related to the operation UI. Consequently, development of the virtual camera operation UI 330 can focus on the UI operation device and on the functional requirements of the UI for manipulating the virtual viewpoint image to be generated. The back-end server 270 can also add or remove common processing related to image generation in response to requests from the virtual camera operation UI 330. This makes it possible to respond flexibly to the requirements of the virtual camera operation UI 330.

As described above, in the image processing system 100, the back-end server 270 generates a virtual viewpoint image from image data based on capture by the plurality of cameras 112, which capture the subject from a plurality of directions. The image processing system 100 in the present embodiment is not limited to the physical configuration described above and may be configured logically.

<Functional configuration of each node>
Next, the functional configuration of each node constituting the system (camera adapter 120, front end server 230, database 250, back end server 270, virtual camera operation UI 330, and end user terminal 190) will be described.

FIG. 2 is a block diagram showing the functional configuration of the camera adapter 120. The details of the data flow between the functional blocks of the camera adapter 120 will be described later with reference to FIG. 20.

The camera adapter 120 includes a network adapter 6110, a transmission unit 6120, an image processing unit 6130, and an external device control unit 6140. The network adapter 6110 includes a data transmission/reception unit 6111 and a time control unit 6112.

The data transmission/reception unit 6111 performs data communication with other camera adapters 120, the front end server 230, the time server 290, and the control station 310 via the daisy chain 170, the network 291, and the network 310a. For example, the data transmission/reception unit 6111 outputs the foreground image and the background image, which the foreground/background separation unit 6131 has separated from the image captured by the camera 112, to another camera adapter 120. The destination is the next camera adapter 120 in an order, among the camera adapters 120 in the image processing system 100, that is determined in advance according to the processing of the data routing processing unit 6122. Because each camera adapter 120 outputs a foreground image and a background image in a predetermined format, a virtual viewpoint image is generated based on foreground images and background images captured from a plurality of viewpoints. Note that there may be a camera adapter 120 that outputs the foreground image separated from the captured image but does not output the background image.

The time control unit 6112 conforms to, for example, the Ordinary Clock of the IEEE 1588 standard, and has a function of storing time stamps of data exchanged with the time server 290 and a function of performing time synchronization with the time server 290. The standard is not limited to IEEE 1588; time synchronization with the time server 290 may be realized by another standard such as EtherAVB or by a proprietary protocol. In the present embodiment, a NIC (Network Interface Card) is used as the network adapter 6110, but another similar interface may be used instead. IEEE 1588 has been revised as a standard, as in IEEE 1588-2002 and IEEE 1588-2008; the latter is also called PTPv2 (Precision Time Protocol Version 2).
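The stored time stamps are what make synchronization with the time server possible. As a minimal sketch, assuming the delay request-response mechanism of IEEE 1588 and a symmetric network path, the offset of the local clock can be estimated from four time stamps. The function and parameter names are illustrative and do not come from this document.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and one-way path delay from four PTP time stamps.

    t1: time server sends Sync          (time server clock)
    t2: camera adapter receives Sync    (local clock)
    t3: camera adapter sends Delay_Req  (local clock)
    t4: time server receives Delay_Req  (time server clock)

    Assumption: the path delay is the same in both directions.
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = ((t2 - t1) - (t4 - t3)) / 2  # local clock minus time server clock
    return offset, delay
```

Subtracting the returned offset from the local clock brings it into line with the time server, to the extent that the symmetric-delay assumption holds.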

The transmission unit 6120 has a function of controlling data transmission to the switching hub 180 and other destinations via the network adapter 6110, and is composed of the following functional units.

The data compression/decompression unit 6121 has a function of compressing data transmitted and received via the data transmission/reception unit 6111 by applying a predetermined compression method, compression ratio, and frame rate, and a function of decompressing compressed data.

The data routing processing unit 6122 uses the data held by the data routing information holding unit 6125, described later, to determine the routing destination of the data received by the data transmission/reception unit 6111 and of the data processed by the image processing unit 6130. It also has a function of transmitting the data to the determined routing destination. A suitable routing destination is a camera adapter 120 corresponding to a camera 112 focused on the same gaze point, because the image frames of such cameras 112 are highly correlated, which is advantageous for image processing. The order in which the camera adapters 120 in the image processing system 100 output foreground images and background images in relay fashion is determined by the decisions of the data routing processing units 6122 of the respective camera adapters 120.
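The relay order described above can be pictured as a next-hop table. The following is only an illustrative sketch: it assumes each camera adapter is tagged with the gaze point of its camera, chains adapters that share a gaze point, and sends the end of each chain to a final destination. The identifiers, the table layout, and the choice of final destination are assumptions, not details given in this document.

```python
from collections import defaultdict


def build_relay_table(adapters, final_destination="front_end_server"):
    """Build a next-hop table for relay-style output.

    adapters: list of (adapter_id, gaze_point_id) in installation order.
    Returns a dict mapping each adapter_id to the next hop.
    Adapters whose cameras share a gaze point are chained together.
    """
    groups = defaultdict(list)
    for adapter_id, gaze_point in adapters:
        groups[gaze_point].append(adapter_id)

    table = {}
    for chain in groups.values():
        # Each adapter forwards to the next adapter in the same group.
        for current, nxt in zip(chain, chain[1:]):
            table[current] = nxt
        # The last adapter in a chain forwards to the final destination.
        table[chain[-1]] = final_destination
    return table
```

Grouping by gaze point reflects the preference stated in the text for routing between adapters whose cameras see highly correlated frames.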

The time synchronization control unit 6123 conforms to PTP (Precision Time Protocol) of the IEEE 1588 standard and has a function of performing processing related to time synchronization with the time server 290. Time synchronization is not limited to PTP and may use another similar protocol.

The image/audio transmission processing unit 6124 has a function of creating a message for transferring image data or audio data to another camera adapter 120 or the front end server 230 via the data transmission/reception unit 6111. The message contains the image data or audio data and the meta information of each piece of data. In the present embodiment, the meta information includes the time code or sequence number at the time the image was captured or the audio was sampled, the data type, and an identifier indicating the individual camera 112 or microphone 111. The image data or audio data to be transmitted may be compressed by the data compression/decompression unit 6121. The image/audio transmission processing unit 6124 also receives messages from other camera adapters 120 via the data transmission/reception unit 6111. Then, according to the data type contained in the message, it restores the data, which has been fragmented to the packet size specified by the transmission protocol, into image data or audio data. If the restored data is compressed, the data compression/decompression unit 6121 decompresses it.
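The message handling described above can be sketched as follows. This is a simplified illustration, not the actual message format: the field names, the tuple-based packet layout, and the helper functions `fragment` and `reassemble` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    data_type: str   # e.g. "foreground", "background", "audio"
    time_code: str   # time code or sequence number at capture or sampling
    source_id: str   # identifier of the individual camera or microphone
    payload: bytes   # image or audio data (possibly compressed)


def fragment(message, packet_size):
    """Split a message payload into packets of at most packet_size bytes."""
    header = (message.data_type, message.time_code, message.source_id)
    chunks = [message.payload[i:i + packet_size]
              for i in range(0, len(message.payload), packet_size)] or [b""]
    # Each packet carries the meta information, its index, and the total count.
    return [(header, index, len(chunks), chunk)
            for index, chunk in enumerate(chunks)]


def reassemble(packets):
    """Restore the original message from packets received in any order."""
    ordered = sorted(packets, key=lambda p: p[1])
    header, _, total, _ = ordered[0]
    if len(ordered) != total:
        raise ValueError("missing fragments")
    payload = b"".join(chunk for _, _, _, chunk in ordered)
    return Message(*header, payload)
```

Carrying the meta information in every packet is one simple way to let the receiver identify the data type before restoring the payload; a real transmission protocol would likely use a more compact header.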

The data routing information holding unit 6125 has a function of holding address information for determining the destination of data transmitted and received by the data transmission/reception unit 6111. The routing method will be described later.

The image processing unit 6130 has a function of processing image data captured by the camera 112 under the control of the camera control unit 6141 and image data received from other camera adapters 120, and is composed of the following functional units.

The foreground/background separation unit 6131 has a function of separating image data captured by the camera 112 into a foreground image and a background image. That is, the foreground/background separation unit 6131 of each of the plurality of camera adapters 120 extracts a predetermined region from the image captured by the corresponding camera 112 among the plurality of cameras 112. The predetermined region is, for example, a foreground image obtained as a result of object detection on the captured image, and through this extraction the foreground/background separation unit 6131 separates the captured image into a foreground image and a background image. An object is, for example, a person. However, the object may be a specific person (such as a player, a coach, and/or a referee), or an object whose image pattern is predetermined, such as a ball or a goal. A moving body may also be detected as an object. By separately processing the foreground image, which contains important objects such as people, and the background region, which does not contain such objects, the quality of the portions of the virtual viewpoint image generated by the image processing system 100 that correspond to those objects can be improved. In addition, because each of the plurality of camera adapters 120 performs the foreground/background separation, the load in the image processing system 100, which has a plurality of cameras 112, can be distributed. The predetermined region is not limited to the foreground image and may be, for example, the background image. The separation into a foreground image and a background image that the foreground/background separation unit 6131 performs for the shift detection described later may be the image processing described above. In the present embodiment, however, to measure the shift more accurately, foreground/background separation processing that separates foreground and background using both temporal correlation and spatial correlation is performed. The foreground/background separation used for this shift detection will be described later.
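As a rough illustration of separating a captured image into foreground and background, the sketch below uses plain background subtraction with a fixed threshold. This is a deliberately simple stand-in: it is not the temporal and spatial correlation method that the embodiment uses for shift detection, and the function name, threshold, and list-based image representation are assumptions.

```python
def separate_foreground(frame, background, threshold=30):
    """Split a grayscale frame into foreground and background pixels.

    frame, background: 2-D lists of pixel values with the same dimensions.
    A pixel is treated as foreground when it differs from the stored
    background by more than the threshold.
    Returns (mask, foreground, background_only); excluded pixels are None.
    """
    mask = [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]
    foreground = [[p if m else None for p, m in zip(frow, mrow)]
                  for frow, mrow in zip(frame, mask)]
    background_only = [[None if m else p for p, m in zip(frow, mrow)]
                       for frow, mrow in zip(frame, mask)]
    return mask, foreground, background_only
```

Because every camera adapter can run such a step on its own camera's frames, the separation work is spread across the adapters, which matches the load distribution described above.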

The three-dimensional model information generation unit 6132 has a function of generating image information related to a three-dimensional model, for example by the principle of a stereo camera, using the foreground image separated by the foreground/background separation unit 6131 and foreground images received from other camera adapters 120.

The calibration control unit 6133 has a function of acquiring the image data necessary for calibration from the camera 112 via the camera control unit 6141 and transmitting it to the front end server 230, which performs the arithmetic processing related to calibration. In the present embodiment the front end server 230 performs this arithmetic processing, but the node that performs it is not limited to the front end server 230. For example, the arithmetic processing may be performed by another node such as the control station 310 or a camera adapter 120 (including another camera adapter 120). The calibration control unit 6133 also has a function of performing calibration during shooting (dynamic calibration) on the image data acquired from the camera 112 via the camera control unit 6141, according to preset parameters.

The shift detection/notification unit 6134 has a function of detecting a positional shift of the image (hereinafter simply called a shift) from the background image separated by the foreground/background separation unit 6131. First, foreground/background separation processing, which separates foreground and background on the basis of temporal correlation and spatial correlation, is applied to an image captured at the time of camera initial setup, and the resulting background image is stored as a reference image. Next, a current image is captured at startup or periodically, the same foreground/background separation processing is applied, and the resulting background image is stored as the current image. The reference image and the current image are then compared to detect a shift. If the shift is larger than a predetermined reference value, it is determined that the installation state of the camera has changed, and an alarm is issued.

The external device control unit 6140 has a function of controlling the devices connected to the camera adapter 120, and is composed of the following functional blocks.

The camera control unit 6141 is connected to the camera 112 and has functions such as controlling the camera 112, acquiring captured images, providing a synchronization signal, and setting the time. Control of the camera 112 includes, for example, setting and referring to shooting parameters (such as the number of pixels, color depth, frame rate, and white balance), acquiring the state of the camera 112 (such as shooting, stopped, synchronizing, and error), starting and stopping shooting, and adjusting focus. In the present embodiment, focus adjustment is performed via the camera 112, but when a detachable lens is attached to the camera 112, the camera adapter 120 may connect to the lens and adjust it directly. The camera adapter 120 may also perform lens adjustments such as zoom via the camera 112. The synchronization signal is provided by supplying the shooting timing (control clock) to the camera 112, using the time that the time synchronization control unit 6123 has synchronized with the time server 290. The time is set by providing the time that the time synchronization control unit 6123 has synchronized with the time server 290 as a time code conforming to, for example, the SMPTE 12M format. As a result, the provided time code is attached to the image data received from the camera 112. The time code format is not limited to SMPTE 12M and may be another format. Alternatively, the camera control unit 6141 may attach a time code itself to the image data received from the camera 112 instead of providing a time code to the camera 112.
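To make the time code concrete, the sketch below formats a frame count as an `HH:MM:SS:FF` string in the style of SMPTE 12M. It is a simplified illustration that assumes an integer frame rate and non-drop-frame counting; the function name and the default frame rate are assumptions.

```python
def frames_to_timecode(frame_count, fps=30):
    """Format a frame count as a non-drop-frame HH:MM:SS:FF time code.

    Assumes an integer frame rate; drop-frame counting (used with
    29.97 fps video) is intentionally not handled here.
    """
    frames = frame_count % fps
    total_seconds = frame_count // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = (total_seconds // 3600) % 24  # wraps after 24 hours
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"
```

If every camera derives its time code from the same synchronized time, frames captured at the same instant by different cameras carry the same code and can be matched downstream.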

The microphone control unit 6142 is connected to the microphone 111 and has functions such as controlling the microphone 111, starting and stopping sound collection, and acquiring the collected audio data. Control of the microphone 111 includes, for example, gain adjustment and state acquisition. Like the camera control unit 6141, it also provides the microphone 111 with the audio sampling timing and a time code. As the clock information that sets the audio sampling timing, time information from the time server 290 is converted into, for example, a 48 kHz word clock and supplied to the microphone 111.

The pan head control unit 6143 is connected to the pan head 113 and has a function of controlling the pan head 113. Control of the pan head 113 includes, for example, pan/tilt control and state acquisition.

The sensor control unit 6144 is connected to the external sensor 114 and has a function of acquiring the sensor information sensed by the external sensor 114. For example, when a gyro sensor is used as the external sensor 114, information representing vibration can be acquired. Using the vibration information acquired by the sensor control unit 6144, the image processing unit 6130 can generate an image with suppressed vibration before the processing in the foreground/background separation unit 6131. The vibration information is used, for example, when the image data of an 8K camera is cropped to a size smaller than the original 8K size, taking the vibration information into account, and aligned with the image of an adjacently installed camera 112. In this way, even if the vibration of the building structure propagates to each camera at a different frequency, alignment is performed by this function provided in the camera adapter 120. As a result, electronically stabilized image data can be generated, which reduces the alignment processing load that the image computing server 200 would otherwise bear for each of the cameras 112. The sensor of the sensor system 110 is not limited to the external sensor 114; the same effect can be obtained with a sensor built into the camera adapter 120.
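The cropping idea described above can be sketched as follows. The sketch assumes that the vibration information has already been converted into a pixel displacement of the image content; how gyro output maps to pixels is not specified here, and the function and parameter names are illustrative.

```python
def stabilized_crop(frame, crop_w, crop_h, shift_x, shift_y):
    """Cut a crop_w x crop_h window out of a larger frame.

    frame: 2-D list of pixels (rows of equal length).
    shift_x, shift_y: displacement of the image content in pixels caused
    by vibration (assumed to be derived from the sensor information).
    The window follows the content so the cropped image stays steady,
    and is clamped so it never leaves the frame.
    """
    height, width = len(frame), len(frame[0])
    max_x, max_y = width - crop_w, height - crop_h
    left = min(max(max_x // 2 + shift_x, 0), max_x)
    top = min(max(max_y // 2 + shift_y, 0), max_y)
    return [row[left:left + crop_w] for row in frame[top:top + crop_h]]
```

The margin between the full frame and the crop window limits how much vibration can be compensated, which is why the crop is smaller than the original size.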

Next, the hardware configuration of each device constituting the above embodiments will be described in more detail. The description so far has focused on an example in which the camera adapter 120 incorporates hardware such as an FPGA and/or an ASIC, and this hardware executes the processes described above. The same applies to the various devices in the sensor system 110, the front end server 230, the database 250, the back end server 270, and the controller 300. However, at least one of these devices may instead execute the processing of each embodiment by software, using, for example, a CPU, a GPU, or a DSP.

FIG. 29 is a diagram showing the hardware configuration of the camera adapter 120. Here, it is assumed that the functional configuration shown in FIG. 2 is realized by software processing using, for example, a CPU, a GPU, or a DSP. Devices such as the front end server 230, the database 250, the back end server 270, the control station 310, the virtual camera operation UI 330, and the end user terminal 190 can also have the hardware configuration shown in FIG. 29. The camera adapter 120 includes a CPU 1201, a ROM 1202, a RAM 1203, an auxiliary storage device 1204, a display unit 1205, an operation unit 1206, a communication unit 1207, and a bus 1208.

The CPU 1201 controls the entire camera adapter 120 using computer programs and data stored in the ROM 1202 and the RAM 1203. The ROM 1202 stores programs and parameters that do not need to be changed. The RAM 1203 temporarily stores programs and data supplied from the auxiliary storage device 1204, data supplied from outside via the communication unit 1207, and the like. The auxiliary storage device 1204 is composed of, for example, a hard disk drive and stores content data such as still images and moving images.

The display unit 1205 is composed of, for example, a liquid crystal display, and displays a GUI (Graphical User Interface) and the like with which the user operates the camera adapter 120. The operation unit 1206 is composed of, for example, a keyboard and a mouse, and inputs various instructions to the CPU 1201 in response to user operations. The communication unit 1207 communicates with external devices such as the camera 112 and the front end server 230. For example, when the camera adapter 120 is connected to an external device by wire, a LAN cable or the like is connected to the communication unit 1207. When the camera adapter 120 has a function of communicating wirelessly with an external device, the communication unit 1207 includes an antenna. The bus 1208 connects the units of the camera adapter 120 and carries information between them.

Note that, for example, part of the processing of the camera adapter 120 may be performed by an FPGA and another part may be realized by software processing using a CPU. Each component of the camera adapter 120 shown in FIG. 29 may be composed of a single electronic circuit or of a plurality of electronic circuits. For example, the camera adapter 120 may include a plurality of electronic circuits that operate as the CPU 1201. When these electronic circuits perform the processing of the CPU 1201 in parallel, the processing speed of the camera adapter 120 can be improved.

In each of the above embodiments, the display unit 1205 and the operation unit 1206 are located inside the camera adapter 120, but the camera adapter 120 need not include at least one of the display unit 1205 and the operation unit 1206. At least one of the display unit 1205 and the operation unit 1206 may also exist as a separate device outside the camera adapter 120, with the CPU 1201 operating as a display control unit that controls the display unit 1205 and as an operation control unit that controls the operation unit 1206. The same applies to the other devices in the image processing system 100. For example, the front end server 230, the database 250, and the back end server 270 may omit the display unit 1205, while the control station 310, the virtual camera operation UI 330, and the end user terminal 190 may include it. The above embodiments have mainly been described using an example in which the image processing system 100 is installed in a facility such as a stadium or a concert hall. Other examples of facilities include amusement parks, parks, horse racing tracks, bicycle racing tracks, casinos, pools, skating rinks, ski resorts, and live music venues. Events held at these facilities may take place indoors or outdoors. The facilities in each embodiment also include facilities that are built temporarily (for a limited period).

FIG. 3 is a block diagram showing the functional configuration of the image processing unit 6130 inside the camera adapter 120. The calibration control unit 6133 applies to the input image processing such as color correction, which suppresses color variation between cameras, and shake correction (electronic image stabilization), which stabilizes the image position against shake caused by camera vibration.

The functional blocks of the foreground/background separation unit 6131 will now be described. The foreground separation unit 5001 separates the foreground image from the aligned image data of the camera 112 by comparing that image data with the background image 5002.

The background update unit 5003 generates a new background image using the background image 5002 and the aligned image from the camera 112, and updates the background image 5002 with the new background image.
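The text does not state how the new background image is computed. One common choice, shown here purely as an assumption, is an exponential moving average that blends the aligned frame into the stored background only where no foreground was detected. The function name, the mask argument, and the blending factor `alpha` are hypothetical.

```python
def update_background(background, frame, foreground_mask, alpha=0.05):
    """Return an updated background image.

    background, frame: 2-D lists of grayscale values of equal size.
    foreground_mask: 2-D list of booleans; True marks foreground pixels,
    which are left unchanged so moving objects do not leak into the
    background.
    alpha: blending factor; larger values adapt faster to changes such
    as lighting.
    """
    return [[b if m else (1 - alpha) * b + alpha * p
             for b, p, m in zip(brow, frow, mrow)]
            for brow, frow, mrow in zip(background, frame, foreground_mask)]
```

A small `alpha` keeps the background stable against noise, while a larger one tracks gradual scene changes more quickly.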

The background cutout unit 5004 controls the cutting out of a part of the background image 5002. Next, the function of the three-dimensional model information generation unit 6132 will be described.

The three-dimensional model processing unit 5005 sequentially generates image information related to a three-dimensional model, for example by the principle of a stereo camera, using the foreground image separated by the foreground separation unit 5001 and the foreground images of other cameras 112 received via the transmission unit 6120.

The other-camera foreground receiving unit 5006 receives foreground images that have been separated by the foreground/background separation of other camera adapters 120.

The camera parameter receiving unit 5007 receives internal parameters specific to the camera (such as focal length, image center, and lens distortion parameters) and external parameters representing the position and orientation of the camera (such as a rotation matrix and a position vector). These parameters are obtained by the calibration processing described later, and are transmitted from the control station 310 to the target camera adapter 120 and set there. The three-dimensional model processing unit 5005 then generates three-dimensional model information using the data from the camera parameter receiving unit 5007 and the other-camera foreground receiving unit 5006.
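To show how internal and external parameters of this kind are typically used, the sketch below projects a three-dimensional point into pixel coordinates with a standard pinhole camera model. It ignores lens distortion and assumes the convention that a world point X maps to camera coordinates as R·X + t. The function and parameter names are illustrative and not taken from this document.

```python
def project_point(point, rotation, translation, focal_length, center):
    """Project a 3-D world point to pixel coordinates (pinhole model).

    point: (X, Y, Z) in world coordinates.
    rotation: 3x3 rotation matrix as a list of rows (external parameter).
    translation: 3-vector t so that camera coords = R * X + t (assumed
    convention; a camera position C corresponds to t = -R * C).
    focal_length: (fx, fy) in pixels; center: (cx, cy) image center.
    Lens distortion is not modeled in this sketch.
    """
    cam = [sum(r * p for r, p in zip(row, point)) + t
           for row, t in zip(rotation, translation)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    fx, fy = focal_length
    cx, cy = center
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy)
```

With such a projection for every camera, foreground pixels seen by different cameras can be related to common points in space, which is the basis of stereo-style three-dimensional reconstruction.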

Next, the function of the shift detection/notification unit 6134 will be described. The shift detection/notification unit 6134 has a function of detecting a shift of the image from the background image separated by the foreground/background separation unit 6131.

まず、カメラ初期設定時のキャリブレーション終了後（第１の時刻）に画像を撮影する。次に、このカメラ初期設定時に撮影した画像に対して前景分離部５００１において時間的相関、及び、空間的相関から前景と背景を分離する前景背景分離処理を行う。この時間的相関、及び、空間的相関から前景と背景を分離する前景背景分離処理に関しての詳細に関しては、後述する。そして、この背景画像５００２を基準画像として基準背景画像記憶部５００８（第１の格納部）に保存する。次に、起動時や定期的（第２の時刻）に現画像を撮影する。次に、この画像に対して前景分離部５００１において上述と同様の時間的相関、及び、空間的相関から前景と背景を分離する前景背景分離処理を行う。そして、背景画像５００２を比較用の現画像（比較現画像）として比較現背景画像記憶部５００９（第２の格納部）に保存する。次に、ズレ検出部５０１０で基準背景画像記憶部５００８の基準画像と比較現背景画像記憶部５００９の比較現画像とを比較してズレを検出する。比較する手法は、対応する特徴点間のベクトルを用いる方法、その他、周知の技法を用いる。ズレが所定の基準値より大きい場合には、カメラの設置状態が状態変化したと判断し、警報を報知し警報情報を伝送部６１２０に伝送する。 First, an image is captured after the calibration performed at initial camera setup has finished (first time). Next, the foreground separation unit 5001 performs foreground/background separation processing on this image, separating the foreground from the background on the basis of temporal correlation and spatial correlation. Details of this foreground/background separation processing are described later. The resulting background image 5002 is then stored as the reference image in the reference background image storage unit 5008 (first storage unit). Next, a current image is captured at startup or periodically (second time). The foreground separation unit 5001 applies the same foreground/background separation processing to this image, and the resulting background image 5002 is stored as the current image for comparison (comparison current image) in the comparison current background image storage unit 5009 (second storage unit). The shift detection unit 5010 then compares the reference image in the reference background image storage unit 5008 with the comparison current image in the comparison current background image storage unit 5009 to detect a shift. The comparison may use vectors between corresponding feature points or another well-known technique. If the shift is larger than a predetermined reference value, it is determined that the installation state of the camera has changed, an alarm is raised, and alarm information is transmitted to the transmission unit 6120.
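The comparison step above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: it assumes feature points have already been matched between the reference background image and the comparison current image, and it summarizes the displacement vectors by their median (an assumption made here to damp a few bad matches). The function name `detect_shift` and the parameter `threshold_px` are hypothetical.

```python
import math
from statistics import median


def detect_shift(ref_points, cur_points, threshold_px):
    """Return (shifted, magnitude) for matched feature points.

    ref_points / cur_points: equal-length lists of (x, y) positions of the
    same features in the reference and the current background image.
    threshold_px: hypothetical stand-in for the "predetermined reference value".
    """
    if len(ref_points) != len(cur_points) or not ref_points:
        raise ValueError("need equal, non-empty lists of matched points")
    # Displacement vector of each feature between the two images.
    dx = [c[0] - r[0] for r, c in zip(ref_points, cur_points)]
    dy = [c[1] - r[1] for r, c in zip(ref_points, cur_points)]
    # Assumption: the per-axis median summarizes the overall image shift.
    magnitude = math.hypot(median(dx), median(dy))
    # A shift larger than the reference value is treated as a state change.
    return magnitude > threshold_px, magnitude
```

Because both images are background images with the foreground already removed, moving players should not contribute displacement vectors, which is the point of comparing separated backgrounds rather than raw frames.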

図４は、フロントエンドサーバ２３０の機能構成を示すブロック図である。制御部２１１０はＣＰＵやＤＲＡＭ、プログラムデータや各種データを記憶したＨＤＤやＮＡＮＤメモリなどの記憶媒体、Ｅｔｈｅｒｎｅｔ（登録商標）等のハードウェアで構成される。そして、フロントエンドサーバ２３０の各機能ブロック及びフロントエンドサーバ２３０のシステム全体の制御を行う。また、モード制御を行って、キャリブレーション動作や撮影前の準備動作、及び撮影中動作などの動作モードを切り替える。また、Ｅｔｈｅｒｎｅｔ（登録商標）を通じて制御ステーション３１０からの制御指示を受信し、各モードの切り替えやデータの入出力などを行う。また、同じくネットワークを通じて制御ステーション３１０からスタジアムＣＡＤデータ（スタジアム形状データ）を取得し、スタジアムＣＡＤデータをＣＡＤデータ記憶部２１３５と撮影データファイル生成部２１８０に送信する。なお、本実施形態におけるスタジアムＣＡＤデータ（スタジアム形状データ）はスタジアムの形状を示す三次元データであり、メッシュモデルやその他の三次元形状を表すデータであればよく、ＣＡＤ形式に限定されない。 FIG. 4 is a block diagram showing the functional configuration of the front end server 230. The control unit 2110 consists of hardware such as a CPU, a DRAM, a storage medium such as an HDD or NAND memory storing program data and various data, and Ethernet (registered trademark). It controls each functional block of the front end server 230 and the front end server 230 system as a whole. It also performs mode control to switch between operation modes such as the calibration operation, the preparation operation before shooting, and the operation during shooting. It receives control instructions from the control station 310 over Ethernet (registered trademark) and switches modes, inputs and outputs data, and so on. Likewise, it acquires stadium CAD data (stadium shape data) from the control station 310 over the network and sends the stadium CAD data to the CAD data storage unit 2135 and the shooting data file generation unit 2180. The stadium CAD data (stadium shape data) in the present embodiment is three-dimensional data indicating the shape of the stadium; it may be a mesh model or any other data representing a three-dimensional shape, and is not limited to the CAD format.

データ入力制御部２１２０は、Ｅｔｈｅｒｎｅｔ（登録商標）等の通信路とスイッチングハブ１８０を介して、カメラアダプタ１２０とネットワーク接続されている。そしてデータ入力制御部２１２０は、ネットワークを通してカメラアダプタ１２０から前景画像、背景画像、被写体の三次元モデル、音声データ、及びカメラキャリブレーション撮影画像データを取得する。ここで、前景画像は仮想視点画像の生成のための撮影画像の前景領域に基づく画像データであり、背景画像は当該撮影画像の背景領域に基づく画像データである。カメラアダプタ１２０は、カメラ１１２による撮影画像に対する所定のオブジェクトの検出処理の結果に応じて、前景領域及び背景領域を特定し、前景画像及び背景画像を生成する。所定のオブジェクトとは、例えば人物である。なお、所定のオブジェクトは特定の人物（選手、監督、及び／又は審判など）であっても良い。また、所定のオブジェクトには、ボールやゴールなど、画像パターンが予め定められている物体が含まれていてもよい。また、所定のオブジェクトとして、動体が検出されるようにしても良い。 The data input control unit 2120 is network-connected to the camera adapter 120 via a communication path such as Ethernet (registered trademark) and the switching hub 180. Then, the data input control unit 2120 acquires the foreground image, the background image, the three-dimensional model of the subject, the sound data, and the camera calibration captured image data from the camera adapter 120 through the network. Here, the foreground image is image data based on the foreground area of the captured image for generation of a virtual viewpoint image, and the background image is image data based on the background area of the captured image. The camera adapter 120 specifies a foreground area and a background area according to the result of detection processing of a predetermined object on a captured image by the camera 112, and generates a foreground image and a background image. The predetermined object is, for example, a person. The predetermined object may be a specific person (such as a player, a manager, and / or an umpire). Further, the predetermined object may include an object such as a ball or a goal, for which an image pattern is predetermined. In addition, a moving object may be detected as a predetermined object.

また、データ入力制御部２１２０は、取得した前景画像及び背景画像をデータ同期部２１３０に送信し、カメラキャリブレーション撮影画像データをキャリブレーション部２１４０に送信する。また、データ入力制御部２１２０は受信したデータの圧縮伸張やデータルーティング処理等を行う機能を有する。また、制御部２１１０とデータ入力制御部２１２０は共にＥｔｈｅｒｎｅｔ（登録商標）等のネットワークによる通信機能を有しているが、通信機能はこれらで共有されていてもよい。その場合は、制御ステーション３１０からの制御コマンドによる指示やスタジアムＣＡＤデータをデータ入力制御部２１２０で受けて、制御部２１１０に対して送る方法を用いてもよい。 Further, the data input control unit 2120 transmits the acquired foreground image and background image to the data synchronization unit 2130, and transmits camera calibration photographed image data to the calibration unit 2140. Also, the data input control unit 2120 has a function of performing compression / decompression of received data, data routing processing, and the like. Although both the control unit 2110 and the data input control unit 2120 have a communication function by a network such as Ethernet (registered trademark), the communication function may be shared by these. In that case, a method may be used in which the data input control unit 2120 receives an instruction by a control command from the control station 310 or stadium CAD data and sends it to the control unit 2110.

データ同期部２１３０は、カメラアダプタ１２０から取得したデータをＤＲＡＭ上に一次的に記憶し、前景画像、背景画像、音声データ及び三次元モデルデータが揃うまでバッファする。なお、前景画像、背景画像、音声データ及び三次元モデルデータをまとめて、以降では撮影データと称する。撮影データにはルーティング情報やタイムコード情報（時間情報）、カメラ識別子等のメタ情報が付与されており、データ同期部２１３０はこのメタ情報を元にデータの属性を確認する。これによりデータ同期部２１３０は、同一時刻のデータであることなどを判断してデータがそろったことを確認する。これは、ネットワークによって各カメラアダプタ１２０から転送されたデータについて、ネットワークパケットの受信順序は保証されず、ファイル生成に必要なデータが揃うまでバッファする必要があるためである。 The data synchronization unit 2130 temporarily stores the data acquired from the camera adapter 120 in DRAM and buffers it until the foreground image, the background image, the audio data, and the three-dimensional model data are all available. The foreground image, the background image, the audio data, and the three-dimensional model data are hereinafter collectively referred to as shooting data. Meta information such as routing information, time code information (time information), and a camera identifier is attached to the shooting data, and the data synchronization unit 2130 checks the attributes of the data on the basis of this meta information. In this way, the data synchronization unit 2130 determines, for example, that the data belong to the same time and confirms that the data set is complete. This is necessary because the reception order of network packets is not guaranteed for data transferred from each camera adapter 120 over the network, so the data must be buffered until everything needed for file generation has arrived.

データがそろったら、データ同期部２１３０は、前景画像及び背景画像を画像処理部２１５０に、三次元モデルデータを三次元モデル結合部２１６０に、音声データを撮影データファイル生成部２１８０にそれぞれ送信する。なお、ここで揃えるデータは、後述される撮影データファイル生成部２１８０に於いてファイル生成を行うために必要なデータである。また、背景画像は前景画像とは異なるフレームレートで撮影されてもよい。例えば、背景画像のフレームレートが１ｆｐｓである場合、１秒毎に１つの背景画像が取得されるため、背景画像が取得されない時間については、背景画像が無い状態で全てのデータがそろったとしてよい。また、データ同期部２１３０は、所定時間を経過しデータが揃っていない場合には、データ集結ができないことを示す情報をデータベース２５０に通知する。そして、後段のデータベース２５０が、データを格納する際に、カメラ番号やフレーム番号とともにデータ欠落を示す情報を格納する。これにより、仮想カメラ操作ＵＩ３３０からバックエンドサーバ２７０への視点指示に応じて、データ集結したカメラ１１２の撮影画像から所望の画像が形成できるか否かをレンダリング前に自動通知することが可能となる。その結果、仮想カメラ操作ＵＩ３３０のオペレータの目視負荷を軽減できる。 Once the data is complete, the data synchronization unit 2130 sends the foreground image and the background image to the image processing unit 2150, the three-dimensional model data to the three-dimensional model combining unit 2160, and the audio data to the shooting data file generation unit 2180. The data gathered here is the data that the shooting data file generation unit 2180, described later, needs in order to generate files. The background image may be captured at a frame rate different from that of the foreground image. For example, when the frame rate of the background image is 1 fps, one background image is acquired every second, so for times at which no background image is acquired, the data may be regarded as complete without a background image. If the data is still incomplete after a predetermined time has elapsed, the data synchronization unit 2130 notifies the database 250 of information indicating that the data could not be gathered. When the downstream database 250 stores the data, it stores information indicating the missing data together with the camera number and the frame number. This makes it possible, in response to a viewpoint instruction from the virtual camera operation UI 330 to the back end server 270, to report automatically before rendering whether the desired image can be formed from the captured images of the cameras 112 whose data was gathered. As a result, the visual inspection burden on the operator of the virtual camera operation UI 330 can be reduced.
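The buffering behavior of the data synchronization unit 2130 can be illustrated with a small sketch. It assumes each arriving item carries a time code, a camera identifier, and a data kind; the class `SyncBuffer`, the kind names, and the `expect_background` flag are hypothetical names introduced for illustration, and the timeout itself is left to the caller.

```python
from collections import defaultdict

# Kinds that must always be present; the background is optional for frames
# in which no background image is captured (e.g. a 1 fps background).
ALWAYS_REQUIRED = {"foreground", "audio", "model3d"}


class SyncBuffer:
    """Collects shooting data per (timecode, camera) until it is complete."""

    def __init__(self):
        self._pending = defaultdict(dict)  # (timecode, camera_id) -> {kind: payload}

    @staticmethod
    def _needed(expect_background):
        return ALWAYS_REQUIRED | ({"background"} if expect_background else set())

    def push(self, timecode, camera_id, kind, payload, expect_background=True):
        """Store one item; return the complete set once everything has arrived."""
        key = (timecode, camera_id)
        self._pending[key][kind] = payload
        # Packets may arrive in any order, so completeness is checked per key.
        if self._needed(expect_background).issubset(self._pending[key]):
            return self._pending.pop(key)
        return None

    def expire(self, timecode, camera_id, expect_background=True):
        """Give up on a frame after a timeout; return the kinds that never arrived."""
        got = self._pending.pop((timecode, camera_id), {})
        return sorted(self._needed(expect_background) - set(got))
```

The list returned by `expire` corresponds to the missing-data information that would be reported to the database 250 along with the camera number and frame number.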

ＣＡＤデータ記憶部２１３５は制御部２１１０から受け取ったスタジアム形状を示す三次元データをＤＲＡＭまたはＨＤＤやＮＡＮＤメモリ等の記憶媒体に保存する。そして、画像結合部２１７０に対して、スタジアム形状データの要求を受け取った際に保存されたスタジアム形状データを送信する。 The CAD data storage unit 2135 stores three-dimensional data indicating the stadium shape received from the control unit 2110 in a storage medium such as a DRAM, an HDD, or a NAND memory. Then, the stadium shape data stored when the stadium shape data request is received is transmitted to the image combining unit 2170.

キャリブレーション部２１４０はカメラのキャリブレーション動作を行い、キャリブレーションによって得られたカメラパラメータを後述する非撮影データファイル生成部２１８５に送る。また同時に、自身の記憶領域にもカメラパラメータを保持し、後述する三次元モデル結合部２１６０にカメラパラメータ情報を提供する。 The calibration unit 2140 performs a calibration operation of the camera, and sends camera parameters obtained by the calibration to a non-shooting data file generation unit 2185 described later. At the same time, camera parameters are held in its own storage area, and camera parameter information is provided to a three-dimensional model combining unit 2160 described later.

画像処理部２１５０は前景画像や背景画像に対して、カメラ間の色や輝度値の合わせこみ、ＲＡＷ画像データが入力される場合には現像処理、及びカメラのレンズ歪みの補正等の処理を行う。そして、画像処理を行った前景画像は撮影データファイル生成部２１８０に、背景画像は２１７０にそれぞれ送信する。 The image processing unit 2150 applies processing to the foreground and background images such as matching colors and luminance values between cameras, development processing when RAW image data is input, and correction of camera lens distortion. It then sends the processed foreground image to the shooting data file generation unit 2180 and the processed background image to the image combining unit 2170.

三次元モデル結合部２１６０は、カメラアダプタ１２０から取得した同一時刻の三次元モデルデータをキャリブレーション部２１４０が生成したカメラパラメータを用いて結合する。そして、ＶｉｓｕａｌＨｕｌｌと呼ばれる方法を用いて、スタジアム全体における前景画像の三次元モデルデータを生成する。生成した三次元モデルは撮影データファイル生成部２１８０に送信される。 The three-dimensional model combining unit 2160 combines the three-dimensional model data of the same time acquired from the camera adapter 120, using the camera parameters generated by the calibration unit 2140. It then generates three-dimensional model data of the foreground image for the entire stadium using a method called Visual Hull. The generated three-dimensional model is sent to the shooting data file generation unit 2180.

画像結合部２１７０は画像処理部２１５０から背景画像を取得し、ＣＡＤデータ記憶部２１３５からスタジアムの三次元形状データ（スタジアム形状データ）を取得し、取得したスタジアムの三次元形状データの座標に対する背景画像の位置を特定する。背景画像の各々についてスタジアムの三次元形状データの座標に対する位置が特定できると、背景画像を結合して１つの背景画像とする。なお、本背景画像の三次元形状データの作成については、バックエンドサーバ２７０が実施してもよい。 The image combining unit 2170 acquires the background images from the image processing unit 2150 and the three-dimensional shape data of the stadium (stadium shape data) from the CAD data storage unit 2135, and identifies the position of each background image relative to the coordinates of the acquired three-dimensional shape data of the stadium. Once the position relative to the coordinates of the three-dimensional shape data of the stadium has been identified for every background image, the background images are combined into a single background image. The creation of the three-dimensional shape data of this background image may instead be carried out by the back end server 270.

撮影データファイル生成部２１８０はデータ同期部２１３０から音声データを、画像処理部２１５０から前景画像を、三次元モデル結合部２１６０から三次元モデルデータを、画像結合部２１７０から三次元形状に結合された背景画像を取得する。そして、取得したこれらのデータをＤＢアクセス制御部２１９０に対して出力する。ここで、撮影データファイル生成部２１８０は、これらのデータをそれぞれの時間情報に基づいて対応付けて出力する。ただし、これらのデータの一部を対応付けて出力してもよい。例えば、撮影データファイル生成部２１８０は、前景画像と背景画像とを、前景画像の時間情報及び背景画像の時間情報に基づいて対応付けて出力する。また例えば、撮影データファイル生成部２１８０は、前景画像、背景画像、及び三次元モデルデータを、前景画像の時間情報、背景画像の時間情報、及び三次元モデルデータの時間情報に基づいて対応付けて出力する。なお、撮影データファイル生成部２１８０は、対応付けられたデータをデータの種類別にファイル化して出力してもよいし、複数種類のデータを時間情報が示す時刻ごとにまとめてファイル化して出力してもよい。このように対応付けられた撮影データがＤＢアクセス制御部２１９０によってデータベース２５０に出力されることで、バックエンドサーバ２７０は時間情報が対応する前景画像と背景画像とから仮想視点画像を生成できる。 The shooting data file generation unit 2180 acquires the audio data from the data synchronization unit 2130, the foreground image from the image processing unit 2150, the three-dimensional model data from the three-dimensional model combining unit 2160, and the background image combined onto the three-dimensional shape from the image combining unit 2170. It then outputs the acquired data to the DB access control unit 2190. Here, the shooting data file generation unit 2180 associates these data with one another on the basis of their respective time information and outputs them. However, only some of these data may be associated and output. For example, the shooting data file generation unit 2180 associates the foreground image and the background image on the basis of the time information of the foreground image and the time information of the background image, and outputs them. As another example, the shooting data file generation unit 2180 associates the foreground image, the background image, and the three-dimensional model data on the basis of the time information of each, and outputs them. The shooting data file generation unit 2180 may output the associated data as separate files for each type of data, or may bundle multiple types of data into one file for each time indicated by the time information. Because the DB access control unit 2190 outputs the shooting data associated in this way to the database 250, the back end server 270 can generate a virtual viewpoint image from a foreground image and a background image whose time information corresponds.

なお、データ入力制御部２１２０により取得される前景画像と背景画像のフレームレートが異なる場合、撮影データファイル生成部２１８０は、常に同時刻の前景画像と背景画像を対応付けて出力することは難しい。そこで、撮影データファイル生成部２１８０は、前景画像の時間情報と所定の規則に基づく関係にある時間情報を有する背景画像とを対応付けて出力する。ここで、前景画像の時間情報と所定の規則に基づく関係にある時間情報を有する背景画像は、例えば、撮影データファイル生成部２１８０が取得した背景画像のうち前景画像の時間情報に最も近い時間情報を有する背景画像である。 When the foreground image and the background image acquired by the data input control unit 2120 have different frame rates, it is difficult for the shooting data file generation unit 2180 to always associate and output a foreground image and a background image of the same time. The shooting data file generation unit 2180 therefore associates the foreground image with a background image whose time information stands in a relationship, based on a predetermined rule, to the time information of the foreground image, and outputs them. Such a background image is, for example, the background image whose time information is closest to that of the foreground image among the background images acquired by the shooting data file generation unit 2180.

このように、所定の規則に基づいて前景画像と背景画像を対応付けることにより、前景画像と背景画像のフレームレートが異なる場合でも、近い時刻に撮影された前景画像と背景画像とから仮想視点画像を生成することができる。なお、前景画像と背景画像の対応付けの方法は上記のものに限らない。例えば、前景画像の時間情報と所定の規則に基づく関係にある時間情報を有する背景画像は、取得された背景画像であって前景画像より前の時刻に対応する時間情報を有する背景画像のうち、前景画像の時間情報に最も近い時間情報を有する背景画像であってよい。この方法によれば、前景画像よりフレームレートの低い背景画像の取得を待つことなく、対応付けられた前景画像と背景画像とを低遅延で出力することができる。また、前景画像の時間情報と所定の規則に基づく関係にある時間情報を有する背景画像は、取得された背景画像であって前景画像より後の時刻に対応する時間情報を有する背景画像のうち、前景画像の時間情報に最も近い時間情報を有する背景画像でもよい。 By associating the foreground image and the background image on the basis of a predetermined rule in this way, a virtual viewpoint image can be generated from a foreground image and a background image captured at nearby times even when their frame rates differ. The method of associating the foreground image with the background image is not limited to the one above. For example, the associated background image may be the one whose time information is closest to that of the foreground image among the acquired background images whose time information corresponds to a time earlier than the foreground image. With this method, the associated foreground image and background image can be output with low delay, without waiting for the acquisition of a background image whose frame rate is lower than that of the foreground image. Alternatively, the associated background image may be the one whose time information is closest to that of the foreground image among the acquired background images whose time information corresponds to a time later than the foreground image.
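The three association rules described above (nearest, nearest earlier, nearest later) can be sketched as a lookup over sorted background time stamps. The function name `pick_background` and the rule labels are hypothetical, and treating a background with exactly the same time as valid for both the "previous" and "next" rules is an assumption of this sketch, not something the description specifies.

```python
from bisect import bisect_left, bisect_right


def pick_background(fg_time, bg_times, rule="nearest"):
    """Choose the background time to pair with a foreground time.

    bg_times must be sorted in ascending order. Returns None when no
    background satisfies the rule.
    """
    if not bg_times:
        return None
    if rule == "nearest":
        # Only the neighbors around the insertion point can be closest.
        i = bisect_left(bg_times, fg_time)
        candidates = bg_times[max(i - 1, 0):i + 1]
        return min(candidates, key=lambda t: abs(t - fg_time))
    if rule == "previous":
        # Latest background at or before the foreground: no waiting needed.
        i = bisect_right(bg_times, fg_time)
        return bg_times[i - 1] if i else None
    if rule == "next":
        # Earliest background at or after the foreground.
        i = bisect_left(bg_times, fg_time)
        return bg_times[i] if i < len(bg_times) else None
    raise ValueError(f"unknown rule: {rule}")
```

With backgrounds at frame times 0, 60, and 120 and a foreground at time 50, "nearest" and "next" both select 60, while "previous" selects 0 and can be resolved as soon as the foreground arrives.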

非撮影データファイル生成部２１８５は、キャリブレーション部２１４０からカメラパラメータ、制御部２１１０からスタジアムの三次元形状データを取得し、ファイル形式に応じて成形した後にＤＢアクセス制御部２１９０に送信する。なお、非撮影データファイル生成部２１８５に入力されるデータであるカメラパラメータまたはスタジアム形状データは、個別にファイル形式に応じて成形される。すなわち、非撮影データファイル生成部２１８５は、どちらか一方のデータを受信した場合、それらを個別にＤＢアクセス制御部２１９０に送信する。 The non-shooting data file generation unit 2185 acquires camera parameters from the calibration unit 2140 and three-dimensional shape data of the stadium from the control unit 2110, and forms the data according to the file format, and transmits the data to the DB access control unit 2190. Note that camera parameters or stadium shape data, which are data input to the non-shooting data file generation unit 2185, are individually formed according to the file format. That is, when one of the data is received, the non-shooting data file generation unit 2185 individually transmits them to the DB access control unit 2190.

ＤＢアクセス制御部２１９０は、ＩｎｆｉｎｉＢａｎｄなどにより高速な通信が可能となるようにデータベース２５０と接続される。そして、撮影データファイル生成部２１８０及び非撮影データファイル生成部２１８５から受信したファイルをデータベース２５０に対して送信する。本実施形態では、撮影データファイル生成部２１８０が時間情報に基づいて対応付けた撮影データは、フロントエンドサーバ２３０とネットワークを介して接続される記憶装置であるデータベース２５０へＤＢアクセス制御部２１９０を介して出力される。ただし、対応付けられた撮影データの出力先はこれに限らない。例えば、フロントエンドサーバ２３０は、時間情報に基づいて対応付けられた撮影データを、フロントエンドサーバ２３０とネットワークを介して接続され仮想視点画像を生成する画像生成装置であるバックエンドサーバ２７０に出力してもよい。また、データベース２５０とバックエンドサーバ２７０の両方に出力してもよい。 The DB access control unit 2190 is connected to the database 250 by InfiniBand or the like so that high-speed communication is possible. It sends the files received from the shooting data file generation unit 2180 and the non-shooting data file generation unit 2185 to the database 250. In the present embodiment, the shooting data that the shooting data file generation unit 2180 has associated on the basis of time information is output via the DB access control unit 2190 to the database 250, a storage device connected to the front end server 230 over a network. However, the output destination of the associated shooting data is not limited to this. For example, the front end server 230 may output the shooting data associated on the basis of time information to the back end server 270, an image generation apparatus that is connected to the front end server 230 over a network and generates virtual viewpoint images. The data may also be output to both the database 250 and the back end server 270.

また、本実施形態ではフロントエンドサーバ２３０が前景画像と背景画像の対応付けを行うものとするが、これに限らず、データベース２５０が対応付けを行ってもよい。例えば、データベース２５０はフロントエンドサーバ２３０から時間情報を有する前景画像及び背景画像を取得する。そしてデータベース２５０は、前景画像と背景画像とを前景画像の時間情報及び背景画像の時間情報に基づいて対応付けて、データベース２５０が備える記憶部に出力してもよい。 Further, in the present embodiment, the front end server 230 associates the foreground image with the background image. However, the present invention is not limited to this, and the database 250 may perform the association. For example, database 250 may obtain foreground and background images with temporal information from front end server 230. Then, the database 250 may associate the foreground image with the background image based on the time information of the foreground image and the time information of the background image, and may output the same to the storage unit included in the database 250.

図５は、データベース２５０の機能構成を示すブロック図である。制御部２４１０はＣＰＵやＤＲＡＭ、プログラムデータや各種データを記憶したＨＤＤやＮＡＮＤメモリなどの記憶媒体、及びＥｔｈｅｒｎｅｔ（登録商標）等のハードウェアで構成される。そして、データベース２５０の各機能ブロック及びデータベース２５０のシステム全体の制御を行う。 FIG. 5 is a block diagram showing the functional configuration of the database 250. The control unit 2410 consists of hardware such as a CPU, a DRAM, a storage medium such as an HDD or NAND memory storing program data and various data, and Ethernet (registered trademark). It controls each functional block of the database 250 and the database 250 system as a whole.

データ入力部２４２０はＩｎｆｉｎｉＢａｎｄ等の高速な通信によって、フロントエンドサーバ２３０から撮影データや非撮影データのファイルを受信する。受信したファイルはキャッシュ２４４０に送られる。また、受信した撮影データのメタ情報を読み出し、メタ情報に記録されたタイムコード情報やルーティング情報、カメラ識別子等の情報を元に、取得したデータへのアクセスが可能になるようにデータベーステーブルを作成する。 The data input unit 2420 receives files of shooting data and non-shooting data from the front end server 230 over high-speed communication such as InfiniBand. The received files are sent to the cache 2440. The data input unit 2420 also reads the meta information of the received shooting data and, on the basis of the time code information, routing information, camera identifier, and other information recorded in it, creates a database table so that the acquired data can be accessed.

データ出力部２４３０は、バックエンドサーバ２７０から要求されたデータが後述するキャッシュ２４４０、一次ストレージ２４５０、二次ストレージ２４６０のいずれに保存されているか判断する。そして、ＩｎｆｉｎｉＢａｎｄ等の高速な通信によって、保存された先からデータを読み出してバックエンドサーバ２７０に送信する。 The data output unit 2430 determines whether the data requested from the back end server 270 is stored in a cache 2440, a primary storage 2450, or a secondary storage 2460 described later. Then, by high-speed communication such as InfiniBand, data is read from the stored destination and transmitted to the back end server 270.

キャッシュ２４４０は高速な入出力スループットを実現可能なＤＲＡＭ等の記憶装置を有しており、データ入力部２４２０から取得した撮影データや非撮影データを記憶装置に格納する。格納されたデータは一定量保持され、それを超えるデータが入力される場合に、古いデータから随時一次ストレージ２４５０へと書き出され、書き出し済みのデータは新たなデータによって上書きされる。ここでキャッシュ２４４０に一定量保存されるデータは少なくとも１フレーム分の撮影データである。それによって、バックエンドサーバ２７０に於いて画像のレンダリング処理を行う際に、データベース２５０内でのスループットを最小限に抑え、最新の画像フレームを低遅延かつ連続的にレンダリングすることが可能となる。ここで、前述の目的を達成するためにはキャッシュされているデータの中に背景画像が含まれている必要がある。そのため、背景画像を有さないフレームの撮影データがキャッシュされる場合、キャッシュ上の背景画像は更新されず、そのままキャッシュ上に保持される。キャッシュ可能なＤＲＡＭの容量は、予めシステムに設定されたキャッシュフレームサイズ、または制御ステーション３１０からの指示によって決められる。なお、非撮影データについては、入出力の頻度が少なく、また、試合前などにおいては高速なスループットを要求されないため、すぐに一次ストレージ２４５０へとコピーされる。キャッシュされたデータはデータ出力部２４３０によって読み出される。 The cache 2440 has a storage device such as a DRAM that can realize high-speed input / output throughput, and stores the imaging data and non-imaging data acquired from the data input unit 2420 in the storage device. The stored data is held in a fixed amount, and when more data is input, the old data is written to the primary storage 2450 at any time, and the written data is overwritten by the new data. Here, the data stored in the cache 2440 by a fixed amount is shooting data for at least one frame. This makes it possible to minimize the throughput in the database 250 and render the latest image frame continuously with low delay when rendering the image in the back end server 270. Here, in order to achieve the above-mentioned purpose, it is necessary to include a background image in cached data. Therefore, when shooting data of a frame having no background image is cached, the background image on the cache is not updated, and is held on the cache as it is. The capacity of the cacheable DRAM is determined by a cache frame size set in advance in the system or an instruction from the control station 310. The non-shooting data is copied to the primary storage 2450 immediately because the frequency of input / output is low and high throughput is not required before the game. The cached data is read by the data output unit 2430.
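The cache behavior described above can be sketched as a bounded frame store. This is an illustration under stated assumptions: the class `FrameCache` and its fields are hypothetical, a plain list stands in for the primary storage 2450, and frames are written out at the moment they are displaced rather than "at any time" in the background.

```python
from collections import OrderedDict


class FrameCache:
    """Keeps the newest frames in memory and spills the oldest to primary storage."""

    def __init__(self, capacity_frames, primary_storage):
        if capacity_frames < 1:
            raise ValueError("the cache must hold at least one frame")
        self.capacity = capacity_frames
        self.primary = primary_storage  # list standing in for primary storage 2450
        self.frames = OrderedDict()     # frame_no -> shooting data, oldest first
        self.background = None          # most recent background image seen

    def put(self, frame_no, data, background=None):
        # A frame without a background leaves the cached background untouched,
        # so rendering always has a background image available.
        if background is not None:
            self.background = background
        self.frames[frame_no] = data
        # Write out the oldest frames once the fixed amount is exceeded.
        while len(self.frames) > self.capacity:
            old_no, old_data = self.frames.popitem(last=False)
            self.primary.append((old_no, old_data))
```

Holding the background separately from the per-frame data mirrors the requirement that a background image stays cached even when newer frames arrive without one.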

一次ストレージ２４５０はＳＳＤ等のストレージメディアを並列につなぐなどして構成されデータ入力部２４２０からの大量のデータの書き込み及びデータ出力部２４３０からのデータ読み出しが同時に実現できるなど高速化される。そして、一次ストレージ２４５０には、キャッシュ２４４０上に格納されたデータの古いものから順に書き出される。二次ストレージ２４６０はＨＤＤやテープメディア等で構成され、高速性よりも大容量が重視され、一次ストレージと比較して安価で長期間の保存に適するメディアであることが求められる。二次ストレージ２４６０には、撮影が完了した後、データのバックアップとして一次ストレージ２４５０に格納されたデータが書き出される。 The primary storage 2450 is configured by connecting storage media such as SSDs in parallel, etc., and high speed operation can be realized such that writing of a large amount of data from the data input unit 2420 and data reading from the data output unit 2430 can be realized simultaneously. Then, the primary storage 2450 is written out in order from the oldest data stored on the cache 2440. The secondary storage 2460 is composed of an HDD, a tape medium, etc., and a large capacity is more important than high speed, and it is required to be a medium which is inexpensive and suitable for long-term storage as compared with the primary storage. Data stored in the primary storage 2450 is written to the secondary storage 2460 as a backup of the data after the completion of shooting.

図６は、バックエンドサーバ２７０の機能構成を示すブロック図である。バックエンドサーバ２７０は、データ受信部３００１、背景テクスチャ貼り付け部３００２、前景テクスチャ決定部３００３、テクスチャ境界色合わせ部３００４、仮想視点前景画像生成部３００５、及びレンダリング部３００６を有する。さらに、仮想視点音声生成部３００７、合成部３００８、画像出力部３００９、前景オブジェクト決定部３０１０、要求リスト生成部３０１１、要求データ出力部３０１２、及びレンダリングモード管理部３０１４を有する。 FIG. 6 is a block diagram showing the functional configuration of the back end server 270. The back end server 270 includes a data receiving unit 3001, a background texture pasting unit 3002, a foreground texture determining unit 3003, a texture boundary color matching unit 3004, a virtual viewpoint foreground image generating unit 3005, and a rendering unit 3006. It further includes a virtual viewpoint audio generation unit 3007, a synthesis unit 3008, an image output unit 3009, a foreground object determination unit 3010, a request list generation unit 3011, a request data output unit 3012, and a rendering mode management unit 3014.

データ受信部３００１は、データベース２５０およびコントローラ３００から送信されるデータを受信する。またデータベース２５０からは、スタジアムの形状を示す三次元データ（スタジアム形状データ）、前景画像、背景画像、前景画像の三次元モデル（以降、前景三次元モデルと称する）、及び音声を受信する。 The data receiving unit 3001 receives data transmitted from the database 250 and the controller 300. Also, from the database 250, three-dimensional data (stadium shape data) indicating the shape of a stadium, a foreground image, a background image, a three-dimensional model of the foreground image (hereinafter referred to as a foreground three-dimensional model), and voice are received.

また、データ受信部３００１は、仮想視点画像の生成に係る視点を指定するコントローラ３００から出力される仮想カメラパラメータを受信する。仮想カメラパラメータとは、仮想視点の位置や姿勢などを表すデータであり、例えば、外部パラメータの行列と内部パラメータの行列が用いられる。 Also, the data receiving unit 3001 receives virtual camera parameters output from the controller 300 that specifies a viewpoint related to generation of a virtual viewpoint image. The virtual camera parameter is data representing the position, orientation, and the like of the virtual viewpoint, and, for example, a matrix of external parameters and a matrix of internal parameters are used.
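To make the role of the extrinsic and intrinsic parameter matrices concrete, the following sketch projects a world point into the image of a virtual camera using the common pinhole model. The description does not state which convention the embodiment uses, so the decomposition of the extrinsic parameters into a rotation `R` and translation `t`, the 3x3 intrinsic matrix `K`, and the function name `project` are all assumptions.

```python
def project(point_w, R, t, K):
    """Project a 3D world point to pixel coordinates (u, v).

    R: 3x3 rotation and t: 3-vector (extrinsic parameters, world -> camera).
    K: 3x3 intrinsic matrix [[fx, s, cx], [0, fy, cy], [0, 0, 1]].
    Returns None when the point is not in front of the camera.
    """
    # World -> camera coordinates: Xc = R * Xw + t
    xc = [sum(R[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    # Perspective division followed by the intrinsic mapping to pixels.
    u = (K[0][0] * xc[0] + K[0][1] * xc[1]) / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return (u, v)
```

Changing `R` and `t` moves and turns the virtual viewpoint, while `K` fixes properties such as focal length and image center, which is why the two matrices together are enough to describe a virtual camera.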

なお、データ受信部３００１がコントローラ３００から取得するデータは仮想カメラパラメータに限らない。例えばコントローラ３００から出力される情報は、視点の指定方法、コントローラが動作させているアプリケーションを特定する情報、コントローラ３００の識別情報、及びコントローラ３００を使用するユーザの識別情報の少なくとも何れかを含んでいてよい。また、データ受信部３００１は、コントローラ３００から出力される上記の情報と同様の情報を、エンドユーザ端末１９０から取得してもよい。さらに、データ受信部３００１は、データベース２５０やコントローラ３００などの外部の装置から、複数のカメラ１１２に関する情報を取得してもよい。複数のカメラ１１２に関する情報は、例えば、複数のカメラ１１２の数に関する情報や複数のカメラ１１２の動作状態に関する情報などである。カメラ１１２の動作状態には、例えば、カメラ１１２の正常状態、故障状態、待機状態、起動準備状態、及び再起動状態の少なくとも何れかが含まれる。背景テクスチャ貼り付け部３００２は、背景メッシュモデル管理部３０１３から取得する背景メッシュモデル（スタジアム形状データ）で示される三次元空間形状に対して背景画像をテクスチャとして貼り付けることでテクスチャ付背景メッシュモデルを生成する。メッシュモデルとは、例えばＣＡＤデータなど三次元の空間形状を面の集合で表現したデータのことである。テクスチャとは、物体の表面の質感を表現するために貼り付ける画像のことである。 The data that the data receiving unit 3001 acquires from the controller 300 is not limited to virtual camera parameters. For example, the information output from the controller 300 may include at least one of the method of specifying the viewpoint, information identifying the application that the controller is running, identification information of the controller 300, and identification information of the user of the controller 300. The data receiving unit 3001 may also acquire information similar to the above from the end user terminal 190. Furthermore, the data receiving unit 3001 may acquire information on the plurality of cameras 112 from an external device such as the database 250 or the controller 300. The information on the plurality of cameras 112 is, for example, information on the number of cameras 112 or information on their operating states. The operating state of a camera 112 includes, for example, at least one of a normal state, a failure state, a standby state, a startup preparation state, and a restart state. The background texture pasting unit 3002 generates a textured background mesh model by pasting the background image as a texture onto the three-dimensional spatial shape indicated by the background mesh model (stadium shape data) acquired from the background mesh model management unit 3013. A mesh model is data that represents a three-dimensional spatial shape, such as CAD data, as a set of faces. A texture is an image pasted onto an object to express the look of its surface.

前景テクスチャ決定部３００３は、前景画像及び前景三次元モデル群より前景三次元モデルのテクスチャ情報を決定する。 The foreground texture determination unit 3003 determines texture information of the foreground three-dimensional model from the foreground image and the foreground three-dimensional model group.

前景テクスチャ境界色合わせ部３００４は、各前景三次元モデルのテクスチャ情報と各三次元モデル群からテクスチャの境界の色合わせを行い、前景オブジェクト毎に色付き前景三次元モデル群を生成する。 The foreground texture boundary color matching unit 3004 performs color matching of texture boundaries from the texture information of each foreground three-dimensional model and each three-dimensional model group, and generates a colored foreground three-dimensional model group for each foreground object.

仮想視点前景画像生成部３００５は、仮想カメラパラメータに基づいて、前景画像群を仮想視点からの見た目となるように透視変換する。レンダリング部３００６は、レンダリングモード管理部３０１４で決定された、仮想視点画像の生成に用いられる生成方式に基づいて、背景画像と前景画像をレンダリングして全景の仮想視点画像を生成する。 The virtual viewpoint foreground image generation unit 3005 applies a perspective transformation to the foreground image group, based on the virtual camera parameters, so that the images appear as seen from the virtual viewpoint. The rendering unit 3006 renders the background image and the foreground image, using the generation method determined by the rendering mode management unit 3014 for generating virtual viewpoint images, to generate a virtual viewpoint image of the whole scene.

本実施形態では仮想視点画像の生成方式として、モデルベースレンダリング（Model-Based Rendering：ＭＢＲ）とイメージベースレンダリング（Image-Based Rendering：ＩＢＲ）の２つのレンダリングモードが用いられる。 In this embodiment, two rendering modes, model-based rendering (MBR) and image-based rendering (IBR), are used as a virtual viewpoint image generation method.

ＭＢＲとは、被写体を複数の方向から撮影した複数の撮影画像に基づいて生成される三次元モデルを用いて仮想視点画像を生成する方式である。具体的には、視体積交差法、Ｍｕｌｔｉ−Ｖｉｅｗ−Ｓｔｅｒｅｏ（ＭＶＳ）などの三次元形状復元手法により得られた対象シーンの三次元形状（モデル）を利用し，仮想視点からのシーンの見えを画像として生成する技術である。 MBR is a method of generating a virtual viewpoint image using a three-dimensional model generated from a plurality of captured images of a subject taken from a plurality of directions. Specifically, it is a technique that uses the three-dimensional shape (model) of the target scene, obtained by a three-dimensional shape reconstruction method such as the visual hull method (volume intersection) or Multi-View Stereo (MVS), to generate the appearance of the scene from the virtual viewpoint as an image.

ＩＢＲとは、対象のシーンを複数視点から撮影した入力画像群を変形、合成することによって仮想視点からの見えを再現した仮想視点画像を生成する技術である。本実施形態では、ＩＢＲを用いる場合、ＭＢＲを用いて三次元モデルを生成するための複数の撮影画像より少ない１又は複数の撮影画像に基づいて仮想視点画像が生成される。 IBR is a technique for generating a virtual viewpoint image that reproduces the appearance from a virtual viewpoint by transforming and combining a group of input images of the target scene captured from a plurality of viewpoints. In the present embodiment, when IBR is used, the virtual viewpoint image is generated from one or more captured images, fewer than the number of captured images used to generate a three-dimensional model with MBR.

レンダリングモードがＭＢＲの場合、背景メッシュモデルと前景テクスチャ境界色合わせ部３００４で生成した前景三次元モデル群を合成することで全景モデルが生成され、その全景モデルから仮想視点画像が生成される。 When the rendering mode is MBR, a panoramic model is generated by combining the background mesh model and the foreground three-dimensional model group generated by the foreground texture boundary color matching unit 3004, and a virtual viewpoint image is generated from the panoramic model.

レンダリングモードがＩＢＲの場合、背景テクスチャモデルに基づいて仮想視点から見た背景画像が生成され、そこに仮想視点前景画像生成部３００５で生成された前景画像を合成することで仮想視点画像が生成される。 When the rendering mode is IBR, a background image viewed from the virtual viewpoint is generated based on the background texture model, and the virtual viewpoint image is generated by combining that background image with the foreground image generated by the virtual viewpoint foreground image generation unit 3005.

なお、レンダリング部３００６はＭＢＲとＩＢＲ以外のレンダリング手法を用いてもよい。また、レンダリングモード管理部３０１４が決定する仮想視点画像の生成方式はレンダリングの方式に限らず、レンダリングモード管理部３０１４は仮想視点画像を生成するためのレンダリング以外の処理の方式を決定してもよい。レンダリングモード管理部３０１４は、仮想視点画像の生成に用いられる生成方式としてのレンダリングモードを決定し、決定結果を保持する。 The rendering unit 3006 may use a rendering method other than MBR and IBR. Further, the generation method determined by the rendering mode management unit 3014 is not limited to a rendering method; the rendering mode management unit 3014 may also determine the method of a process other than rendering that is used to generate a virtual viewpoint image. The rendering mode management unit 3014 determines the rendering mode as the generation method used to generate a virtual viewpoint image, and holds the determination result.

本実施形態では、レンダリングモード管理部３０１４は、複数のレンダリングモードから使用するレンダリングモードを決定する。この決定は、データ受信部３００１が取得した情報に基づいて行われる。例えば、レンダリングモード管理部３０１４は、取得された情報から特定されるカメラの数が閾値以下である場合に、仮想視点画像の生成に用いられる生成方式をＩＢＲに決定する。一方、カメラ数が閾値より多い場合は生成方式をＭＢＲに決定する。これにより、カメラ数が多い場合にはＭＢＲを用いて仮想視点画像を生成することで視点の指定可能範囲が広くなる。また、カメラ数が少ない場合には、ＩＢＲを用いることで、ＭＢＲを用いた場合の三次元モデルの精度の低下による仮想視点画像の画質低下を回避することができる。また例えば、撮影から画像出力までの許容される処理遅延時間の長短に基づいて生成方式を決めてもよい。遅延時間が長くても視点の自由度を優先する場合はＭＢＲ、遅延時間が短いことを要求する場合はＩＢＲを用いる。また例えば、コントローラ３００やエンドユーザ端末１９０が視点の高さを指定可能であることを示す情報をデータ受信部３００１が取得した場合には、仮想視点画像の生成に用いられる生成方式をＭＢＲに決定する。これにより、生成方式がＩＢＲであることによってユーザによる視点の高さの変更要求が受け入れられなくなることを防ぐことができる。このように、状況に応じて仮想視点画像の生成方式を決定することで、適切に決定された生成方式で仮想視点画像を生成できる。また、複数のレンダリングモードを要求に応じて切り替え可能な構成にすることで、柔軟にシステムを構成することが可能になり、スタジアム以外の被写体にも適用可能である。 In the present embodiment, the rendering mode management unit 3014 determines a rendering mode to be used from a plurality of rendering modes. This determination is performed based on the information acquired by the data receiving unit 3001. For example, when the number of cameras specified from the acquired information is equal to or less than a threshold, the rendering mode management unit 3014 determines, as IBR, a generation method used to generate a virtual viewpoint image. On the other hand, when the number of cameras is larger than the threshold, the generation method is determined to be MBR. As a result, when the number of cameras is large, the specifiable range of viewpoints is widened by generating the virtual viewpoint image using the MBR. Further, when the number of cameras is small, it is possible to avoid the image quality degradation of the virtual viewpoint image due to the degradation of the accuracy of the three-dimensional model when using the MBR by using the IBR. Also, for example, the generation method may be determined based on the length of an allowable processing delay time from imaging to image output. 
MBR is used when freedom of viewpoint takes priority even at the cost of a long delay, and IBR is used when a short delay is required. Further, for example, when the data receiving unit 3001 acquires information indicating that the controller 300 or the end user terminal 190 can specify the height of the viewpoint, the generation method used for generating the virtual viewpoint image is determined to be MBR. This prevents a user's request to change the height of the viewpoint from being rejected because the generation method is IBR. By determining the generation method according to the situation in this way, the virtual viewpoint image can be generated by an appropriately chosen method. In addition, making the rendering modes switchable on request allows the system to be configured flexibly and applied to subjects other than stadiums.
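The decision criteria described above can be sketched as a small function. This is only an illustration under stated assumptions: the names `AcquiredInfo`, `decide_rendering_mode`, and `camera_threshold` are hypothetical, and the order in which the three criteria are checked is an assumption, since the description presents them as separate examples rather than as a fixed priority.

```python
from dataclasses import dataclass
from enum import Enum


class RenderingMode(Enum):
    MBR = "model-based rendering"
    IBR = "image-based rendering"


@dataclass
class AcquiredInfo:
    """Hypothetical container for the information the data receiving unit 3001 obtains."""
    camera_count: int
    low_latency_required: bool = False
    viewpoint_height_selectable: bool = False


def decide_rendering_mode(info: AcquiredInfo, camera_threshold: int) -> RenderingMode:
    """Pick a rendering mode; the check order below is an assumption."""
    # If the viewpoint height can be specified, use MBR so that a height
    # change request is not rejected.
    if info.viewpoint_height_selectable:
        return RenderingMode.MBR
    # If only a short processing delay is acceptable, use IBR.
    if info.low_latency_required:
        return RenderingMode.IBR
    # Camera count at or below the threshold: IBR. Above it: MBR.
    if info.camera_count <= camera_threshold:
        return RenderingMode.IBR
    return RenderingMode.MBR
```

For instance, with a threshold of 10, a setup with 26 cameras and no other requirement would select MBR under this sketch, while the same setup with a low-latency requirement would select IBR.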

なお、レンダリングモード管理部３０１４が保持するレンダリングモードは、システムに予め設定された方式であってもよい。また、仮想カメラ操作ＵＩ３３０やエンドユーザ端末１９０を操作するユーザがレンダリングモードを任意に設定できてもよい。 The rendering mode held by the rendering mode management unit 3014 may be a method preset in the system. Alternatively, a user operating the virtual camera operation UI 330 or the end user terminal 190 may be able to set the rendering mode arbitrarily.

仮想視点音声生成部３００７は、仮想カメラパラメータに基づいて、仮想視点において聞こえる音声（音声群）を生成する。合成部３００８は、レンダリング部３００６で生成された画像群と仮想視点音声生成部３００７で生成された音声を合成して仮想視点コンテンツを生成する。 The virtual viewpoint audio generation unit 3007 generates audio (audio group) that can be heard from the virtual viewpoint based on the virtual camera parameters. The synthesizing unit 3008 synthesizes the image group generated by the rendering unit 3006 and the voice generated by the virtual viewpoint sound generating unit 3007 to generate virtual viewpoint content.

画像出力部３００９は、コントローラ３００とエンドユーザ端末１９０へＥｔｈｅｒｎｅｔ（登録商標）を用いて仮想視点コンテンツを出力する。ただし、外部への伝送手段はＥｔｈｅｒｎｅｔ（登録商標）に限定されるものではなく、ＳＤＩ、ＤｉｓｐｌａｙＰｏｒｔ、及びＨＤＭＩ（登録商標）などの信号伝送手段を用いてもよい。なお、バックエンドサーバ２７０は、レンダリング部３００６で生成された、音声を含まない仮想視点画像を出力してもよい。 The image output unit 3009 outputs virtual viewpoint content to the controller 300 and the end user terminal 190 using Ethernet (registered trademark). However, the transmission means to the outside is not limited to Ethernet (registered trademark), and signal transmission means such as SDI, DisplayPort, and HDMI (registered trademark) may be used. The back-end server 270 may output a virtual viewpoint image generated by the rendering unit 3006 and not including audio.

前景オブジェクト決定部３０１０は、仮想カメラパラメータと前景三次元モデルに含まれる前景オブジェクトの空間上の位置を示す前景オブジェクトの位置情報から、表示される前景オブジェクト群を決定して、前景オブジェクトリストを出力する。つまり、前景オブジェクト決定部３０１０は、仮想視点の画像情報を物理的なカメラ１１２にマッピングする処理を実施する。本仮想視点は、レンダリングモード管理部３０１４で決定されるレンダリングモードに応じてマッピング結果が異なる。そのため、複数の前景オブジェクトを決定する制御部が前景オブジェクト決定部３０１０に配備されレンダリングモードと連動して制御を行うことを明記しておく。 The foreground object determination unit 3010 determines the group of foreground objects to be displayed from the virtual camera parameters and from the foreground object position information, which indicates the spatial position of each foreground object included in the foreground three-dimensional model, and outputs a foreground object list. In other words, the foreground object determination unit 3010 maps the image information of the virtual viewpoint onto the physical cameras 112. The mapping result for the virtual viewpoint differs depending on the rendering mode determined by the rendering mode management unit 3014. It is therefore noted explicitly that the foreground object determination unit 3010 is provided with a control unit that determines the plurality of foreground objects and performs control in conjunction with the rendering mode.

要求リスト生成部３０１１は、指定時間の前景オブジェクトリストに対応する前景画像群と前景三次元モデル群、及び背景画像と音声データをデータベース２５０に要求するための、要求リストを生成する。前景オブジェクトについては仮想視点を考慮して選択されたデータがデータベース２５０に要求されるが、背景画像と音声データについてはそのフレームに関する全てのデータが要求される。バックエンドサーバ２７０の起動後、背景メッシュモデルが取得されるまで背景メッシュモデルの要求リストが生成される。 The request list generation unit 3011 generates a request list for requesting, from the database 250, the foreground image group and foreground three-dimensional model group corresponding to the foreground object list at a designated time, together with the background image and audio data. For foreground objects, only data selected in consideration of the virtual viewpoint is requested from the database 250, whereas for background images and audio data, all data for that frame is requested. After the back end server 270 starts up, a request list for the background mesh model is generated until the background mesh model has been obtained.

要求データ出力部３０１２は、入力された要求リストを元にデータベース２５０に対してデータ要求のコマンドを出力する。背景メッシュモデル管理部３０１３は、データベース２５０から受信した背景メッシュモデルを記憶する。 The request data output unit 3012 outputs a data request command to the database 250 based on the input request list. The background mesh model management unit 3013 stores the background mesh model received from the database 250.

なお、本実施形態ではバックエンドサーバ２７０が仮想視点画像の生成方式の決定と仮想視点画像の生成の両方を行う場合を中心に説明するが、これに限らない。即ち、生成方式を決定した装置はその決定結果に応じたデータを出力すればよい。例えば、フロントエンドサーバ２３０が、複数のカメラ１１２に関する情報や仮想視点画像の生成に係る視点を指定する装置から出力される情報などに基づいて、仮想視点画像の生成に用いられる生成方式を決定してもよい。そしてフロントエンドサーバ２３０は、カメラ１１２による撮影に基づく画像データと決定された生成方式を示す情報とを、データベース２５０などの記憶装置及びバックエンドサーバ２７０などの画像生成装置の少なくとも何れかに出力してもよい。この場合には、例えばフロントエンドサーバ２３０が出力した生成方式を示す情報に基づいてバックエンドサーバ２７０が仮想視点画像を生成する。フロントエンドサーバ２３０が生成方式を決定することで、決定された方式とは別の方式での画像生成のためのデータをデータベース２５０やバックエンドサーバ２７０が処理することによる処理負荷を低減できる。一方、本実施形態のようにバックエンドサーバ２７０が生成方式を決定する場合、データベース２５０は複数の生成方式に対応可能なデータを保持するため、複数の生成方式それぞれに対応する複数の仮想視点画像の生成が可能となる。 The present embodiment mainly describes the case where the back end server 270 both determines the generation method of the virtual viewpoint image and generates the virtual viewpoint image, but the invention is not limited to this. That is, the device that determines the generation method need only output data according to the determination result. For example, the front end server 230 may determine the generation method used to generate the virtual viewpoint image based on information about the plurality of cameras 112, information output from a device that specifies the viewpoint for generating the virtual viewpoint image, and the like. The front end server 230 may then output the image data based on capture by the cameras 112, together with information indicating the determined generation method, to at least one of a storage device such as the database 250 and an image generation device such as the back end server 270. In this case, for example, the back end server 270 generates the virtual viewpoint image based on the information indicating the generation method output by the front end server 230. 
When the front end server 230 determines the generation method, the processing load that would arise from the database 250 and the back end server 270 processing data for image generation by a method other than the determined one can be reduced. On the other hand, when the back end server 270 determines the generation method as in the present embodiment, the database 250 holds data compatible with a plurality of generation methods, so a plurality of virtual viewpoint images, one for each generation method, can be generated.

図７は、仮想カメラ操作ＵＩ３３０の機能構成を示すブロック図である。また、図８は、仮想カメラ８００１を説明する図である。図８（ａ）に示す仮想カメラ８００１は、設置されたどのカメラ１１２とも異なる視点において撮影を行うことができる仮想的なカメラである。即ち、画像処理システム１００において生成される仮想視点画像が、仮想カメラ８００１による撮影画像である。図８（ａ）において、円周上に配置された複数のセンサシステム１１０それぞれがカメラ１１２を有している。例えば、仮想視点画像を生成することにより、あたかもサッカーゴールの近くの仮想カメラ８００１で撮影されたかのような画像を生成することができる。仮想カメラ８００１の撮影画像である仮想視点画像は、設置された複数のカメラ１１２の画像を画像処理することで生成される。オペレータ（ユーザ）は仮想カメラ８００１の位置等操作することで、自由な視点からの撮影画像を得ることができる。 FIG. 7 is a block diagram showing the functional configuration of the virtual camera operation UI 330. FIG. 8 is a diagram for explaining a virtual camera 8001. The virtual camera 8001 illustrated in FIG. 8A is a virtual camera that can capture images from a viewpoint different from that of any installed camera 112. That is, the virtual viewpoint image generated in the image processing system 100 is the image captured by the virtual camera 8001. In FIG. 8A, each of the plurality of sensor systems 110 arranged on the circumference has a camera 112. For example, by generating a virtual viewpoint image, it is possible to produce an image that looks as if it had been taken by the virtual camera 8001 near the soccer goal. The virtual viewpoint image, which is the captured image of the virtual camera 8001, is generated by image processing of the images from the plurality of installed cameras 112. By operating the position and other settings of the virtual camera 8001, the operator (user) can obtain a captured image from a free viewpoint.

仮想カメラ操作ＵＩ３３０は、仮想カメラ管理部８１３０および操作ＵＩ部８１２０を有する。これらは同一機器上に実装されてもよいし、それぞれサーバとなる装置とクライアントとなる装置に別々に実装されてもよい。例えば、放送局が使用する仮想カメラ操作ＵＩ３３０においては、中継車内のワークステーションに仮想カメラ管理部８１３０と操作ＵＩ部８１２０が実装されてもよい。また例えば、仮想カメラ管理部８１３０をｗｅｂサーバに実装し、エンドユーザ端末１９０に操作ＵＩ部８１２０を実装することで、同様の機能を実現してもよい。 The virtual camera operation UI 330 has a virtual camera management unit 8130 and an operation UI unit 8120. These may be implemented on the same device, or may be implemented separately on the device serving as the server and the device serving as the client. For example, in a virtual camera operation UI 330 used by a broadcasting station, a virtual camera management unit 8130 and an operation UI unit 8120 may be mounted on a workstation in a relay vehicle. Also, for example, the virtual camera management unit 8130 may be mounted on a web server, and the operation UI unit 8120 may be mounted on the end user terminal 190 to realize the same function.

仮想カメラ操作部８１０１は、オペレータの仮想カメラ８００１に対する操作、すなわち仮想視点画像の生成に係る視点を指定するためのユーザによる指示を受け付けて処理する。オペレータの操作内容は、例えば、位置の変更（移動）、姿勢の変更（回転）、及びズーム倍率の変更などである。オペレータは、仮想カメラ８００１を操作するために、例えば、ジョイスティック、ジョグダイヤル、タッチパネル、キーボード、及びマウスなどの入力装置を使う。各入力装置による入力と仮想カメラ８００１の操作との対応は予め決められる。例えば、キーボードの「Ｗ」キーを、仮想カメラ８００１を前方へ１メートル移動する操作に対応付ける。また、オペレータは軌跡を指定して仮想カメラ８００１を操作することができる。例えばオペレータは、ゴールポストを中心とする円周上を仮想カメラ８００１が回るという軌跡を、タッチパッドに円を書いて指定する。仮想カメラ８００１は、指定された軌跡に沿ってゴールポストの回りを移動する。このとき、仮想カメラ８００１が常にゴールポストの方を向くように自動で姿勢を変更してもよい。仮想カメラ操作部８１０１は、ライブ画像およびリプレイ画像の生成に利用することができる。リプレイ画像を生成する際は、カメラの位置及び姿勢の他に時間を指定する操作が行われる。リプレイ画像では、例えば、時間を止めて仮想カメラ８００１を移動させることも可能である。 The virtual camera operation unit 8101 receives and processes an operation of the operator on the virtual camera 8001, that is, an instruction from the user for specifying a viewpoint related to generation of a virtual viewpoint image. The operation content of the operator is, for example, change of position (movement), change of posture (rotation), and change of zoom magnification. The operator uses input devices such as a joystick, a jog dial, a touch panel, a keyboard, and a mouse to operate the virtual camera 8001. The correspondence between the input by each input device and the operation of the virtual camera 8001 is determined in advance. For example, the “W” key of the keyboard is associated with an operation of moving the virtual camera 8001 one meter forward. Also, the operator can operate the virtual camera 8001 by designating a trajectory. For example, the operator writes a circle on the touch pad to designate a locus that the virtual camera 8001 turns on the circumference centered on the goal post. The virtual camera 8001 moves around the goalpost along the designated trajectory. At this time, the posture may be changed automatically so that the virtual camera 8001 always faces the goal post. The virtual camera operation unit 8101 can be used to generate a live image and a replay image. 
When generating a replay image, an operation is performed to specify time in addition to the position and orientation of the camera. In the replay image, for example, it is also possible to stop the time and move the virtual camera 8001.
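The trajectory example above (circling the goal post while always facing it) and the "W" key example can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the flat 2D ground-plane model, and the use of a single yaw angle for the camera attitude are all assumptions.

```python
import math


def orbit_path(center_xy, radius, height, n_frames):
    """Return one (x, y, z, yaw) tuple per frame for a full circle around center_xy."""
    cx, cy = center_xy
    path = []
    for i in range(n_frames):
        angle = 2.0 * math.pi * i / n_frames
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        # Yaw that points the optical axis from the camera toward the center,
        # so the camera keeps facing the goal post while it moves.
        yaw = math.atan2(cy - y, cx - x)
        path.append((x, y, height, yaw))
    return path


def move_forward(x, y, yaw, distance_m=1.0):
    """Move along the current viewing direction (the 'W' key example uses 1 meter)."""
    return (x + distance_m * math.cos(yaw), y + distance_m * math.sin(yaw))
```

Because each yaw points at the center, moving forward by exactly the orbit radius from any point on the path lands on the goal post position, which is a quick way to sanity-check the attitude computation.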

仮想カメラパラメータ導出部８１０２は、仮想カメラ８００１の位置や姿勢などを表す仮想カメラパラメータを導出する。仮想パラメータは、演算によって導出されてもよいし、ルックアップテーブルの参照などによって導出されてもよい。仮想カメラパラメータとして、例えば、外部パラメータを表す行列と内部パラメータを表す行列が用いられる。ここで、仮想カメラ８００１の位置と姿勢は外部パラメータに含まれ、ズーム値は内部パラメータに含まれる。 The virtual camera parameter derivation unit 8102 derives virtual camera parameters representing the position, orientation, and the like of the virtual camera 8001. The virtual camera parameters may be derived by computation or by reference to a look-up table or the like. As virtual camera parameters, for example, a matrix representing external (extrinsic) parameters and a matrix representing internal (intrinsic) parameters are used. Here, the position and orientation of the virtual camera 8001 are included in the external parameters, and the zoom value is included in the internal parameters.
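As an illustration of the two matrices mentioned above, the following sketch builds a 3x4 external-parameter matrix from a position and a yaw angle, and a 3x3 internal-parameter matrix in which the zoom value scales the focal length. The pinhole camera model, the single-axis rotation, and the names `extrinsic_matrix` and `intrinsic_matrix` are assumptions made for the example; the description does not specify the matrix layout.

```python
import math


def extrinsic_matrix(position, yaw_rad):
    """3x4 matrix [R | t] for a camera at `position`, rotated by `yaw_rad` about the z axis.

    R maps world coordinates to camera coordinates and t = -R * position,
    so the camera's own position maps to the origin.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    # World-to-camera rotation is the transpose of the camera's own rotation.
    r = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
    t = [-sum(r[i][j] * position[j] for j in range(3)) for i in range(3)]
    return [r[i] + [t[i]] for i in range(3)]


def intrinsic_matrix(base_focal_px, zoom, cx, cy):
    """3x3 matrix K; zoom is modeled (as an assumption) as a focal-length multiplier."""
    f = base_focal_px * zoom
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
```

A full attitude would need three rotation angles or a quaternion; a single yaw keeps the example short while still showing how position and orientation end up in the external parameters and zoom in the internal ones.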

仮想カメラ制約管理部８１０３は、仮想カメラ操作部８１０１により受け付けられる指示に基づく視点の指定が制限される制限領域を特定するための情報を取得し管理する。この情報は例えば、仮想カメラ８００１の位置や姿勢、ズーム値などに関する制約である。仮想カメラ８００１は、カメラ１１２と異なり、自由に視点を移動して撮影を行うことができるが、常にあらゆる視点からの画像を生成できるとは限らない。例えば、どのカメラ１１２にも映っていない対象物が映る向きに仮想カメラ８００１を向けても、その撮影画像を取得することはできない。また、仮想カメラ８００１のズーム倍率を上げると、解像度の制約により画質が劣化する。そこで、一定基準の画質を保つ範囲のズーム倍率などを仮想カメラ制約としてよい。仮想カメラ制約は、例えば、カメラ１１２の配置などから事前に導出しておいてもよい。また、伝送部６１２０がネットワークの負荷に応じて伝送データ量の削減を図ることがある。このデータ量削減により、撮影画像に関するパラメータが変化し、画像を生成できる範囲や画質を保つことができる範囲が動的に変わる。仮想カメラ制約管理部８１０３は、伝送部６１２０から出力データのデータ量の削減に用いた方法を示す情報を受け取り、その情報に応じて仮想カメラ制約を動的に更新する構成であってもよい。これにより、伝送部６１２０によりデータ量削減が図られても、仮想視点画像の画質を一定基準に保つことが可能となる。 The virtual camera restriction management unit 8103 acquires and manages information for specifying a restricted area in which designation of a viewpoint based on an instruction accepted by the virtual camera operation unit 8101 is restricted. This information is, for example, a constraint on the position and orientation of the virtual camera 8001, the zoom value, and the like. Unlike the camera 112, the virtual camera 8001 can freely move the viewpoint and shoot, but it can not always generate images from all viewpoints. For example, even if the virtual camera 8001 is directed to a direction in which an object not shown in any camera 112 is shown, the captured image can not be acquired. In addition, when the zoom magnification of the virtual camera 8001 is increased, the image quality is deteriorated due to the restriction of the resolution. Therefore, a zoom magnification in a range that maintains a certain standard image quality may be used as the virtual camera restriction. The virtual camera constraints may be derived in advance from the arrangement of the cameras 112, for example. Also, the transmission unit 6120 may reduce the amount of transmission data according to the load on the network. 
This reduction in data amount changes the parameters of the captured images, so the range in which an image can be generated and the range in which image quality can be maintained change dynamically. The virtual camera restriction management unit 8103 may be configured to receive, from the transmission unit 6120, information indicating the method used to reduce the amount of output data, and to update the virtual camera restriction dynamically according to that information. As a result, even when the transmission unit 6120 reduces the amount of data, the image quality of the virtual viewpoint image can be kept at a certain standard.
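One way to picture the dynamic update is a zoom limit that shrinks in proportion to the transmitted resolution. The linear relationship below is purely an assumption chosen for illustration, as is the function name; the description only states that the restriction is updated according to the data reduction method, not how.

```python
def max_zoom_for_resolution(full_res_max_zoom, full_width_px, current_width_px):
    """Hypothetical rule: the allowed zoom scales linearly with the transmitted image width."""
    if not 0 < current_width_px <= full_width_px:
        raise ValueError("current width must be positive and not exceed the full width")
    # Halving the source width halves the detail available per output pixel,
    # so the zoom limit that keeps the same quality is halved as well.
    return full_res_max_zoom * current_width_px / full_width_px
```

Under this rule, a limit of 4x at a width of 3840 pixels would become 2x if the transmission unit dropped the width to 1920 pixels.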

また、仮想カメラ８００１に関する制約は上記の物に限定されない。本実施形態では、視点の指定が制限される制限領域（仮想カメラ制約を満たさない領域）は、画像処理システム１００に含まれる装置の動作状態及び仮想視点画像を生成するための画像データに関するパラメータの少なくとも何れかに応じて変化する。例えば、制限領域は、画像処理システム１００において伝送される画像データのデータ量が所定範囲内となるように制御されるパラメータに応じて変化する。当該パラメータは、画像データのフレームレート、解像度、量子化ステップ、及び撮影範囲などのうち少なくとも何れかを含む。例えば、伝送データ量削減のために画像データの解像度が低減されると、所定の画質を維持可能なズーム倍率の範囲が変化する。このような場合に、仮想カメラ制約管理部８１０３がパラメータに応じて変化する制限領域を特定する情報を取得することで、仮想カメラ操作ＵＩ３３０はパラメータの変化に応じた範囲でユーザによる視点の指定がなされるよう制御できる。なお、パラメータの内容は上記のものに限定されない。また、本実施形態において上記のデータ量が制御される画像データは複数のカメラ１１２による複数の撮影画像の差分に基づいて生成されるデータであるものとするが、これに限らず、例えば撮影画像そのものでもよい。 The restrictions on the virtual camera 8001 are not limited to those described above. In the present embodiment, the restricted area in which designation of a viewpoint is restricted (the area that does not satisfy the virtual camera restriction) changes according to at least one of the operation state of a device included in the image processing system 100 and a parameter related to the image data used to generate a virtual viewpoint image. For example, the restricted area changes according to a parameter that is controlled so that the amount of image data transmitted in the image processing system 100 falls within a predetermined range. The parameter includes at least one of the frame rate, resolution, quantization step, and imaging range of the image data. For example, when the resolution of the image data is reduced to cut the amount of transmitted data, the range of zoom magnifications that can maintain a predetermined image quality changes. In such a case, the virtual camera restriction management unit 8103 acquires information specifying the restricted area that changes with the parameter, so that the virtual camera operation UI 330 can ensure the user specifies the viewpoint within a range that reflects the parameter change. The contents of the parameters are not limited to the above. 
Further, in the present embodiment, the image data whose data amount is controlled is data generated based on the differences between a plurality of images captured by the plurality of cameras 112; however, this is not limiting, and it may be, for example, the captured images themselves.

また例えば、制限領域は、画像処理システム１００に含まれる装置の動作状態に応じて変化する。ここで画像処理システム１００に含まれる装置には、例えばカメラ１１２及びカメラ１１２による撮影画像に対する画像処理を行って画像データを生成するカメラアダプタ１２０の少なくとも何れかが含まれる。そして装置の動作状態には、例えば当該装置の正常状態、故障状態、起動準備状態、及び再起動状態の少なくとも何れかが含まれる。例えば、何れかのカメラ１１２が故障状態や再起動状態にある場合、そのカメラの周辺位置に視点を指定することができなくなる場合が考えられる。このような場合に、仮想カメラ制約管理部８１０３が装置の動作状態に応じて変化する制限領域を特定する情報を取得することで、仮想カメラ操作ＵＩ３３０は装置の動作状態の変化に応じた範囲でユーザによる視点の指定がなされるよう制御できる。なお、制限領域の変化に関係する装置及びその動作状態は上記のものに限定されない。 Further, for example, the restricted area changes according to the operation state of a device included in the image processing system 100. Here, the devices included in the image processing system 100 include, for example, at least one of the camera 112 and the camera adapter 120, which performs image processing on images captured by the camera 112 to generate image data. The operation state of a device includes, for example, at least one of a normal state, a failure state, a startup preparation state, and a restart state. For example, when one of the cameras 112 is in a failure state or a restart state, it may become impossible to specify a viewpoint at positions around that camera. In such a case, the virtual camera restriction management unit 8103 acquires information specifying the restricted area that changes with the operation state of the device, so that the virtual camera operation UI 330 can ensure the user specifies the viewpoint within a range that reflects the change in operation state. The devices and operation states that affect the restricted area are not limited to those described above.

衝突判定部８１０４は、仮想カメラパラメータ導出部８１０２で導出された仮想カメラパラメータが仮想カメラ制約を満たしているかを判定する。制約を満たしていない場合は、例えば、オペレータによる操作入力をキャンセルし、制約を満たす位置から仮想カメラ８００１が動かないよう制御したり、制約を満たす位置に仮想カメラ８００１を戻したりする。 The collision determination unit 8104 determines whether the virtual camera parameter derived by the virtual camera parameter derivation unit 8102 satisfies the virtual camera constraint. If the constraint is not satisfied, for example, the operation input by the operator is canceled, and the virtual camera 8001 is controlled not to move from the position satisfying the constraint, or the virtual camera 8001 is returned to the position satisfying the constraint.
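The check performed by the collision determination unit can be sketched as below. The box-shaped position range and the single zoom limit are assumptions chosen to keep the example small, and the class and function names are hypothetical; the description allows restrictions on position, orientation, zoom, and more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraState:
    x: float
    y: float
    z: float
    zoom: float


@dataclass(frozen=True)
class CameraConstraint:
    """Hypothetical restriction: an axis-aligned box for the position plus a zoom limit."""
    min_xyz: tuple
    max_xyz: tuple
    max_zoom: float

    def is_satisfied(self, s: CameraState) -> bool:
        inside = all(
            lo <= v <= hi
            for lo, v, hi in zip(self.min_xyz, (s.x, s.y, s.z), self.max_xyz)
        )
        return inside and s.zoom <= self.max_zoom


def apply_operation(current: CameraState, requested: CameraState,
                    constraint: CameraConstraint):
    """Return (new_state, accepted). A rejected request leaves the camera where it was."""
    if constraint.is_satisfied(requested):
        return requested, True
    # Cancel the operator input: keep the last state that satisfied the restriction.
    return current, False
```

The boolean in the return value corresponds to the determination result that a feedback mechanism could report to the operator, for example when an upward move would leave the permitted range.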

フィードバック出力部８１０５は、衝突判定部８１０４の判定結果をオペレータにフィードバックする。例えば、オペレータの操作により、仮想カメラ制約が満たされなくなる場合に、そのことをオペレータに通知する。例えば、オペレータが仮想カメラ８００１を上方に移動しようと操作したが、移動先が仮想カメラ制約を満たさないとする。その場合、オペレータに、これ以上上方に仮想カメラ８００１を移動できないことを通知する。通知方法としては、音、メッセージ出力、画面の色変化、及び仮想カメラ操作部８１０１をロックする等の方法がある。さらには、制約を満たす位置まで仮想カメラの位置を自動で戻してもよく、これによりオペレータの操作簡便性につながる効果がある。フィードバックが画像表示により行われる場合、フィードバック出力部８１０５は、仮想カメラ制約管理部８１０３が取得した情報に基づいて、制限領域に応じた表示制御に基づく画像を表示部に表示させる。例えば、フィードバック出力部８１０５は、仮想カメラ操作部８１０１により受け付けられた指示に応じて、当該指示に対応する視点が制限領域内であることを表す画像を表示部に表示させる。これにより、オペレータは指定している視点が制限領域内であって所望の仮想視点画像を生成できない虞があることを認識でき、制限領域外の位置（制約を満たす位置）に視点を指定し直すことができる。即ち、仮想視点画像の生成において、状況に応じて変化する範囲内で視点を指定できるようになる。なお、フィードバック出力部８１０５により表示部に表示される内容はこれに限定されない。例えば、視点の指定の対象となる領域（スタジアムの内部など）のうち制限領域に当たる部分を所定の色で塗りつぶした画像が表示されてもよい。本実施形態では表示部が仮想カメラ操作ＵＩ３３０と接続される外部のディスプレイであるものとするが、これに限らず、表示部が仮想カメラ操作ＵＩ３３０の内部に存在してもよい。 The feedback output unit 8105 feeds back the determination result of the collision determination unit 8104 to the operator. For example, when the virtual camera constraint is not satisfied by the operation of the operator, the operator is notified of that. For example, it is assumed that the operator operates to move the virtual camera 8001 upward, but the destination does not satisfy the virtual camera constraint. In that case, the operator is notified that the virtual camera 8001 can not be moved further upward. As a notification method, there are methods such as sound, message output, screen color change, and lock of the virtual camera operation unit 8101. Furthermore, the position of the virtual camera may be automatically returned to a position that satisfies the constraints, which has the effect of leading to the operator's ease of operation. When the feedback is performed by image display, the feedback output unit 8105 causes the display unit to display an image based on display control according to the restricted area based on the information acquired by the virtual camera restriction management unit 8103. 
For example, in response to the instruction received by the virtual camera operation unit 8101, the feedback output unit 8105 causes the display unit to display an image indicating that the viewpoint corresponding to the instruction is within the restricted area. This lets the operator recognize that the specified viewpoint is within the restricted area, so the desired virtual viewpoint image may not be generated, and respecify the viewpoint at a position outside the restricted area (a position that satisfies the restriction). That is, in generating a virtual viewpoint image, the viewpoint can be specified within a range that changes according to the situation. Note that the content displayed on the display unit by the feedback output unit 8105 is not limited to this. For example, an image may be displayed in which the part of the area targeted for viewpoint designation (for example, the inside of a stadium) that falls within the restricted area is filled with a predetermined color. In the present embodiment, the display unit is an external display connected to the virtual camera operation UI 330; however, this is not limiting, and the display unit may be located inside the virtual camera operation UI 330.

仮想カメラパス管理部８１０６は、オペレータの操作に応じた仮想カメラ８００１のパス（仮想カメラパス８００２）を管理する。仮想カメラパス８００２とは、仮想カメラ８００１の１フレームごと位置や姿勢を表す情報の列である。図８（ｂ）を参照して説明する。例えば、仮想カメラ８００１の位置や姿勢を表す情報として仮想カメラパラメータが用いられる。例えば、６０フレーム／秒のフレームレートの設定における１秒分の情報は、６０個の仮想カメラパラメータの列となる。仮想カメラパス管理部８１０６は、衝突判定部８１０４で判定済みの仮想カメラパラメータを、バックエンドサーバ２７０に送信する。バックエンドサーバ２７０は、受信した仮想カメラパラメータを用いて、仮想視点画像及び仮想視点音声を生成する。また、仮想カメラパス管理部８１０６は、仮想カメラパラメータを仮想カメラパス８００２に付加して保持する機能も有する。例えば、仮想カメラ操作ＵＩ３３０を用いて、１時間分の仮想視点画像及び仮想視点音声を生成した場合、１時間分の仮想カメラパラメータが仮想カメラパス８００２として保存される。本仮想カメラパスを保存しておくことによって、データベースの二次ストレージ２４６０に蓄積された画像情報と仮想カメラパスを後から参照することで、仮想視点画像及び仮想視点音声を再度生成することが可能になる。つまり、高度な仮想カメラ操作を行うオペレータが生成した仮想カメラパスと二次ストレージ２４６０に蓄積された画像情報を他のユーザが再利用できる。なお、複数の仮想カメラパスに対応する複数のシーンを選択可能となるように仮想カメラ管理部８１３０に蓄積することもできる。複数の仮想カメラパスを仮想カメラ管理部８１３０に蓄積する際には、各仮想カメラパスに対応するシーンのスクリプトや試合の経過時間、シーンの前後指定時間、及びプレーヤ情報等のメタ情報もあわせて入力及び蓄積することができる。仮想カメラ操作ＵＩ３３０は、これらの仮想カメラパスを仮想カメラパラメータとして、バックエンドサーバ２７０に通知する。 The virtual camera path management unit 8106 manages the path of the virtual camera 8001 (virtual camera path 8002) that results from the operator's operations. The virtual camera path 8002 is a sequence of information representing the position and orientation of the virtual camera 8001 for each frame. This is described with reference to FIG. 8B. For example, virtual camera parameters are used as the information representing the position and orientation of the virtual camera 8001. At a frame rate setting of 60 frames per second, for example, one second of information is a sequence of 60 virtual camera parameters. The virtual camera path management unit 8106 transmits the virtual camera parameters that have been checked by the collision determination unit 8104 to the back end server 270. The back end server 270 generates a virtual viewpoint image and virtual viewpoint audio using the received virtual camera parameters. The virtual camera path management unit 8106 also has a function of appending virtual camera parameters to the virtual camera path 8002 and holding them. 
For example, when one hour of virtual viewpoint images and virtual viewpoint audio is generated using the virtual camera operation UI 330, one hour of virtual camera parameters is stored as the virtual camera path 8002. Storing this virtual camera path makes it possible to regenerate the virtual viewpoint images and virtual viewpoint audio later by referring to the stored path together with the image information accumulated in the secondary storage 2460 of the database. That is, other users can reuse a virtual camera path created by an operator skilled in virtual camera operation, along with the image information accumulated in the secondary storage 2460. A plurality of scenes corresponding to a plurality of virtual camera paths can also be stored in the virtual camera management unit 8130 so that they are selectable. When a plurality of virtual camera paths are stored in the virtual camera management unit 8130, meta information such as the script of the scene corresponding to each path, the elapsed time of the game, the designated time before and after the scene, and player information can be input and stored with them. The virtual camera operation UI 330 notifies the back end server 270 of these virtual camera paths as virtual camera parameters.
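The path described above is essentially a per-frame list of virtual camera parameters, which the following sketch makes concrete. The class name, the dictionary used as a stand-in for one frame's parameters, and the `segment` helper for cutting out a time range are assumptions for illustration only.

```python
from dataclasses import dataclass, field


def frame_count(duration_s, frame_rate=60):
    """Number of per-frame parameter sets needed to cover duration_s seconds."""
    return int(round(duration_s * frame_rate))


@dataclass
class VirtualCameraPath:
    """Hypothetical container: one virtual camera parameter set per frame."""
    frame_rate: int = 60
    frames: list = field(default_factory=list)

    def append(self, params):
        # Add the parameters for the next frame to the end of the path.
        self.frames.append(params)

    def duration_seconds(self):
        return len(self.frames) / self.frame_rate

    def segment(self, start_s, end_s):
        """Return the part of the path between start_s and end_s as a new path."""
        a = frame_count(start_s, self.frame_rate)
        b = frame_count(end_s, self.frame_rate)
        return VirtualCameraPath(self.frame_rate, self.frames[a:b])
```

At 60 frames per second, one second corresponds to 60 entries and one hour to 216,000 entries, which gives a sense of the size of a stored path. A helper like `segment` is one way a stored path could later be cut down to a single scene for reuse.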

エンドユーザ端末１９０は、バックエンドサーバ２７０に仮想カメラパスを選択するための選択情報を要求することで、シーン名やプレーヤ、及び試合経過時間などから、仮想カメラパスを選択できる。バックエンドサーバ２７０はエンドユーザ端末１９０に選択可能な仮想カメラパスの候補を通知し、エンドユーザはエンドユーザ端末１９０を操作して、複数の候補の中から希望の仮想カメラパスを選択する。そして、エンドユーザ端末１９０は選択された仮想カメラパスに応じた画像生成をバックエンドサーバ２７０に要求することで、画像配信サービスをインタラクティブに享受することができる。 The end user terminal 190 can select the virtual camera path from the scene name, the player, the game elapsed time, and the like by requesting the back end server 270 for selection information for selecting the virtual camera path. The back end server 270 notifies the end user terminal 190 of selectable virtual camera path candidates, and the end user operates the end user terminal 190 to select a desired virtual camera path from a plurality of candidates. The end user terminal 190 can interactively enjoy the image distribution service by requesting the back end server 270 to generate an image according to the selected virtual camera path.

オーサリング部８１０７は、オペレータがリプレイ画像を生成する際の編集機能を提供する。オーサリング部８１０７は、ユーザ操作に応じて、リプレイ画像用の仮想カメラパス８００２の初期値として、仮想カメラパス管理部８１０６が保持する仮想カメラパス８００２の一部を取り出す。前述したように、仮想カメラパス管理部８１０６には、仮想カメラパス８００２と対応付けてシーン名、プレーヤ、経過時間、及びシーンの前後指定時間などのメタ情報が保持されている。例えば、シーン名がゴールシーン、シーンの前後指定時間が前後合わせて１０秒分である仮想カメラパス８００２が取り出される。また、オーサリング部８１０７は、編集したカメラパスに再生速度を設定する。例えば、ボールがゴールに飛んで行く間の仮想カメラパス８００２にスロー再生を設定する。なお、異なる視点からの画像に変更する場合、つまり仮想カメラパス８００２を変更する場合は、ユーザは仮想カメラ操作部８１０１を用いて再度、仮想カメラ８００１を操作する。 The authoring unit 8107 provides an editing function when the operator generates a replay image. The authoring unit 8107 extracts a part of the virtual camera path 8002 held by the virtual camera path management unit 8106 as an initial value of the virtual camera path 8002 for replay image according to the user operation. As described above, the virtual camera path management unit 8106 holds meta information such as a scene name, a player, an elapsed time, and a designated time before and after a scene in association with the virtual camera path 8002. For example, a virtual camera path 8002 in which the scene name is a goal scene and the designated time before and after the scene is 10 seconds in total is taken out. Also, the authoring unit 8107 sets the playback speed for the edited camera path. For example, the virtual camera path 8002 is set to slow playback while the ball flies to the goal. When changing to an image from a different viewpoint, that is, when changing the virtual camera path 8002, the user operates the virtual camera 8001 again using the virtual camera operation unit 8101.

仮想カメラ画像・音声出力部８１０８は、バックエンドサーバ２７０から受け取った仮想カメラ画像・音声を出力する。オペレータは出力された画像及び音声を確認しながら仮想カメラ８００１を操作する。なお、フィードバック出力部８１０５によるフィードバックの内容によっては、仮想カメラ画像・音声出力部８１０８は、制限領域に応じた表示制御に基づく画像を表示部に表示させる。例えば、仮想カメラ画像・音声出力部８１０８は、オペレータが指定した視点の位置が制限領域に含まれる場合に、指定された位置の近辺であり且つ制限領域外である位置を視点とした仮想視点画像を表示させてもよい。これにより、オペレータが制限領域外に視点を指定し直す手間が削減される。 The virtual camera image/audio output unit 8108 outputs the virtual camera image and audio received from the back end server 270. The operator operates the virtual camera 8001 while checking the output image and audio. Depending on the content of the feedback from the feedback output unit 8105, the virtual camera image/audio output unit 8108 causes the display unit to display an image based on display control according to the restricted area. For example, when the viewpoint position specified by the operator falls within the restricted area, the virtual camera image/audio output unit 8108 may display a virtual viewpoint image whose viewpoint is a position near the specified position but outside the restricted area. This saves the operator the effort of respecifying the viewpoint outside the restricted area.

図９は、エンドユーザ端末１９０の機能構成を示すブロック図である。 FIG. 9 is a block diagram showing the functional configuration of the end user terminal 190.

アプリケーション管理部１０００１は、後述する基本ソフト部１０００２から入力されたユーザ入力情報を、バックエンドサーバ２７０のバックエンドサーバコマンドに変換して、基本ソフト部１０００２へ出力する。また、アプリケーション管理部１０００１は、基本ソフト部１０００２から入力された画像を、所定の表示領域に描画するための画像描画指示を、基本ソフト部１０００２へ出力する。 The application management unit 10001 converts user input information input from a basic software unit 10002 described later into a back end server command of the back end server 270 and outputs the back end server command to the basic software unit 10002. In addition, the application management unit 10001 outputs, to the basic software unit 10002, an image drawing instruction for drawing the image input from the basic software unit 10002 in a predetermined display area.

基本ソフト部１０００２は、例えばＯＳ（Operating System）であり、後述するユーザ入力部１０００４から入力されたユーザ入力情報を、アプリケーション管理部１０００１へ出力する。また、後述するネットワーク通信部１０００３から入力された画像や音声をアプリケーション管理部１０００１へ出力したり、アプリケーション管理部１０００１から入力されたバックエンドサーバコマンドをネットワーク通信部１０００３へ出力したりする。さらに、アプリケーション管理部１０００１から入力された画像描画指示を、画像出力部１０００５へ出力する。 The basic software unit 10002 is, for example, an OS (Operating System), and outputs user input information input from a user input unit 10004 described later to the application management unit 10001. Also, it outputs an image and voice input from a network communication unit 10003 to be described later to the application management unit 10001 and outputs a back end server command input from the application management unit 10001 to the network communication unit 10003. Furthermore, the image drawing instruction input from the application management unit 10001 is output to the image output unit 10005.

ネットワーク通信部１０００３は、基本ソフト部１０００２から入力されたバックエンドサーバコマンドを、ＬＡＮケーブル上で通信可能なＬＡＮ通信信号に変換して、バックエンドサーバ２７０へ出力する。そして、バックエンドサーバ２７０から受信した画像や音声データが加工可能となるように、基本ソフト部１０００２へデータを渡す。 The network communication unit 10003 converts the back end server command input from the basic software unit 10002 into a LAN communication signal that can be communicated on a LAN cable, and outputs the LAN communication signal to the back end server 270. Then, the data is passed to the basic software unit 10002 so that the image and audio data received from the back end server 270 can be processed.

ユーザ入力部１０００４は、キーボード入力（物理キーボード又はソフトキーボード）やボタン入力に基づくユーザ入力情報や、ユーザ入力機器からＵＳＢケーブルを介して入力されたユーザ入力情報を取得し、基本ソフト部１０００２へ出力する。 The user input unit 10004 acquires user input information based on keyboard input (physical keyboard or soft keyboard) or button input, as well as user input information input from a user input device via a USB cable, and outputs it to the basic software unit 10002.

画像出力部１０００５は、基本ソフト部１０００２から出力された画像表示指示に基づく画像を画像信号に変換して、外部ディスプレイや一体型のディスプレイなどに出力する。 An image output unit 10005 converts an image based on an image display instruction output from the basic software unit 10002 into an image signal, and outputs the image signal to an external display, an integral display, or the like.

音声出力部１０００６は、基本ソフト部１０００２から出力された音声出力指示に基づく音声データを外部スピーカあるいは一体型スピーカに出力する。端末属性管理部１０００７は、端末１９０の表示解像度、画像符号化コーデック種別、及び端末種別（スマートフォンなのか、大型ディスプレイなのかなど）を管理する。 The audio output unit 10006 outputs audio data based on the audio output instruction output from the basic software unit 10002 to an external speaker or an integrated speaker. The terminal attribute management unit 10007 manages the display resolution of the terminal 190, the image encoding codec type, and the terminal type (such as a smartphone or a large display).

サービス属性管理部１０００８は、エンドユーザ端末１９０に提供するサービス種別に関する情報を管理する。例えば、エンドユーザ端末１９０に搭載されるアプリケーションの種別や利用可能な画像配信サービスなどが管理される。 The service attribute management unit 10008 manages information on the service type provided to the end user terminal 190. For example, the types of applications installed in the end user terminal 190 and available image distribution services are managed.

課金管理部１０００９では、ユーザの画像配信サービスへの登録決済状況や課金金額に応じた、受信可能な画像配信シーン数の管理などが行われる。 The charge management unit 10009 manages the number of image distribution scenes that can be received, etc., according to the user's registered payment status to the image distribution service and the charge amount.

＜システムのワークフロー＞ <System workflow>
次に、競技場やコンサートホールなどの施設に複数のカメラ１１２やマイク１１１を設置し撮影を行う場合のワークフローについて説明する。 Next, the workflow for installing a plurality of cameras 112 and microphones 111 in a facility such as a stadium or a concert hall and performing shooting will be described.

図１０は、第１実施形態におけるワークフロー全体を示すフローチャートである。なお、以下で説明するワークフローの処理は、特に明示の記述がない場合、コントローラ３００の制御により実現される。すなわち、コントローラ３００が、画像処理システム１００内の他の装置（例えばバックエンドサーバ２７０やデータベース２５０等）を制御することにより、ワークフローの制御が実現される。 FIG. 10 is a flowchart showing the entire workflow in the first embodiment. The process of the workflow described below is realized by the control of the controller 300 unless there is a specific description. That is, control of the workflow is realized by the controller 300 controlling another device (for example, the back end server 270, the database 250, etc.) in the image processing system 100.

ワークフローの処理開始前において、画像処理システム１００の設置や操作を行う操作者（ユーザ）は設置前に必要な情報（事前情報）を収集し計画の立案を行う。また、操作者は、予め、対象となる施設に機材を設置しているものとする。 Before the start of processing of the workflow, an operator (user) who installs or operates the image processing system 100 collects necessary information (prior information) before setting up and prepares a plan. In addition, it is assumed that the operator has installed equipment in the target facility in advance.

Ｓ１１００において、コントローラ３００の制御ステーション３１０は、ユーザから事前情報に基づく設定を受け付ける。つぎに、ステップＳ１１０１において画像処理システム１００の各装置は、ユーザからの操作に基づいてコントローラ３００から発行されたコマンドに従って、システムの動作確認のための処理を実行する。 In S1100, the control station 310 of the controller 300 receives the setting based on the prior information from the user. Next, in step S1101, each device of the image processing system 100 executes a process for confirming the operation of the system in accordance with a command issued from the controller 300 based on an operation from the user.

ステップＳ１１０２において、仮想カメラ操作ＵＩ３３０は、競技等のための撮影開始前に画像や音声を出力する。これにより、ユーザは、競技等の前に、マイク１１１により集音された音声やカメラ１１２により撮像された画像を確認できる。 In step S1102, the virtual camera operation UI 330 outputs an image and sound before the start of shooting for a competition or the like. Thereby, the user can confirm the sound collected by the microphone 111 and the image captured by the camera 112 before the competition or the like.

Ｓ１１０３において、コントローラ３００の制御ステーション３１０は、各マイク１１１に集音を実施させ、各カメラ１１２に撮影を実施させる。本ステップにおける撮影はマイク１１１を用いた集音を含むものとするがこれに限らず、画像の撮影だけであってもよい。そして、ステップＳ１１０１で行った設定を変更する場合、または撮影を終了する場合はステップＳ１１０４に進む。つぎに、Ｓ１１０４において、Ｓ１１０１で行われた設定を変更して撮影を継続する場合はＳ１１０５に進み、撮影を完了する場合はＳ１１０６に進む。Ｓ１１０４における判定は、典型的には、ユーザからコントローラ３００への入力に基づいて行われる。ただしこの例に限らない。Ｓ１１０５において、コントローラ３００は、Ｓ１１０１で行われた設定を変更する。変更内容は、典型的には、Ｓ１１０４にて取得されたユーザ入力に基づいて決定される。本ステップにおける設定の変更において撮影を停止する必要がある場合は、一度撮影を停止し、設定を変更した後に撮影を再開する。また、撮影を停止する必要がない場合は、撮影と並行して設定の変更を実施する。 In step S1103, the control station 310 of the controller 300 causes each microphone 111 to perform sound collection and causes each camera 112 to perform shooting. The shooting in this step includes sound collection using the microphone 111, but the invention is not limited thereto, and may be only shooting of an image. Then, in the case of changing the setting made in step S1101 or in the case of ending the photographing, the process proceeds to step S1104. Next, in S1104, if the setting made in S1101 is changed and shooting is continued, the process proceeds to S1105, and if shooting is completed, the process proceeds to S1106. The determination in S1104 is typically performed based on an input from the user to the controller 300. However, it is not limited to this example. In S1105, the controller 300 changes the setting made in S1101. The content of the change is typically determined based on the user input acquired in S1104. If it is necessary to stop the shooting in the setting change in this step, the shooting is once stopped, and the shooting is restarted after changing the setting. In addition, when it is not necessary to stop shooting, the setting is changed in parallel with shooting.

Ｓ１１０６において、コントローラ３００は、複数のカメラ１１２により撮影された画像及び複数のマイク１１１により集音された音声の編集を実施する。当該編集は、典型的には、仮想カメラ操作ＵＩ３３０を介して入力されたユーザ操作に基づいて行われる。なお、Ｓ１１０６とＳ１１０３の処理は並行して行われるようにしても良い。例えば、スポーツ競技やコンサートなどがリアルタイムに配信される（例えば競技中に競技の画像が配信される）場合は、Ｓ１１０３の撮影とＳ１１０６の編集が同時に実施される。また、スポーツ競技におけるハイライト画像が競技後に配信される場合は、Ｓ１１０４において撮影を終了した後に編集が実施される。 In S1106, the controller 300 edits the images captured by the plurality of cameras 112 and the sound collected by the plurality of microphones 111. The editing is typically performed based on a user operation input via the virtual camera operation UI 330. The processes of S1106 and S1103 may be performed in parallel. For example, when a sports competition or a concert is distributed in real time (for example, the image of the competition is distributed during the competition), the shooting in S1103 and the editing in S1106 are performed simultaneously. In addition, when the highlight image in the sports competition is distributed after the competition, the editing is performed after the photographing is finished in S1104.
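
The branching between S1103, S1104, S1105, and S1106 described above can be sketched as a small loop. This is only an illustrative model: the function name `run_workflow` and the `"change"` / `"finish"` decision strings are assumptions introduced here, not part of the described system, and the sketch covers only the sequential case (it does not model S1103 and S1106 running in parallel).

```python
def run_workflow(decisions):
    """Return the ordered list of workflow steps that are visited.

    `decisions` is a hypothetical list of user decisions consumed at S1104:
    "change" (modify the settings and keep shooting) or "finish".
    """
    steps = ["S1100", "S1101", "S1102"]  # settings, installation, pre-shooting
    for decision in decisions:
        steps.append("S1103")  # sound collection and shooting
        steps.append("S1104")  # decide: change settings or finish
        if decision == "change":
            steps.append("S1105")  # change the settings, then shoot again
        elif decision == "finish":
            break
        else:
            raise ValueError(f"unknown decision: {decision!r}")
    steps.append("S1106")  # editing
    return steps
```

For a live distribution, editing (S1106) would overlap with shooting (S1103) rather than follow it, so a real controller would need concurrent execution instead of this linear trace.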

図１１は、設置時処理（Ｓ１１０１）の詳細フローチャートである。 FIG. 11 is a detailed flowchart of the installation process (S1101).

Ｓ１３００において、制御ステーション３１０は、設置機材の過不足の有無に関するユーザ入力を受け付ける。ユーザは、Ｓ１２０１で入力された機器情報と設置する機材を比較し過不足の有無を確認することで、設置機材の過不足の有無を判定できる。つぎに、Ｓ１３０１において制御ステーション３１０は、Ｓ１３００で不足すると判定された機材の設置確認処理を実行する。つまり、ユーザは、Ｓ１３００とＳ１３０１との間に、不足機材を設置することができ、制御ステーション３１０は、ユーザにより不足機材が設置されたことを確認する。 In S1300, the control station 310 receives user input regarding whether there is any excess or shortage of installed equipment. The user can determine whether there is an excess or shortage by comparing the device information input in S1201 with the equipment to be installed. Next, in S1301, the control station 310 executes installation confirmation processing for the equipment determined to be missing in S1300. That is, the user can install the missing equipment between S1300 and S1301, and the control station 310 confirms that the missing equipment has been installed by the user.

Ｓ１３０２において、制御ステーション３１０は、Ｓ１３０１で設置された機材を起動し正常に動作するかの調整前システム動作確認を行う。なお、Ｓ１３０２の処理は、ユーザがシステム動作確認を実施し、その確認結果を制御ステーション３１０に対してユーザが入力するようにしても良い。 In step S1302, the control station 310 checks the pre-adjustment system operation whether the equipment installed in step S1301 is activated and operates normally. In the process of S1302, the user may check the system operation, and the user may input the check result to the control station 310.

ここで、機材の過不足や動作にエラーが発生した場合には、制御ステーション３１０に対して、エラー通知が行われる（Ｓ１３０３）。制御ステーション３１０は、エラーが解除されるまで次のステップには進まないロック状態となる。エラー状態が解除された場合には、制御ステーション３１０に正常通知が行われ（Ｓ１３０４）、次のステップに進む。これにより、初期段階でエラーを検知することができる。確認の後、カメラ１１２に関する処理についてはＳ１３０５へ、マイク１１１に関する処理についてはＳ１３０８に進む。 Here, if an error occurs in the excess or deficiency of equipment or operation, an error notification is sent to the control station 310 (S1303). The control station 310 is in a locked state which does not advance to the next step until the error is released. When the error state is released, the control station 310 is notified of normality (S1304), and the process proceeds to the next step. This makes it possible to detect an error at an early stage. After the confirmation, the process proceeds to step S1305 for the process related to the camera 112, and to step S1308 for the process related to the microphone 111.
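
The lock behavior described above (no progress after an error notification until a normal notification arrives) can be sketched as follows. The class name `InstallationGate` and its methods are hypothetical names chosen for illustration; the source does not specify how the control station implements the lock.

```python
class InstallationGate:
    """Hypothetical model of the control station's lock state."""

    def __init__(self):
        self.locked = False

    def notify_error(self):
        # Corresponds to the error notification (S1303).
        self.locked = True

    def notify_normal(self):
        # Corresponds to the normal notification (S1304).
        self.locked = False

    def advance(self, current_step, next_step):
        """Return the step the workflow is in after trying to advance."""
        return current_step if self.locked else next_step
```

While the gate is locked, `advance("S1302", "S1305")` keeps returning `"S1302"`, which mirrors the statement that the control station does not proceed until the error is cleared.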

最初に、カメラ１１２について述べる。Ｓ１３０５において、制御ステーション３１０は、設置されたカメラ１１２の調整を実施する。本ステップのカメラ１１２の調整とは、画角合わせと色合わせを指し、設置されたカメラ１１２全てについて実施される。Ｓ１３０５の調整は、ユーザ操作に基づいて行われるようにしても良いし、自動調整機能により実現されても良い。 First, the camera 112 will be described. In S1305, the control station 310 performs adjustment of the installed camera 112. The adjustment of the camera 112 in this step refers to angle of view alignment and color alignment, and is performed for all the installed cameras 112. The adjustment in S1305 may be performed based on a user operation or may be realized by an automatic adjustment function.

また、画角合わせでは、ズーム、パン、チルト、及びフォーカスの調整が並行して実施され、それらの調整結果が制御ステーション３１０に保存される。そして、色合わせでは、ＩＲＩＳ、ＩＳＯ／ゲイン、ホワイトバランス、シャープネス、及びシャッタースピードの調整が同時に実施され、それらの調整結果が制御ステーション３１０に保存される。 In the field angle alignment, adjustments of zoom, pan, tilt, and focus are performed in parallel, and the adjustment results are stored in the control station 310. In color matching, adjustments of IRIS, ISO / gain, white balance, sharpness, and shutter speed are simultaneously performed, and the adjustment results are stored in the control station 310.

Ｓ１３０６において、制御ステーション３１０は、設置されたカメラ全てが同期する様に調整する。Ｓ１３０６における同期の調整は、ユーザ操作に基づいて行われるようにしても良いし、自動調整機能により実現されても良い。さらに、Ｓ１３０７において、制御ステーション３１０は、カメラ設置時キャリブレーションを行う。より具体的には、制御ステーション３１０は、設置されたカメラ全ての座標が世界座標に一致する様に調整を行う。キャリブレーション処理の詳細については図１２を参照して後述する。なお、カメラ１１２の制御コマンドやタイムサーバとの同期に関するネットワーク経路の疎通確認もあわせて実施される。そして、マイク調整が進むまで調整後システム動作正常確認処理で待つ（Ｓ１３１１）。 In S1306, the control station 310 adjusts all the installed cameras so that they are synchronized. The synchronization adjustment in S1306 may be performed based on a user operation or may be realized by an automatic adjustment function. Furthermore, in S1307, the control station 310 performs calibration at the time of camera installation. More specifically, the control station 310 adjusts the coordinates of all the installed cameras so that they match the world coordinates. The details of the calibration process will be described later with reference to FIG. 12. Note that connectivity of the network paths used for the control commands of the camera 112 and for synchronization with the time server is also checked at this time. The process then waits at the post-adjustment system operation check (S1311) until the microphone adjustment has progressed.

マイク１１１に関する処理について述べる。まず、Ｓ１３０８において、制御ステーション３１０は、設置されたマイク１１１の調整を実施する。本ステップのマイク１１１の調整とは、ゲイン調整を指し、設置したマイク全てについて実施される。Ｓ１３０８におけるマイク１１１の調整は、ユーザ操作に基づいて行われても良いし、自動調整機能により実現されても良い。 The process regarding the microphone 111 will be described. First, in S1308, the control station 310 performs adjustment of the installed microphone 111. The adjustment of the microphone 111 in this step refers to gain adjustment, and is performed for all the installed microphones. The adjustment of the microphone 111 in S1308 may be performed based on a user operation or may be realized by an automatic adjustment function.

Ｓ１３０９において、制御ステーション３１０は、設置されたマイク全てが同期する様に調整する。具体的には、同期クロックの確認を実施する。Ｓ１３０９における同期の調整は、ユーザ操作に基づいて行われるようにしても良いし、自動調整機能により実現されても良い。 At S1309, the control station 310 adjusts all the installed microphones to be synchronized. Specifically, confirmation of the synchronization clock is performed. The adjustment of synchronization in S1309 may be performed based on a user operation or may be realized by an automatic adjustment function.

Ｓ１３１０において、制御ステーション３１０は、設置されたマイク１１１のうち、フィールドに設置されたマイク１１１について位置の調整を実施する。Ｓ１３１０におけるマイク１１１の位置の調整は、ユーザ操作に基づいて行われても良いし、自動調整機能により実現されても良い。なお、マイク１１１の制御コマンドやタイムサーバとの同期に関するネットワーク経路の疎通確認もあわせて実施される。 In S1310, the control station 310 adjusts the position of the microphone 111 installed in the field among the installed microphones 111. The adjustment of the position of the microphone 111 in S1310 may be performed based on a user operation or may be realized by an automatic adjustment function. Note that the communication check of the network path regarding synchronization with the control command of the microphone 111 and the time server is also implemented.

Ｓ１３１１において、制御ステーション３１０は、カメラ１１２ａ〜１１２ｚ、およびマイク１１１ａ〜１１１ｚが正しく調整できたかを確認することを目的として調整後システム動作確認を実施する。Ｓ１３１１の処理は、ユーザ指示に基づいて実行されうる。カメラ１１２、マイク１１１ともに調整後システム動作正常確認がとれた場合には、Ｓ１３１３において、制御ステーション３１０へ正常通知が行われる。一方、エラーが発生した場合には、カメラ１１２あるいはマイク１１１の種別及び個体番号と共に制御ステーション３１０へエラー通知が行われる（Ｓ１３１２）。制御ステーション３１０は、エラーが発生した機器の種別と個体番号をもとに再調整の指示を出す。 In step S1311, the control station 310 performs the post-adjustment system operation check for the purpose of confirming whether the cameras 112a to 112z and the microphones 111a to 111z are correctly adjusted. The process of S1311 may be performed based on a user instruction. If the camera 112 and the microphone 111 both confirm the system operation after adjustment, in step S1313, a normal notification is issued to the control station 310. On the other hand, when an error occurs, an error notification is sent to the control station 310 together with the type and individual number of the camera 112 or the microphone 111 (S1312). The control station 310 issues a readjustment instruction based on the type of the device in which the error has occurred and the individual number.
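
The outcome of the post-adjustment check (a normal notification, or an error notification carrying the device type and individual number) can be sketched as below. The `Device` record, its field names, and the tuple-based return value are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    kind: str          # "camera" or "microphone"
    unit_id: str       # individual number, e.g. "112a"
    adjusted_ok: bool  # result of the post-adjustment check for this device


def post_adjustment_check(devices):
    """Return ("normal", []) or ("error", [(kind, unit_id), ...])."""
    failed = [(d.kind, d.unit_id) for d in devices if not d.adjusted_ok]
    return ("error", failed) if failed else ("normal", [])
```

Returning the type and individual number of each failed device gives the control station what it needs to issue a readjustment instruction for just those devices.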

次に設置時処理（Ｓ１１０１）及び撮影前処理（Ｓ１１０２）の動作例について説明する。画像処理システム１００は、設置時キャリブレーションを行う状態と通常の撮影を行う状態を動作モード変更により切替制御できる。なお、撮影中にある特定カメラのキャリブレーションが必要になるケースもあり、この場合には撮影とキャリブレーションという二種類の動作が両立する。 Next, an operation example of the installation process (S1101) and the pre-shooting process (S1102) will be described. The image processing system 100 can switch and control the state of calibration at the time of installation and the state of normal imaging by changing the operation mode. In some cases, calibration of a specific camera is required during shooting, and in this case, two types of operations, shooting and calibration, are compatible.

図１２は、設置時キャリブレーション処理のシーケンス図である。ここでは、装置間で行われる指示に対するデータの受信完了や処理完了の通知についての記載は省略するが、指示に対して何らかのレスポンスが返却されるものとする。 FIG. 12 is a sequence diagram of installation calibration processing. Although a description of notification of completion of reception of data and completion of processing in response to an instruction performed between devices is omitted here, it is assumed that some response is returned in response to the instruction.

まず、カメラ１１２の設置が完了すると、ユーザは制御ステーション３１０に対して、設置時キャリブレーションの実行を指示する。すると、制御ステーション３１０は、フロントエンドサーバ２３０およびカメラアダプタ１２０に対して、キャリブレーション開始を指示する（Ｓ４１００）。 First, when the installation of the camera 112 is completed, the user instructs the control station 310 to perform calibration at installation. Then, the control station 310 instructs the front end server 230 and the camera adapter 120 to start calibration (S4100).

フロントエンドサーバ２３０は、キャリブレーション開始指示を受けると、それ以降に受信した画像データをキャリブレーション用データと判定し、キャリブレーション部２１４０が処理できるように制御モードを変更する（Ｓ４１０２ａ）。また、カメラアダプタ１２０は、キャリブレーション開始指示を受けると、前景背景分離等の画像処理を行わず非圧縮のフレーム画像を扱う制御モードに移行する（Ｓ４１０２ｂ）。さらに、カメラアダプタ１２０は、カメラ１１２に対してカメラモード変更を指示する（Ｓ４１０１）。これを受けたカメラ１１２は、例えば、フレームレートを１ｆｐｓに設定する。あるいは、カメラ１１２が動画でなく静止画を伝送するモードに設定してもよい（Ｓ４１０２ｃ）。また、カメラアダプタ１２０によってフレームレートが制御されてキャリブレーション画像が伝送されるモードに設定してもよい。 When receiving the calibration start instruction, the front end server 230 determines that the image data received thereafter is data for calibration, and changes the control mode so that the calibration unit 2140 can process it (S4102a). In addition, when receiving the calibration start instruction, the camera adapter 120 does not perform image processing such as foreground / background separation, and shifts to a control mode for handling an uncompressed frame image (S4102b). Furthermore, the camera adapter 120 instructs the camera 112 to change the camera mode (S4101). The camera 112 having received this sets, for example, the frame rate to 1 fps. Alternatively, the camera 112 may be set to a mode for transmitting a still image instead of a moving image (S4102c). In addition, the frame rate may be controlled by the camera adapter 120 and a mode may be set in which a calibration image is transmitted.

制御ステーション３１０は、カメラアダプタ１２０に対して、カメラのズーム値とフォーカス値の取得を指示し（Ｓ４１０３）、カメラアダプタ１２０は、制御ステーション３１０に、カメラ１１２のズーム値とフォーカス値を送信する（Ｓ４１０４）。 The control station 310 instructs the camera adapter 120 to acquire the zoom value and the focus value of the camera (S4103), and the camera adapter 120 transmits the zoom value and the focus value of the camera 112 to the control station 310 (S4104).

なお図１２においては、カメラアダプタ１２０及びカメラ１１２はそれぞれ１つしか記載しないが、カメラアダプタ１２０及びカメラ１１２に関する制御は、画像処理システム１００内の全カメラアダプタ１２０及び全カメラ１１２に対してそれぞれ実行される。そのため、Ｓ４１０３及びＳ４１０４はカメラ台数分実行され、全カメラ１１２に対するＳ４１０３及びＳ４１０４の処理が完了した時点で、制御ステーション３１０は、全カメラ分のズーム値とフォーカス値を受信できている状態となる。 Although FIG. 12 shows only one camera adapter 120 and one camera 112, the control relating to the camera adapter 120 and the camera 112 is executed for every camera adapter 120 and every camera 112 in the image processing system 100. S4103 and S4104 are therefore executed once per camera, and when S4103 and S4104 have been completed for all the cameras 112, the control station 310 has received the zoom values and focus values for all the cameras.

制御ステーション３１０は、フロントエンドサーバ２３０に、Ｓ４１０４で受信した全カメラ分のズーム値とフォーカス値を送信する（Ｓ４１０５）。次いで、制御ステーション３１０は、フロントエンドサーバ２３０に、設置時キャリブレーション用撮影の撮影パターンを通知する（Ｓ４１０６）。 The control station 310 transmits the zoom values and focus values for all the cameras received in S4104 to the front end server 230 (S4105). Next, the control station 310 notifies the front end server 230 of a shooting pattern for shooting during calibration (S4106).

ここで撮影パターンには、画像特徴点となるマーカ等をグラウンド内で動かして複数回撮影する場合の、別タイミングで撮影された画像を区別するためのパターン名（例えばパターン１〜１０）の属性が付加される。つまり、フロントエンドサーバ２３０は、Ｓ４１０６以降に受信したキャリブレーション用の画像データを、Ｓ４１０６で受信した撮影パターンにおける撮影画像であると判定する。そして、制御ステーション３１０は、カメラアダプタ１２０に対して同期静止画撮影を指示し（Ｓ４１０７）、カメラアダプタ１２０は、全カメラで同期した静止画撮影をカメラ１１２に指示する（Ｓ４１０８）。そして、カメラ１１２は撮影画像をカメラアダプタ１２０に送信する（Ｓ４１０９）。 Here, the shooting pattern carries a pattern name attribute (for example, patterns 1 to 10) for distinguishing images captured at different timings when a marker or the like serving as an image feature point is moved around the ground and captured a plurality of times. That is, the front end server 230 determines that calibration image data received after S4106 is a captured image belonging to the shooting pattern received in S4106. Then, the control station 310 instructs the camera adapter 120 to perform synchronous still image shooting (S4107), and the camera adapter 120 instructs the camera 112 to perform still image shooting synchronized across all the cameras (S4108). Then, the camera 112 transmits the captured image to the camera adapter 120 (S4109).
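
The rule that images received after S4106 are attributed to the most recently notified shooting pattern can be sketched as follows. The class `CalibrationReceiver` and its method names are hypothetical; the source describes the behavior of the front end server 230 but not its data structures.

```python
class CalibrationReceiver:
    """Hypothetical sketch of pattern tagging on the front end server side."""

    def __init__(self):
        self.current_pattern = None
        self.images = {}  # pattern name -> list of (camera_id, image)

    def notify_pattern(self, name):
        # Corresponds to the shooting pattern notification (S4106).
        self.current_pattern = name
        self.images.setdefault(name, [])

    def receive_image(self, camera_id, image):
        # Corresponds to receiving a calibration image (S4111).
        if self.current_pattern is None:
            raise RuntimeError("no shooting pattern has been notified yet")
        self.images[self.current_pattern].append((camera_id, image))
```

Because the pattern is implied by arrival order rather than carried in each image, this design depends on the notification for one pattern arriving before that pattern's images.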

なお、注視点のグループが複数ある場合には、注視点グループ毎にＳ４１０６からＳ４１１１のキャリブレーション用画像撮影を行っても良い。 If there are a plurality of gaze point groups, the calibration image shooting of S4106 to S4111 may be performed for each gaze point group.

そして、制御ステーション３１０は、カメラアダプタ１２０に対して、Ｓ４１０７で撮影指示した画像をフロントエンドサーバ２３０に伝送するように指示する（Ｓ４１１０）。さらに、カメラアダプタ１２０は、伝送先として指定されたフロントエンドサーバ２３０にＳ４１０９で受信した画像を伝送する（Ｓ４１１１）。 Then, the control station 310 instructs the camera adapter 120 to transmit the image instructed to be shot in S4107 to the front end server 230 (S4110). Furthermore, the camera adapter 120 transmits the image received in S4109 to the front end server 230 specified as the transmission destination (S4111).

Ｓ４１１１で伝送するキャリブレーション用画像については、前景背景分離等の画像処理が行われず、撮影された画像が圧縮せずにそのまま伝送されるものとする。そのため、全カメラが高解像度で撮影を行う場合や、カメラ台数が多くなった場合、伝送帯域の制約上、全ての非圧縮画像を同時に送信することができなくなることが発生する虞がある。その結果、ワークフローの中でキャリブレーションに要する時間が長くなる虞がある。その場合、Ｓ４１１０の画像伝送指示において、カメラアダプタ１２０の１台ずつに対して、キャリブレーションのパターン属性に応じた非圧縮画像の伝送指示が順番に行われる。さらにこのような場合、マーカのパターン属性に応じたより多くの特徴点を撮影する必要があるため、複数マーカを用いたキャリブレーション用の画像撮影が行われる。この場合、負荷分散の観点から、画像撮影と非圧縮画像伝送を非同期に行ってもよい。また、キャリブレーション用の画像撮影で取得した非圧縮画像を、カメラアダプタ１２０にパターン属性ごとに逐次蓄積する。更に、並行して非圧縮画像の伝送をＳ４１１０の画像伝送指示に応じて行うことで、ワークフローの処理時間やヒューマンエラーの削減を図ることができる効果がある。 For the calibration image transmitted in step S4111, image processing such as foreground / background separation is not performed, and the captured image is transmitted as it is without compression. Therefore, when all the cameras shoot at high resolution, or when the number of cameras increases, it may occur that all uncompressed images can not be transmitted simultaneously due to the limitation of the transmission band. As a result, the time required for calibration in the workflow may be long. In that case, in the image transmission instruction of S4110, the transmission instruction of the non-compressed image according to the pattern attribute of the calibration is sequentially issued to each camera adapter 120 one by one. Furthermore, in such a case, it is necessary to capture more feature points in accordance with the pattern attribute of the marker, so that calibration image capture is performed using a plurality of markers. In this case, image capture and non-compression image transmission may be performed asynchronously from the viewpoint of load distribution. In addition, the non-compressed image acquired by the image capturing for calibration is sequentially accumulated in the camera adapter 120 for each pattern attribute. Furthermore, by transmitting the non-compressed image in parallel according to the image transmission instruction in S4110, it is possible to reduce the workflow processing time and human error.
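
One way to picture the buffering and sequential transmission described above is a per-pattern queue inside each camera adapter that is drained when a transmission instruction arrives. This is a sketch under assumptions: `AdapterBuffer`, `transmit_in_turn`, and the use of in-memory queues are illustrative choices, not details given in the source.

```python
from collections import defaultdict, deque


class AdapterBuffer:
    """Hypothetical per-adapter store of uncompressed calibration images."""

    def __init__(self, adapter_id):
        self.adapter_id = adapter_id
        self._by_pattern = defaultdict(deque)

    def store(self, pattern, image):
        # Capture side: accumulate images per pattern attribute.
        self._by_pattern[pattern].append(image)

    def transmit(self, pattern):
        """Drain and return the images stored for `pattern` (S4110 instruction)."""
        queue = self._by_pattern[pattern]
        sent = list(queue)
        queue.clear()
        return sent


def transmit_in_turn(adapters, pattern):
    """Instruct the adapters one at a time so that only one sends at once."""
    return [(a.adapter_id, img) for a in adapters for img in a.transmit(pattern)]
```

Since `store` and `transmit` are independent calls, capture can continue for a later pattern while an earlier pattern is still being sent, which is the asynchronous operation the paragraph mentions.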

全カメラ１１２においてＳ４１１１の処理が完了した時点で、フロントエンドサーバ２３０は、全カメラ分の撮影画像を受信できている状態となる。 When the process of S4111 is completed in all the cameras 112, the front end server 230 is in a state in which the photographed images for all the cameras can be received.

前述したように、撮影パターンが複数ある場合には、Ｓ４１０６からＳ４１１１の処理をパターン数分繰り返す。 As described above, when there are a plurality of shooting patterns, the processing of S4106 to S4111 is repeated for the number of patterns.

次いで、全てのキャリブレーション用撮影が完了すると、制御ステーション３１０は、フロントエンドサーバ２３０に対して、カメラパラメータ推定処理を指示する（Ｓ４１１２）。 Next, when all calibration imaging has been completed, the control station 310 instructs the front end server 230 to perform camera parameter estimation processing (S4112).

フロントエンドサーバ２３０は、カメラパラメータ推定処理指示を受けると、Ｓ４１０５で受信した全カメラ分のズーム値とフォーカス値、及びＳ４１１１で受信した全カメラ分の撮影画像を用いて、カメラパラメータ推定処理を行う（Ｓ４１１３）。Ｓ４１１３におけるカメラパラメータ推定処理の詳細については後述する。なお、注視点が複数ある場合には、注視点グループ毎にＳ４１１３のカメラパラメータ推定処理を行うものとする。 When receiving the camera parameter estimation processing instruction, the front end server 230 performs camera parameter estimation processing using the zoom values and focus values for all cameras received in S4105 and the captured images for all cameras received in S4111. (S4113). Details of the camera parameter estimation process in S4113 will be described later. When there are a plurality of fixation points, the camera parameter estimation process of S4113 is performed for each fixation point group.

そして、フロントエンドサーバ２３０は、Ｓ４１１３のカメラパラメータ推定処理の結果として導出された全カメラ分のカメラパラメータをデータベース２５０に送信して保存する（Ｓ４１１４）。 Then, the front end server 230 transmits the camera parameters for all the cameras derived as a result of the camera parameter estimation process of S4113 to the database 250 for storage (S4114).

また、フロントエンドサーバ２３０は、制御ステーション３１０に対しても同様に全カメラ分のカメラパラメータを送信（Ｓ４１１５）する。制御ステーション３１０は、カメラアダプタ１２０に対して、各カメラ１１２に対応するカメラパラメータを送信し（Ｓ４１１６）、カメラアダプタ１２０は、受信した自カメラ１１２のカメラパラメータを保存する（Ｓ４１１７）。 The front end server 230 similarly transmits camera parameters for all cameras to the control station 310 (S4115). The control station 310 transmits camera parameters corresponding to each camera 112 to the camera adapter 120 (S4116), and the camera adapter 120 stores the received camera parameters of the own camera 112 (S4117).

そして、制御ステーション３１０は、キャリブレーション結果を確認する（Ｓ４１１８）。確認方法としては、導出されたカメラパラメータの数値を確認しても良いし、Ｓ４１１４のカメラパラメータ推定処理の演算過程を確認しても良いし、カメラパラメータを用いて画像生成を行い、生成された画像を確認するようにしても良い。そして、制御ステーション３１０は、フロントエンドサーバ２３０に対して、キャリブレーション終了を指示する（Ｓ４１１９）。 Then, the control station 310 confirms the calibration result (S4118). As a confirmation method, the numerical value of the derived camera parameter may be confirmed, or the calculation process of the camera parameter estimation process of S4114 may be confirmed, or an image is generated using the camera parameter. The image may be confirmed. Then, the control station 310 instructs the front end server 230 to finish the calibration (S4119).

フロントエンドサーバ２３０はキャリブレーション終了指示を受けると、Ｓ４１０１で実行したキャリブレーション開始処理とは逆に、それ以降に受信した画像データをキャリブレーション用データでないと判定するよう制御モードを変更する。（Ｓ４１２０） Upon receiving the calibration end instruction, the front end server 230 changes the control mode so that image data received thereafter is determined not to be calibration data, the reverse of the calibration start processing executed in S4101 (S4120).
以上の処理により、設置時キャリブレーション処理として、全カメラ分のカメラパラメータを導出し、導出されたカメラパラメータをカメラアダプタ１２０及びデータベース２５０に保存することができる。 Through the above processing, camera parameters for all the cameras can be derived as the installation calibration processing, and the derived camera parameters can be stored in the camera adapter 120 and the database 250.

また、上述した設置時キャリブレーション処理は、カメラ設置後及び撮影前に実施され、カメラが動かされなければ再度処理する必要はないが、カメラを動かす場合（例えば、試合の前半と後半とで注視点を変更するなど）には、再度同様の処理が行われるも。 The installation calibration processing described above is performed after the cameras are installed and before shooting, and does not need to be repeated as long as the cameras are not moved. When a camera is moved (for example, when the gaze point is changed between the first half and the second half of a game), the same processing is performed again.

また、撮影中にボールがぶつかる等のアクシデントにより所定の閾値以上にカメラ１１２が動いてしまった場合に、当該カメラ１１２を撮影状態からキャリブレーション開始状態に遷移させ上述の設置時キャリブレーションを行っても良い。その場合、システムとしては通常の撮影状態を維持し、当該カメラ１１２のみがキャリブレーション用画像を伝送している旨をフロントエンドサーバ２３０に通知することで、システム全体をキャリブレーションモードにする必要はなく撮影の継続性を図れる。さらには、本システムのデイジーチェーンでの伝送においては、通常の撮影における画像データの伝送帯域にキャリブレーション用の非圧縮画像を送ると、伝送帯域制限を超過する場合が考えられる。この場合、非圧縮画像の伝送優先度を下げたり、非圧縮画像を分割して送信したりすることで対応する。さらには、カメラアダプタ１２０間の接続が１０ＧｂＥなどの場合は、全二重の特徴を使うことで、通常の撮影の画像データ伝送とは逆向きに非圧縮画像を伝送することで帯域確保が図れるという効果がある。 In addition, when a camera 112 is moved by a predetermined threshold or more due to an accident during shooting, such as being hit by a ball, that camera 112 may be shifted from the shooting state to the calibration start state and the installation calibration described above may be performed. In that case, the system as a whole maintains the normal shooting state, and the front end server 230 is notified that only that camera 112 is transmitting calibration images. The entire system therefore does not need to be put into calibration mode, and shooting can continue. Furthermore, in the daisy-chain transmission of this system, sending uncompressed calibration images in the transmission band used for image data in normal shooting may exceed the transmission band limit. In this case, the transmission priority of the uncompressed images is lowered, or the uncompressed images are divided before transmission. Furthermore, when the connection between the camera adapters 120 is 10 GbE or the like, the full-duplex characteristic can be used to transmit the uncompressed images in the direction opposite to the image data transmission of normal shooting, which secures bandwidth.
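
The per-camera transition described above can be sketched as a state update in which only cameras whose movement reaches the threshold leave the shooting state. The function name, the state strings, and the idea of a single scalar "displacement" per camera are assumptions for illustration; the source does not say how the movement is measured.

```python
def update_camera_states(states, displacement, threshold):
    """Return new states after checking each camera's measured movement.

    states:       dict camera_id -> "shooting" or "calibrating"
    displacement: dict camera_id -> measured movement (hypothetical unit)
    threshold:    movement at or above this value triggers recalibration
    """
    new_states = dict(states)
    for cam, moved in displacement.items():
        if moved >= threshold and new_states.get(cam) == "shooting":
            new_states[cam] = "calibrating"
    return new_states
```

Cameras below the threshold keep the `"shooting"` state, which corresponds to the system as a whole continuing normal shooting while one camera is recalibrated.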

また、複数の注視点のうちの１つの注視点を変更したい場合など、１つの注視点グループのカメラ１１２のみ、上述した設置時キャリブレーション処理を再度行うようにしても良い。その場合、キャリブレーション処理中は、対象の注視点グループのカメラ１１２については、通常の画像撮影及び仮想視点画像生成を行うことができない。そのため、キャリブレーション処理中であることが制御ステーション３１０に通知され、制御ステーション３１０が仮想カメラ操作ＵＩ３３０に対して視点操作の制限をかけるなどの処理を要求する。フロントエンドサーバ２３０では、仮想視点画像生成の処理に影響が出ないよう制御してカメラパラメータ推定処理を行うものとする。 Further, when it is desired to change one of the plurality of fixation points, only the camera 112 of one fixation point group may perform the above-mentioned calibration process at the time of installation again. In that case, normal imaging and virtual viewpoint image generation can not be performed for the camera 112 of the target gaze point group during the calibration process. Therefore, the control station 310 is notified that the calibration process is in progress, and the control station 310 requests the virtual camera operation UI 330 to perform processing such as restriction of the viewpoint operation. The front end server 230 performs camera parameter estimation processing by performing control so as not to affect the processing of virtual viewpoint image generation.

図１３は、カメラパラメータ推定処理の詳細フローチャートである。なお、キャリブレーション部２１４０は、制御ステーション３１０からの指示に基づいて、カメラパラメータ推定処理を実行する。本シーケンスを開始する時点で、内部パラメータマップ、スタジアムデータ、全カメラ分のズーム値とフォーカス値、及び全カメラ分のキャリブレーション用撮影画像は、キャリブレーション部２１４０が既に保持しているものとする。 FIG. 13 is a detailed flowchart of the camera parameter estimation process. The calibration unit 2140 executes the camera parameter estimation process based on an instruction from the control station 310. It is assumed that, when this sequence starts, the calibration unit 2140 already holds the internal parameter map, the stadium data, the zoom values and focus values for all the cameras, and the calibration captured images for all the cameras.

まずキャリブレーション部２１４０は、カメラ１１２を特定し（Ｓ４２０１）、対応するズーム値とフォーカス値を特定し、特定したズーム値とフォーカス値より、内部パラメータマップを用いて内部パラメータ初期値を導出する（Ｓ４２０２）。Ｓ４２０２における内部パラメータ初期値の導出が全カメラ分完了するまで、Ｓ４２０１とＳ４２０２の処理が繰り返される（Ｓ４２０３）。 First, the calibration unit 2140 identifies the camera 112 (S4201), identifies the corresponding zoom value and focus value, and derives an internal parameter initial value using the internal parameter map from the identified zoom value and focus value ( S4202). The processes of S4201 and S4202 are repeated until the derivation of the internal parameter initial value in S4202 is completed for all the cameras (S4203).
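
The loop of S4201 to S4203 can be sketched as a lookup repeated for every camera. The structure of the internal parameter map is not given in the source, so this sketch assumes, purely for illustration, a dictionary keyed by (zoom, focus) pairs and a nearest-entry lookup; `initial_intrinsics` and `derive_all` are hypothetical names.

```python
def initial_intrinsics(param_map, zoom, focus):
    """Return the map entry whose (zoom, focus) key is closest to the input.

    `param_map` is assumed to be a dict {(zoom, focus): internal parameters}.
    """
    key = min(param_map, key=lambda k: (k[0] - zoom) ** 2 + (k[1] - focus) ** 2)
    return param_map[key]


def derive_all(param_map, lens_settings):
    """Repeat the lookup for every camera (the S4201 to S4203 loop).

    `lens_settings` is assumed to be a dict {camera_id: (zoom, focus)}.
    """
    return {
        cam: initial_intrinsics(param_map, zoom, focus)
        for cam, (zoom, focus) in lens_settings.items()
    }
```

A real map might interpolate between entries instead of picking the nearest one; the nearest-entry rule is used here only to keep the sketch short.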

次いでキャリブレーション部２１４０は、再度カメラ１１２を特定し、対応するキャリブレーション用撮影画像を特定し（Ｓ４２０４）、画像内の特徴点（画像特徴点）を検出する（Ｓ４２０５）。画像特徴点としては、例えば、キャリブレーション用に用意したマーカや、予めスタジアムの地面に描かれているピッチラインや、予め置かれている物（例えば、サッカーゴールや選手控えベンチなど）のエッジ部分などが挙げられる。 Next, the calibration unit 2140 identifies a camera 112 again, identifies the corresponding captured calibration image (S4204), and detects feature points in the image (image feature points) (S4205). Examples of image feature points include markers prepared for calibration, pitch lines drawn in advance on the stadium ground, and edge portions of objects placed in advance (for example, soccer goals and player benches).

Ｓ４２０５における画像特徴点検出が全カメラ分完了するまで、Ｓ４２０４とＳ４２０５の処理が繰り返される（Ｓ４２０６）。 The processes in S4204 and S4205 are repeated until the detection of the image feature points in S4205 for all the cameras is completed (S4206).

次いでキャリブレーション部２１４０は、Ｓ４２０５で検出した各カメラ１１２におけるキャリブレーション用撮影画像の画像特徴点のマッチングを行う（Ｓ４２０７）。そして、カメラ１１２間でマッチングされた使用特徴点数が閾値以下であるかを判定する（Ｓ４２０８）。Ｓ４２０８で用いる使用特徴点数の閾値については予め設定しておいても良いし、カメラ台数や画角などの撮影条件によって自動で導出するようにしても良く、外部パラメータ推定を行うために最低限必要である値が用いられる。 Next, the calibration unit 2140 matches the image feature points detected in S4205 across the captured calibration images of the cameras 112 (S4207). It then determines whether the number of used feature points matched between the cameras 112 is equal to or less than a threshold (S4208). The threshold for the number of used feature points in S4208 may be set in advance or may be derived automatically from shooting conditions such as the number of cameras and the angle of view; the value used is the minimum required to perform external parameter estimation.

Ｓ４２０８でキャリブレーション部２１４０は、使用特徴点数が閾値以下でない場合、各カメラ１１２の外部パラメータ推定処理を行う（Ｓ４２０９）。そして、Ｓ４２０９の外部パラメータ推定処理の結果、再投影誤差が閾値以下であるかを判定する（Ｓ４２１０）。Ｓ４２１０で用いる再投影誤差の閾値については予め設定しておいても良いし、カメラ台数などの撮影条件によって自動で導出するようにしても良く、生成する仮想視点画像の精度に応じた値が用いられる。 If the number of used feature points is not equal to or less than the threshold in S4208, the calibration unit 2140 performs external parameter estimation for each camera 112 (S4209). It then determines whether the reprojection error resulting from the external parameter estimation in S4209 is equal to or less than a threshold (S4210). The reprojection error threshold used in S4210 may be set in advance or may be derived automatically from shooting conditions such as the number of cameras; a value corresponding to the required accuracy of the virtual viewpoint image to be generated is used.

Ｓ４２１０の判定において、再投影誤差が閾値以下でない場合、キャリブレーション部２１４０は誤差が大きいと判断し、Ｓ４２０５における画像特徴点の誤検出、及びＳ４２０７における画像特徴点の誤マッチングの削除処理を行う（Ｓ４２１１）。 If the reprojection error is not equal to or less than the threshold in S4210, the calibration unit 2140 determines that the error is large and removes the erroneously detected image feature points from S4205 and the erroneously matched image feature points from S4207 (S4211).

Ｓ４２１１の誤検出及び誤マッチングの判定方法としては、例えばキャリブレーション部２１４０が再投影誤差の大きい特徴点を自動で削除するようにしても良いし、ユーザが再投影誤差及び画像を見ながら手作業で削除するようにしても良い。 As a method of identifying the erroneous detections and erroneous matches in S4211, for example, the calibration unit 2140 may automatically delete feature points with a large reprojection error, or the user may delete them manually while viewing the reprojection errors and the images.

そしてキャリブレーション部２１４０は、Ｓ４２０２で導出した内部パラメータ初期値に対して、内部パラメータの補正を行う（Ｓ４２１２）。そして、Ｓ４２０８において使用特徴点数が閾値以下にならない範囲で、Ｓ４２１０において再投影誤差が閾値以下になるまで、Ｓ４２０８からＳ４２１２の処理を繰り返す。 Then, the calibration unit 2140 corrects the internal parameter with respect to the initial value of the internal parameter derived in S4202 (S4212). Then, the processing from S4208 to S4212 is repeated until the reprojection error becomes equal to or less than the threshold in S4210 within the range where the number of used feature points does not become equal to or less than the threshold in S4208.

Ｓ４２０８の判定においてキャリブレーション部２１４０は、使用特徴点数が閾値以下であればキャリブレーション失敗と判断する（Ｓ４２１３）。キャリブレーション失敗の場合、キャリブレーション用撮影からやり直すなどの対応が行われる。成功又は失敗の判断結果は、逐次制御ステーション３１０に対して通知され、失敗時点以降のキャリブレーション処理を実施するなどの対応が、一元的に制御ステーション３１０で管理される。 If the number of used feature points is equal to or less than the threshold in S4208, the calibration unit 2140 determines that calibration has failed (S4213). In the case of calibration failure, measures such as starting over from the calibration image capture are taken. The success or failure result is reported to the control station 310 as it is determined, and follow-up actions, such as carrying out the calibration processing from the point of failure onward, are managed centrally by the control station 310.
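The control flow of S4208 to S4213 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes, purely for illustration, that each matched feature point carries a fixed reprojection error and that the overall error is their mean, whereas the actual process re-runs external parameter estimation (S4209) and corrects the internal parameters (S4212) on each pass. The function name and both threshold parameters are hypothetical.

```python
def refine_calibration(point_errors, min_points, max_error):
    """Toy version of the S4208-S4213 loop.

    point_errors: per-feature-point reprojection errors in pixels
                  (hypothetical stand-in for real estimation results).
    Returns the retained errors on success, or None on calibration failure.
    """
    points = sorted(point_errors)
    while True:
        # S4208: too few usable feature points means failure (S4213).
        if len(points) <= min_points:
            return None
        # S4209/S4210: stand-in for estimation and the error check.
        mean_error = sum(points) / len(points)
        if mean_error <= max_error:
            return points
        # S4211: delete the feature point with the largest error.
        points.pop()


kept = refine_calibration([0.5, 0.6, 0.4, 9.0], min_points=2, max_error=1.0)
print(kept)
```

In this sketch a single outlier is removed and the loop then succeeds; if removing outliers drives the point count down to the threshold, the function returns None, mirroring the failure branch.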

Ｓ４２１０の判定において、再投影誤差が閾値以下であれば、キャリブレーション部２１４０は、スタジアムデータを用いて、Ｓ４２０９で推定された外部パラメータ座標について、カメラ座標系から世界座標系へ剛体変換を行う（Ｓ４２１４）。 If the reprojection error is equal to or less than the threshold in S4210, the calibration unit 2140 uses the stadium data to apply a rigid transformation from the camera coordinate system to the world coordinate system to the external parameter coordinates estimated in S4209 (S4214).

ここで用いるスタジアムデータとしては、Ｘ／Ｙ／Ｚ軸それぞれの原点（例えばピッチ上のセンターサークルの中心点など）、及びスタジアム内の複数の特徴点（例えばピッチラインの交差点など）の座標値など、剛体変換を行うための座標値が定義される。 The stadium data used here defines the coordinate values needed for the rigid transformation, such as the origin of the X, Y, and Z axes (for example, the center point of the center circle on the pitch) and the coordinate values of a plurality of feature points in the stadium (for example, intersections of pitch lines).

ただし、スタジアムデータが存在しない、もしくはデータの精度が低い場合などは、剛体変換を行うための世界座標の入力を手動で行うようにしても良いし、世界座標を示すためのデータがキャリブレーション部２１４０に別途与えられるようにしても良い。 However, when no stadium data exists or the accuracy of the data is low, the world coordinates for the rigid transformation may be entered manually, or data indicating the world coordinates may be supplied separately to the calibration unit 2140.

また、Ｓ４２１４の処理を行うことでキャリブレーション用撮影画像内の世界座標が導出されるため、導出結果を用いて、予めスタジアムデータに記録されているスタジアム内の特徴点の座標をより精度が高くなるよう更新しても良い。 In addition, since the processing of S4214 yields world coordinates within the captured calibration images, the derived result may be used to update the coordinates of the in-stadium feature points recorded in advance in the stadium data so that they become more accurate.
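For reference, a rigid transformation of the kind used in S4214 maps a point p to R p + t, where R is a rotation matrix and t a translation vector. The sketch below only applies such a transform. How R and t are obtained from the stadium data is not specified in the text, so they are taken as given, and the helper names are hypothetical.

```python
import math


def rotation_z(angle_rad):
    """Rotation matrix for a rotation about the Z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rigid_transform(point, rotation, translation):
    """Return rotation * point + translation for a 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# A point on the camera-frame X axis, rotated 90 degrees about Z and
# shifted 10 units along the world X axis (illustrative values only).
world = rigid_transform((1.0, 0.0, 0.0), rotation_z(math.pi / 2), (10.0, 0.0, 0.0))
print(world)
```

Because a rigid transformation contains no scaling or shear, distances between transformed points are preserved.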

以上の処理により、カメラパラメータ推定処理フローとして、全カメラ分のカメラパラメータが導出され、導出されたカメラパラメータをカメラアダプタ１２０及びデータベース２５０に保存することができる。 By the above processing, camera parameters for all cameras can be derived as the camera parameter estimation processing flow, and the derived camera parameters can be stored in the camera adapter 120 and the database 250.

なお、複数カメラの撮影画像を用いて仮想視点画像生成を行うシステムにおいては、カメラ１１２設置時に各カメラ１１２の位置姿勢推定を行うキャリブレーション処理（設置時キャリブレーション）が必要である。 Note that, in a system that generates virtual viewpoint images using captured images of a plurality of cameras, calibration processing (calibration at the time of installation) is necessary to estimate the position and orientation of each camera 112 when the camera 112 is installed.

設置時キャリブレーションでは、各カメラのカメラパラメータを求める処理が行われる。カメラパラメータとは、カメラ固有の内部パラメータ（焦点距離、画像中心、及びレンズ歪みパラメータ等）と、カメラの位置姿勢を表す外部パラメータ（回転行列及び位置ベクトル等）から成る。設置時キャリブレーション処理が完了すると、各カメラのカメラパラメータが導出された状態となる。 In the installation calibration, a process of obtaining camera parameters of each camera is performed. The camera parameters are composed of camera-specific internal parameters (focal length, image center, lens distortion parameter, etc.) and external parameters (rotation matrix, position vector, etc.) representing the position and orientation of the camera. When the installation calibration process is completed, the camera parameters of each camera are derived.

カメラパラメータのうち、内部パラメータは、カメラ１１２及びレンズが定まっている場合、ズーム値とフォーカス値に応じて変わるパラメータである。そのため、本システムにおいては、カメラ１１２をスタジアムに設置する以前に、同カメラ１１２及びレンズを用いて、内部パラメータ導出に必要な撮影を行うことで内部パラメータの導出を行っておく。そして、カメラ１１２をスタジアムに設置した際にズーム値とフォーカス値が決まると、自動的に内部パラメータを導出することができるようにしておく。これを本明細書では内部パラメータをマップ化すると表現し、マップ化の結果を内部パラメータマップと記載する。 Among the camera parameters, the internal parameter is a parameter that changes according to the zoom value and the focus value when the camera 112 and the lens are determined. Therefore, in the present system, before installing the camera 112 in the stadium, the internal parameters are derived by performing photographing necessary for deriving the internal parameters using the camera 112 and the lens. Then, when the zoom value and the focus value are determined when the camera 112 is installed in the stadium, internal parameters can be automatically derived. This is referred to herein as mapping internal parameters, and the result of the mapping is referred to as an internal parameter map.

内部パラメータマップの形式としては、ズーム値とフォーカス値に応じた内部パラメータを複数パターン記録しておく形式としても良いし、内部パラメータ値を算出できる演算式の形式としても良い。即ち、内部パラメータマップは、ズーム値とフォーカス値に応じて一意に内部パラメータが求まるものであればよい。 The internal parameter map may take the form of a record of multiple patterns of internal parameters for different zoom and focus values, or the form of an arithmetic expression from which the internal parameter values can be calculated. In other words, the internal parameter map may be anything from which the internal parameters are uniquely determined by the zoom value and the focus value.
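As an illustration of the first format (recorded patterns), the following sketch stores internal parameters in a table keyed by zoom and focus value. The nearest-entry lookup is an assumption made here so that any input yields exactly one result; the text does not say how values between recorded patterns are handled. The `Intrinsics` fields and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    focal_length_px: float
    center_x: float
    center_y: float


def lookup_intrinsics(param_map, zoom, focus):
    """Return the recorded entry closest to (zoom, focus).

    param_map: dict mapping (zoom, focus) tuples to Intrinsics.
    """
    nearest = min(
        param_map,
        key=lambda k: (k[0] - zoom) ** 2 + (k[1] - focus) ** 2,
    )
    return param_map[nearest]


# Hypothetical map measured before the camera is installed.
PARAM_MAP = {
    (1.0, 0.5): Intrinsics(1000.0, 960.0, 540.0),
    (2.0, 0.5): Intrinsics(2000.0, 960.0, 540.0),
}

initial = lookup_intrinsics(PARAM_MAP, zoom=1.9, focus=0.5)
print(initial)
```

The returned value corresponds to the initial internal parameter value derived in S4202, which the estimation process later corrects.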

また、内部パラメータマップによって求められたパラメータ値は、内部パラメータの初期値として用いられるものとする。そして、カメラパラメータ推定処理結果としての内部パラメータは、カメラ１１２をスタジアムに設置した後にキャリブレーション用として撮影した画像を用いたカメラパラメータ推定処理の過程で補正された値となる。 Further, parameter values obtained by the internal parameter map are used as initial values of the internal parameters. The internal parameter as the result of the camera parameter estimation process is a value corrected in the process of the camera parameter estimation process using the image captured for calibration after the camera 112 is installed in the stadium.

また、本実施形態では、設置されるカメラ１１２及びレンズは何れも同機種であり、同ズーム値及び同フォーカス値であれば内部パラメータも同じであるものとする。ただしこれに限らず、複数機種のカメラ１１２及びレンズを用いる場合など、同ズーム値及び同フォーカス値であっても内部パラメータに個体差がある場合は、機種毎及びカメラ１１２毎に内部パラメータマップを保持するようにしても良い。 Further, in the present embodiment, all installed cameras 112 and lenses are of the same model, and the internal parameters are assumed to be the same for the same zoom value and focus value. However, the present invention is not limited to this. When the internal parameters differ between individual units even at the same zoom value and focus value, such as when cameras 112 and lenses of multiple models are used, an internal parameter map may be held for each model and for each camera 112.

ところで、カメラ画像の高解像度化にともない、各カメラ１１２の画像フレームを伝送した際にデータ伝送量がネットワーク伝送帯域制限を超過する虞がある。この虞を低減する方法について説明する。 As camera images increase in resolution, the amount of data transmitted when sending the image frames of each camera 112 may exceed the network transmission bandwidth limit. A method of reducing this risk is described below.

図１４は、三次元モデル情報の生成処理のシーケンス図である。ここでは、複数のカメラアダプタ１２０（１２０ａ、１２０ｂ、１２０ｃ、及び１２０ｄ）が連動して三次元モデル情報を生成する処理について説明する。なお、処理の順番は図に示したものに限定される訳ではない。 FIG. 14 is a sequence diagram of generation processing of three-dimensional model information. Here, processing in which a plurality of camera adapters 120 (120a, 120b, 120c, and 120d) interlock to generate three-dimensional model information will be described. The order of processing is not limited to that shown in the figure.

なお、本実施形態の画像処理システム１００には２６台のカメラ１１２とカメラアダプタ１２０が含まれるが、ここでは２台のカメラ１１２ｂと１１２ｃ、及び、４台のカメラアダプタ１２０ａ、１２０ｂ、１２０ｃ、及び１２０ｄに注目して説明する。カメラ１１２ｂとカメラアダプタ１２０ｂ、及びカメラ１１２ｃとカメラアダプタ１２０ｃは、其々接続されている。なおカメラアダプタ１２０ａおよびカメラアダプタ１２０ｄに接続するカメラ１１２や、各カメラアダプタ１２０に接続するマイク１１１、雲台１１３、及び外部センサ１１４については省略する。また、カメラアダプタ１２０ａ〜１２０ｄはタイムサーバ２９０と時刻同期が完了し、撮影状態となっているものとする。 Although the image processing system 100 according to the present embodiment includes 26 cameras 112 and camera adapters 120, the description here focuses on two cameras 112b and 112c and four camera adapters 120a, 120b, 120c, and 120d. The camera 112b is connected to the camera adapter 120b, and the camera 112c is connected to the camera adapter 120c. The cameras 112 connected to the camera adapters 120a and 120d, and the microphone 111, camera platform 113, and external sensor 114 connected to each camera adapter 120, are omitted. Further, it is assumed that the camera adapters 120a to 120d have completed time synchronization with the time server 290 and are in a shooting state.

カメラ１１２ｂおよびカメラ１１２ｃは其々カメラアダプタ１２０ｂ及び１２０ｃに対して撮影画像（１）及び撮影画像（２）を送信する（Ｆ６３０１、Ｆ６３０２）。カメラアダプタ１２０ｂ及び１２０ｃは、受信した撮影画像（１）または撮影画像（２）に対して、キャリブレーション制御部６１３３においてキャリブレーション処理を行う（Ｆ６３０３、Ｆ６３０４）。キャリブレーション処理は例えば色補正やブレ補正等である。なお、本実施形態ではキャリブレーション処理が実施されているが、必ずしも実施しなくてもよい。 The camera 112b and the camera 112c each transmit the photographed image (1) and the photographed image (2) to the camera adapters 120b and 120c (F6301 and F6302). The camera adapters 120b and 120c perform calibration processing in the calibration control unit 6133 on the received captured image (1) or the received captured image (2) (F6303, F6304). The calibration process is, for example, color correction or shake correction. Although the calibration process is performed in the present embodiment, it may not be necessarily performed.

次に、キャリブレーション処理済の撮影画像（１）または撮影画像（２）に対して、前景背景分離部６１３１によって前景背景分離処理が行われる（Ｆ６３０５、Ｆ６３０６）。 Next, foreground / background separation processing is performed by the foreground / background separation unit 6131 on the captured image (1) or the captured image (2) that has undergone the calibration process (F6305, F6306).

次に、分離された前景画像及び背景画像其々に対してデータ圧縮・伸張部６１２１において圧縮が行われる（Ｆ６３０７、Ｆ６３０８）。なお分離した前景画像及び背景画像の其々の重要度に応じて圧縮率が変更されてもよい。また、場合によっては圧縮を行わなくてもよい。例えば、カメラアダプタ１２０は、背景画像よりも前景画像の圧縮率が低くなるように、前景画像と背景画像とのうち少なくとも背景画像を圧縮して次のカメラアダプタ１２０に対して出力する。前景画像も背景画像も圧縮する場合、重要な撮影対象を含む前景画像はロスレス圧縮を行い、撮影対象を含まない背景画像に対してはロスあり圧縮を行う。これにより、この後に次のカメラアダプタ１２０ｃまたはカメラアダプタ１２０ｄに伝送されるデータ量を効率的に削減する事ができる。例えばサッカー、ラグビー及び野球等が開催されるスタジアムのフィールドを撮影した場合には、画像の大半が背景画像で構成され、プレーヤ等の前景画像の領域が小さいという特徴があるため、伝送データ量を大きく削減できることをここに明記しておく。 Next, compression is performed in the data compression / decompression unit 6121 on each of the separated foreground image and background image (F6307, F6308). The compression rate may be changed according to the degree of importance of the separated foreground image and background image. Also, in some cases, compression may not be performed. For example, the camera adapter 120 compresses at least the background image of the foreground image and the background image and outputs the compressed image to the next camera adapter 120 so that the compression ratio of the foreground image is lower than that of the background image. When the foreground image and the background image are both compressed, the foreground image including the important imaging target is subjected to lossless compression, and the background image not including the imaging target is subjected to lossy compression. As a result, it is possible to efficiently reduce the amount of data to be transmitted to the next camera adapter 120c or camera adapter 120d after this. For example, when shooting a field of a stadium where soccer, rugby, baseball, etc. are held, most of the image is composed of a background image, and the area of the foreground image of the player etc. is small. It is clearly stated here that significant reductions can be made.
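The asymmetric treatment of foreground and background can be sketched as follows. This is only a stand-in: `zlib` plays the role of a lossless codec, and coarse quantization followed by `zlib` plays the role of a lossy codec, whereas a real system would presumably use proper image codecs. The function names and the `step` parameter are hypothetical.

```python
import zlib


def compress_foreground(pixels: bytes) -> bytes:
    """Lossless: the original bytes are fully recoverable."""
    return zlib.compress(pixels)


def compress_background(pixels: bytes, step: int = 16) -> bytes:
    """Lossy stand-in: quantize each sample to a multiple of `step`
    before compressing, discarding fine detail to save space."""
    quantized = bytes((p // step) * step for p in pixels)
    return zlib.compress(quantized)


sample = bytes(range(256)) * 4  # synthetic 8-bit samples
fg = compress_foreground(sample)
bg = compress_background(sample)
print(len(sample), len(fg), len(bg))
```

Decompressing `fg` returns the input exactly, while decompressing `bg` returns only the quantized approximation, which is the trade-off the paragraph describes.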

さらには、カメラアダプタ１２０ｂ又はカメラアダプタ１２０ｃは、重要度に応じて、次のカメラアダプタ１２０ｃまたはカメラアダプタ１２０ｄに対して出力する画像のフレームレートを変更してもよい。例えば、前景画像よりも背景画像の出力フレームレートが低くなるように、重要な撮影対象を含む前景画像は高フレームレートで出力し、撮影対象を含まない背景画像は低フレームレートで出力してもよい。この事によって更に次のカメラアダプタ１２０ｃまたはカメラアダプタ１２０ｄに伝送されるデータ量を削減する事ができる。またカメラ１１２の設置場所、撮影場所、及び／又はカメラ１１２の性能などに応じて、カメラアダプタ１２０毎に圧縮率や伝送フレームレートを変更してもよい。また、スタジアムの観客席等の三次元構造は図面を用いて事前に確認することができるため、カメラアダプタ１２０は背景画像から観客席の部分を除いた画像を伝送してもよい。これにより、後述のレンダリングの時点で、事前に生成したスタジアム三次元構造を利用することで試合中のプレーヤに重点化した画像レンダリングを実施し、システム全体で伝送及び記憶されるデータ量の削減ができるという効果が生まれる。 Furthermore, the camera adapter 120b or the camera adapter 120c may change the frame rate of the images it outputs to the next camera adapter 120c or 120d according to their importance. For example, the foreground image, which includes the important imaging target, may be output at a high frame rate and the background image, which does not include the imaging target, at a low frame rate, so that the output frame rate of the background image is lower than that of the foreground image. This further reduces the amount of data transmitted to the next camera adapter 120c or 120d. The compression rate and the transmission frame rate may also be changed for each camera adapter 120 according to the installation location of the camera 112, the shooting location, and/or the performance of the camera 112. In addition, since the three-dimensional structure of the stadium, such as the spectator seats, can be confirmed in advance from drawings, the camera adapter 120 may transmit an image obtained by removing the spectator seat portion from the background image. As a result, at the rendering stage described later, image rendering focused on the players in the game can be performed using the stadium three-dimensional structure generated in advance, which reduces the amount of data transmitted and stored by the system as a whole.

次にカメラアダプタ１２０は、圧縮した前景画像及び背景画像を隣接するカメラアダプタ１２０に転送する（Ｆ６３１０、Ｆ６３１１、Ｆ６３１２）。なお、本実施形態では前景画像及び背景画像は同時に転送されているが、其々が個別に転送されてもよい。 Next, the camera adapter 120 transfers the compressed foreground image and background image to the adjacent camera adapter 120 (F6310, F6311, F6312). Although the foreground image and the background image are simultaneously transferred in the present embodiment, each may be separately transferred.

次に、カメラアダプタ１２０ｂは、カメラアダプタ１２０ａから受信した前景画像と前景背景分離処理Ｆ６３０５で分離した前景画像とを使用して三次元モデル情報を作成する（Ｆ６３１３）。同様にカメラアダプタ１２０ｃも三次元モデル情報を作成する（Ｆ６３１４）。 Next, the camera adapter 120b creates three-dimensional model information using the foreground image received from the camera adapter 120a and the foreground image separated in the foreground / background separation processing F6305 (F6313). Similarly, the camera adapter 120c also creates three-dimensional model information (F6314).

次に、カメラアダプタ１２０ｂはカメラアダプタ１２０ａから受信した前景画像及び背景画像をカメラアダプタ１２０ｃへ転送する（Ｆ６３１５）。カメラアダプタ１２０ｃも同様にカメラアダプタ１２０ｄへ前景画像及び背景画像を転送する。なお、本実施形態では前景画像及び背景画像は同時に転送されているが、其々が個別に転送されてもよい。 Next, the camera adapter 120b transfers the foreground image and the background image received from the camera adapter 120a to the camera adapter 120c (F6315). The camera adapter 120c similarly transfers the foreground image and the background image to the camera adapter 120d. Although the foreground image and the background image are simultaneously transferred in the present embodiment, each may be separately transferred.

さらに、カメラアダプタ１２０ｃは、カメラアダプタ１２０ａが作成し、カメラアダプタ１２０ｂから受信した前景画像及び背景画像をカメラアダプタ１２０ｄへ転送する（Ｆ６３１７）。 Furthermore, the camera adapter 120c transfers the foreground image and the background image, which are created by the camera adapter 120a and received from the camera adapter 120b, to the camera adapter 120d (F6317).

次に、各カメラアダプタ１２０ａ〜１２０ｃは、作成した三次元モデル情報を其々次のカメラアダプタ１２０ｂ〜１２０ｄへ転送する（Ｆ６３１８、Ｆ６３１９、Ｆ６３２０）。 Next, each of the camera adapters 120a to 120c transfers the created three-dimensional model information to the next camera adapters 120b to 120d (F6318, F6319, F6320).

さらに、カメラアダプタ１２０ｂ及び１２０ｃは、逐次受信した三次元モデル情報を次のカメラアダプタ１２０ｃ及び１２０ｄへ転送する（Ｆ６３２１、Ｆ６３２２）。さらに、カメラアダプタ１２０ｃは、カメラアダプタ１２０ａが作成し、カメラアダプタ１２０ｂから受信した三次元モデル情報をカメラアダプタ１２０ｄへ転送する（Ｆ６３２３）。 Further, the camera adapters 120b and 120c transfer the sequentially received three-dimensional model information to the next camera adapters 120c and 120d (F6321, F6322). Furthermore, the camera adapter 120c transfers the three-dimensional model information created by the camera adapter 120a and received from the camera adapter 120b to the camera adapter 120d (F6323).

最終的に、カメラアダプタ１２０ａ〜１２０ｄが作成した前景画像、背景画像、及び三次元モデル情報は、ネットワーク接続されたカメラアダプタ１２０間を逐次伝送され、フロントエンドサーバ２３０に伝送される。 Finally, the foreground image, the background image, and the three-dimensional model information created by the camera adapters 120a to 120d are sequentially transmitted between the networked camera adapters 120 and transmitted to the front end server 230.

なお、本シーケンス図ではカメラアダプタ１２０ａ及びカメラアダプタ１２０ｄのキャリブレーション処理、前景背景分離処理、圧縮処理、及び三次元モデル情報作成処理については記載を省略している。しかし実際には、カメラアダプタ１２０ａ及びカメラアダプタ１２０ｄも、カメラアダプタ１２０ｂやカメラアダプタ１２０ｃと同様の処理を行い、前景画像、背景画像及び三次元モデル情報を作成している。また、ここでは４台のカメラアダプタ１２０間のデータ転送シーケンスについて説明したが、カメラアダプタ１２０の数が増えても同様の処理が行われる。 In the sequence diagram, the description of calibration processing of the camera adapter 120a and the camera adapter 120d, foreground / background separation processing, compression processing, and three-dimensional model information creation processing is omitted. However, in practice, the camera adapter 120a and the camera adapter 120d also perform the same processing as the camera adapter 120b and the camera adapter 120c, and create a foreground image, a background image, and three-dimensional model information. Further, although the data transfer sequence between the four camera adapters 120 has been described here, the same processing is performed even if the number of camera adapters 120 increases.

ここまで説明したように、複数のカメラアダプタ１２０のうち、予め定められた順序において最後のカメラアダプタ１２０以外のカメラアダプタ１２０は、対応するカメラ１１２による撮影画像から所定領域を抽出する。そしてその抽出結果に基づく画像データを、上記の予め定められた順序において次のカメラアダプタ１２０へ出力する。一方、上記の予め定められた順序において最後のカメラアダプタ１２０は、抽出結果に基づく画像データをフロントエンドサーバ２３０へ出力する。すなわち、複数のカメラアダプタ１２０はデイジーチェーンで接続され、各カメラアダプタ１２０が撮影画像から所定領域を抽出した結果に基づく画像データは、予め定められたカメラアダプタ１２０によってフロントエンドサーバ２３０へ入力される。このようなデータの伝送方式を用いることで、画像処理システム１００内におけるセンサシステム１１０の数が変動した場合の、フロントエンドサーバ２３０における処理負荷やネットワークの伝送負荷の変動を抑制することができる。また、カメラアダプタ１２０が出力する画像データは、上記の抽出結果に基づく画像データと、予め定められた順序において前のカメラアダプタ１２０による所定領域の抽出結果に基づく画像データとを用いて生成されるデータであってもよい。例えば、各カメラアダプタ１２０が自身による抽出結果と前のカメラアダプタ１２０による抽出結果の差分に基づく画像データを出力することで、システム内の伝送データ量を低減することができる。上記の順序において最後のカメラアダプタ１２０は、他のカメラ１１２による撮影画像から他のカメラアダプタ１２０により抽出された所定領域の画像データに基づく抽出画像データを上記の他のカメラアダプタ１２０から取得する。そして、自身が抽出した所定領域の抽出結果と、他のカメラアダプタ１２０から取得した抽出画像データとに応じた画像データを、仮想視点画像を生成するための画像コンピューティングサーバ２００に対して出力する。 As described above, among the plurality of camera adapters 120, each camera adapter 120 other than the last one in a predetermined order extracts a predetermined area from the image captured by its corresponding camera 112, and outputs image data based on the extraction result to the next camera adapter 120 in that order. The last camera adapter 120 in the predetermined order, on the other hand, outputs image data based on the extraction result to the front end server 230. That is, the plurality of camera adapters 120 are connected in a daisy chain, and the image data based on the result of each camera adapter 120 extracting the predetermined area from its captured image is input to the front end server 230 by a predetermined camera adapter 120. Using such a data transmission method makes it possible to suppress fluctuations in the processing load on the front end server 230 and in the transmission load on the network when the number of sensor systems 110 in the image processing system 100 changes. The image data output by a camera adapter 120 may be data generated using both the image data based on its own extraction result and the image data based on the extraction result of the predetermined area by the preceding camera adapter 120 in the predetermined order. For example, when each camera adapter 120 outputs image data based on the difference between its own extraction result and the extraction result of the preceding camera adapter 120, the amount of data transmitted within the system can be reduced. The last camera adapter 120 in the above order acquires, from the other camera adapters 120, extracted image data based on the image data of the predetermined areas that those camera adapters 120 extracted from the images captured by the other cameras 112. It then outputs image data corresponding to its own extraction result of the predetermined area and to the extracted image data acquired from the other camera adapters 120 to the image computing server 200, which generates the virtual viewpoint image.

また、カメラアダプタ１２０は、カメラ１１２が撮影した画像を前景部分と背景部分に分け、例えばそれぞれの重要度に応じて圧縮率や伝送するフレームレートを変える。このことにより、カメラ１１２が撮影したデータの全てをフロントエンドサーバ２３０に伝送する場合よりも伝送量を低減する事ができる。また、三次元モデル生成に必要な三次元モデル情報を各カメラアダプタ１２０が逐次作成する。この事により、全てのデータをフロントエンドサーバ２３０に集結させ、サーバで全ての三次元モデル生成処理を行う場合と比較し、サーバの処理負荷を低減させる事ができ、よりリアルタイムに三次元モデル生成を可能とする事ができる。 In addition, the camera adapter 120 divides the image captured by the camera 112 into a foreground portion and a background portion, and changes the compression rate and the transmission frame rate according to, for example, their respective importance. This reduces the amount of transmission compared with transmitting all the data captured by the camera 112 to the front end server 230. Also, each camera adapter 120 sequentially creates the three-dimensional model information necessary for three-dimensional model generation. Compared with collecting all data in the front end server 230 and performing all three-dimensional model generation processing on the server, this reduces the processing load on the server and allows three-dimensional models to be generated closer to real time.

図１５は、カメラから撮像画像を受信した際のカメラアダプタの動作を示すフローチャートである。具体的には、カメラアダプタ１２０が、受信した撮像画像から前景画像及び背景画像を生成し後続のカメラアダプタ１２０へ転送する処理である。 FIG. 15 is a flowchart showing an operation of the camera adapter when receiving a captured image from the camera. Specifically, the camera adapter 120 generates the foreground image and the background image from the received captured image and transfers the foreground image and the background image to the subsequent camera adapter 120.

カメラアダプタ１２０は、自身に接続されているカメラ１１２から撮影画像を取得する（Ｓ６５０１）。次に、取得した撮影画像を前景画像及び背景画像に分離する処理を実施する（Ｓ６５０２）。なお、本実施形態における前景画像は、カメラ１１２から取得した撮影画像に対する所定オブジェクトの検出結果に基づいて決定される画像である。所定オブジェクトとは、例えば人物である。ただし、オブジェクトが特定人物（選手、監督、及び／又は審判など）であっても良いし、ボールやゴールなど、画像パターンが予め定められている物体であっても良い。また、オブジェクトとして、動体が検出されるようにしても良い。 The camera adapter 120 acquires a photographed image from the camera 112 connected to the camera adapter 120 (S6501). Next, processing is performed to separate the acquired captured image into a foreground image and a background image (S6502). The foreground image in the present embodiment is an image that is determined based on the detection result of a predetermined object with respect to the captured image acquired from the camera 112. The predetermined object is, for example, a person. However, the object may be a specific person (such as a player, a director, and / or an umpire), or an object such as a ball or a goal, for which an image pattern is predetermined. Also, a moving body may be detected as an object.

次に、分離した前景画像及び背景画像の圧縮処理を行う（Ｓ６５０３）。前景画像に対してはロスレス圧縮が行われ、前景画像は高画質を維持する。背景画像に対してはロスあり圧縮が行われ、伝送データ量が削減される。 Next, compression processing of the separated foreground image and background image is performed (S6503). Lossless compression is performed on the foreground image, and the foreground image maintains high image quality. Lossy compression is performed on the background image to reduce the amount of transmission data.

次にカメラアダプタ１２０は、圧縮した前景画像と背景画像を次のカメラアダプタ１２０へ転送する（Ｓ６５０４）。なお背景画像に関しては毎フレーム転送するのではなく転送フレームを間引いて転送してもよい。例えば、撮影画像が６０ｆｐｓである場合に、前景画像は毎フレーム伝送されるが、背景画像は１秒間の６０フレーム中１フレームのみが伝送される。これにより伝送データ量の削減を行う事ができる特有の効果がある。 Next, the camera adapter 120 transfers the compressed foreground image and background image to the next camera adapter 120 (S6504). The background image need not be transferred every frame; frames may be thinned out before transfer. For example, when the captured image is 60 fps, the foreground image is transmitted every frame, but only one of the 60 background frames captured in each second is transmitted. This has the particular effect of reducing the amount of transmitted data.
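The 60 fps example can be expressed as a simple per-frame decision. The sketch below assumes that thinning means sending the background on every 60th frame index; the text only states that one background frame per second is sent, so which frame is chosen is an assumption, as are the function and parameter names.

```python
def images_to_send(frame_index, background_interval=60):
    """Return which image types to transmit for a given frame index.

    The foreground is sent every frame; the background only once per
    `background_interval` frames (one per second at 60 fps).
    """
    kinds = ["foreground"]
    if frame_index % background_interval == 0:
        kinds.append("background")
    return kinds


# Count transmissions over one second of 60 fps capture.
sent = [kind for i in range(60) for kind in images_to_send(i)]
print(sent.count("foreground"), sent.count("background"))
```

Over one second this transmits 60 foreground images and a single background image.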

またカメラアダプタ１２０は、次のカメラアダプタ１２０へ前景画像及び背景画像を転送する際に、メタ情報を付与してもよい。例えば、カメラアダプタ１２０またはカメラ１１２の識別子や、フレーム内の前景画像の位置（ｘｙ座標）や、データサイズ、フレーム番号、及び撮影時刻などがメタ情報として付与される。また注視点を識別するための注視点グループ情報や、前景画像及び背景画像を識別するデータ種別情報などが付与されてもよい。但し付与されるデータの内容はこれらに限定される訳ではなく、他のデータが付与してもよい。 Also, the camera adapter 120 may add meta information when transferring the foreground image and the background image to the next camera adapter 120. For example, the identifier of the camera adapter 120 or the camera 112, the position (xy coordinates) of the foreground image in the frame, the data size, the frame number, the photographing time, and the like are given as meta information. Also, gaze point group information for identifying a gaze point, data type information for identifying a foreground image and a background image, and the like may be added. However, the contents of the data to be given are not limited to these, and other data may be given.
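One possible shape for this meta information is a small record attached to each image payload. The field names, types, and the dictionary-based packet layout below are all hypothetical; the text lists the kinds of information but not a concrete format.

```python
from dataclasses import dataclass, asdict


@dataclass
class ImageMeta:
    adapter_id: str        # identifier of the camera adapter or camera
    frame_number: int
    capture_time: float    # shooting time, e.g. seconds since an epoch
    data_type: str         # "foreground" or "background"
    position_xy: tuple     # position of the foreground image in the frame
    gaze_point_group: str
    data_size: int = 0     # filled in from the payload length


def make_packet(payload: bytes, meta: ImageMeta) -> dict:
    """Attach meta information to an image payload."""
    meta.data_size = len(payload)
    return {"meta": asdict(meta), "payload": payload}


packet = make_packet(
    b"\x00" * 128,
    ImageMeta("120b", 42, 1.5, "foreground", (120, 340), "A"),
)
print(packet["meta"])
```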

なお、カメラアダプタ１２０がデイジーチェーンを通じてデータを伝送する際に、自身に接続されたカメラ１１２と相関の高いカメラ１１２の撮影画像のみを選択的に処理することで、カメラアダプタ１２０における伝送処理負荷を軽減することができる。また、デイジーチェーン伝送において、何れかのカメラアダプタ１２０において故障が発生してもカメラアダプタ１２０間のデータ伝送が停止しないようにシステムを構成することで、ロバスト性を確保できる。 In addition, when the camera adapter 120 transmits data through the daisy chain, the transmission processing load on the camera adapter 120 can be reduced by selectively processing only the photographed image of the camera 112 highly correlated with the camera 112 connected to the camera adapter 120. It can be reduced. Further, in the daisy chain transmission, robustness can be ensured by configuring the system so that data transmission between the camera adapters 120 does not stop even if a failure occurs in any of the camera adapters 120.

図１６は、隣接するカメラアダプタからデータを受信した際のカメラアダプタの動作を示すフローチャートである。 FIG. 16 is a flowchart showing the operation of the camera adapter when data is received from an adjacent camera adapter.

まずカメラアダプタ１２０は隣接するカメラアダプタ１２０からデータを受信する（Ｓ６６０１）。カメラアダプタ１２０は自身の転送モードがバイパス制御モードか否かを判断する（Ｓ６６０２）。なおバイパス制御モードの詳細については図１９を参照して後述する。 First, the camera adapter 120 receives data from the adjacent camera adapter 120 (S6601). The camera adapter 120 determines whether its own transfer mode is the bypass control mode (S6602). The details of the bypass control mode will be described later with reference to FIG. 19.

バイパス制御モードの場合は、カメラアダプタ１２０は、次のカメラアダプタ１２０へデータを転送する（Ｓ６６１１）。バイパス制御モードでない場合は、受信したデータのパケットを解析する（Ｓ６６０３）。 In the case of the bypass control mode, the camera adapter 120 transfers data to the next camera adapter 120 (S6611). If the bypass control mode is not set, the packet of the received data is analyzed (S6603).

カメラアダプタ１２０は、パケットを解析した結果、バイパス伝送制御対象のパケットであると判断した場合は（Ｓ６６０４のＹｅｓ）、次のカメラアダプタ１２０へデータを転送する（Ｓ６６１０）。バイパス伝送制御対象のパケットは、例えば三次元モデル情報生成に利用しない画像データまたは後述する制御メッセージや時刻補正に係わるメッセージである。なおバイパス伝送制御の詳細については図１８を参照して後述する。 If, as a result of analyzing the packet, the camera adapter 120 determines that it is a packet subject to bypass transmission control (Yes in S6604), it transfers the data to the next camera adapter 120 (S6610). A packet subject to bypass transmission control is, for example, image data not used for generating three-dimensional model information, a control message described later, or a message related to time correction. The details of the bypass transmission control will be described later with reference to FIG. 18.

カメラアダプタ１２０は、バイパス伝送制御対象ではないと判断した場合は、データ種別を判別し（Ｓ６６０５）、データの種別に応じた処理を行う。 If it is determined that the camera adapter 120 is not the bypass transmission control target, the camera adapter 120 determines the data type (S6605), and performs processing according to the data type.

データの種別が、制御ステーション３１０から自身のカメラアダプタ１２０宛ての制御メッセージパケットである場合、制御メッセージを解析し、解析結果に基づき処理を行う（Ｓ６６０６）。制御メッセージの送信元が制御ステーション３１０でなく他のノードである場合も同様である。また、パケットが自身のカメラアダプタ１２０宛ての場合だけではなく、カメラアダプタ１２０が属する注視点グループ宛てである場合も同様である。また、カメラアダプタ１２０が行う処理の例としては、カメラアダプタ１２０に接続されるマイク１１１、カメラ１１２及び雲台１１３の制御や、カメラアダプタ１２０自身の制御がある。カメラアダプタ１２０は、制御メッセージの内容に応じて制御結果を送信元もしくは指示されたノードに対して返送する。またパケットがグループ宛ての制御メッセージの場合は次のカメラアダプタ１２０へ制御メッセージを転送する。 If the data type is a control message packet addressed from the control station 310 to the camera adapter 120 itself, the camera adapter 120 analyzes the control message and performs processing based on the analysis result (S6606). The same applies when the source of the control message is a node other than the control station 310. The same also applies not only when the packet is addressed to the camera adapter 120 itself but also when it is addressed to the gaze point group to which the camera adapter 120 belongs. Examples of processing performed by the camera adapter 120 include control of the microphone 111, the camera 112, and the camera platform 113 connected to the camera adapter 120, and control of the camera adapter 120 itself. The camera adapter 120 returns the control result to the sender or to a designated node according to the content of the control message. If the packet is a control message addressed to a group, the camera adapter 120 also transfers the control message to the next camera adapter 120.

次にカメラアダプタ１２０は、データ種別が時刻補正に係わる場合は時刻補正処理を行う（Ｓ６６０７）。例えばタイムサーバ２９０との間でのＰＴＰ処理に基づき自身の時刻補正を行う。そして補正した時刻に基づきマイク１１１及びカメラ１１２へ供給するワードクロックの補正を行う。なお時刻の補正量が大きい場合にワードクロックのタイミングを一度に変更すると音声や画像品質に影響が出るため、予め設定された変更量に基づき徐々に時刻を補正する処理を行ってもよい。またカメラアダプタ１２０は、作成した三次元モデル情報及び三次元モデル情報作成に使用した前景画像などを、フロントエンドサーバ２３０に送信するために次のカメラアダプタ１２０へ転送する。 Next, when the data type relates to time correction, the camera adapter 120 performs time correction processing (S6607). For example, based on PTP processing with the time server 290, it performs its own time correction. Then, the word clock supplied to the microphone 111 and the camera 112 is corrected based on the corrected time. If the timing of the word clock is changed at one time when the correction amount of time is large, the voice and the image quality are affected. Therefore, the time may be gradually corrected based on the change amount set in advance. Also, the camera adapter 120 transfers the created three-dimensional model information and the foreground image used for creating the three-dimensional model information to the next camera adapter 120 for transmission to the front end server 230.
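
The gradual time correction mentioned above can be illustrated with a minimal Python sketch. It assumes the offset and the preset change amount are expressed in the same unit (for example microseconds); the function name `slew_steps` and the parameter `max_step` are hypothetical and do not appear in the source.

```python
def slew_steps(offset, max_step):
    """Split a clock offset into bounded per-cycle corrections.

    offset:   total correction to apply (may be negative).
    max_step: preset maximum change per correction cycle (assumed > 0).
    Returns the list of corrections, one per cycle, that sum to `offset`.
    """
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    steps = []
    remaining = offset
    while remaining != 0:
        # Clamp each correction so the word clock never jumps by more than max_step.
        step = max(-max_step, min(max_step, remaining))
        steps.append(step)
        remaining -= step
    return steps
```

For an offset of 250 and a maximum step of 100, the sketch applies corrections of 100, 100 and 50 over three cycles instead of one jump of 250.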

次にカメラアダプタ１２０は、データ種別が前景画像または背景画像の場合に三次元モデル情報作成処理を行う（Ｓ６６０８）。 Next, the camera adapter 120 performs three-dimensional model information creation processing when the data type is a foreground image or a background image (S6608).
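
The packet handling of steps S6604 to S6610 can be sketched as a simple dispatcher. This is an illustrative sketch only: the packet is modeled as a dictionary with the hypothetical keys `bypass` and `type`, and the handler table is an assumption, not the format used by the camera adapter 120.

```python
def handle_packet(packet, handlers):
    """Dispatch one received packet following steps S6604 to S6608.

    packet:   dict with hypothetical keys 'bypass' (bool) and 'type' (str).
    handlers: dict mapping 'forward' and each data type to a callable.
    Returns the step label under which the packet was handled.
    """
    if packet.get("bypass", False):      # S6604: bypass transmission target?
        handlers["forward"](packet)      # S6610: pass to the next adapter
        return "S6610"
    steps = {                            # S6605: determine the data type
        "control": "S6606",
        "time_sync": "S6607",
        "foreground": "S6608",
        "background": "S6608",
    }
    step = steps.get(packet["type"])
    if step is None:
        raise ValueError(f"unknown data type: {packet['type']}")
    handlers[packet["type"]](packet)     # run the type-specific processing
    return step
```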

図１７は、注視点グループを説明する図である。各カメラ１１２は光軸が特定の注視点６３０２を向くように設置される。同じ注視点グループ６３０１に分類されるカメラ１１２は、同じ注視点６３０２を向くように設置される。 FIG. 17 is a diagram for explaining a gaze point group. Each camera 112 is installed such that the optical axis is directed to a particular gaze point 6302. The cameras 112 classified into the same gaze point group 6301 are installed to face the same gaze point 6302.

図１７では、注視点Ａ（６３０２Ａ）及び注視点Ｂ（６３０２Ｂ）の２つの注視点６３０２が設定され、９台のカメラ（１１２ａ〜１１２ｉ）が設置された場合の例を示している。４台のカメラ（１１２ａ、１１２ｃ、１１２ｅ及び１１２ｇ）は、同じ注視点Ａ（６３０２Ａ）を向いており、注視点グループＡ（６３０１Ａ）に属する。また、残りの５台のカメラ（１１２ｂ、１１２ｄ、１１２ｆ、１１２ｈ及び１１２ｉ）は、同じ注視点Ｂ（６３０２Ｂ）を向いており、注視点グループＢ（６３０１Ｂ）に属する。 FIG. 17 shows an example in which two gaze points 6302, gaze point A (6302A) and gaze point B (6302B), are set and nine cameras (112a to 112i) are installed. Four cameras (112a, 112c, 112e and 112g) face the same gaze point A (6302A) and belong to gaze point group A (6301A). The remaining five cameras (112b, 112d, 112f, 112h and 112i) face the same gaze point B (6302B) and belong to gaze point group B (6301B).

ここでは、同じ注視点グループ６３０１に属するカメラ１１２の中で最も近い（接続ホップ数が小さい）カメラ１１２の組を「論理的に隣接している」と表現する。例えば、カメラ１１２ａとカメラ１１２ｂは、物理的には隣接しているが、異なる注視点グループ６３０１に属するため論理的には隣接していない。カメラ１１２ａと論理的に隣接しているのは、カメラ１１２ｃである。一方、カメラ１１２ｈとカメラ１１２ｉは、物理的に隣接しているだけでなく、論理的にも隣接している。 Here, among the cameras 112 belonging to the same gaze point group 6301, the pair of cameras 112 that are closest to each other (smallest number of connection hops) is described as "logically adjacent". For example, the camera 112a and the camera 112b are physically adjacent but, because they belong to different gaze point groups 6301, they are not logically adjacent. The camera logically adjacent to the camera 112a is the camera 112c. On the other hand, the camera 112h and the camera 112i are adjacent both physically and logically.
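
The notion of logical adjacency can be illustrated with a short Python sketch based on the FIG. 17 example. It assumes that the cameras form a linear chain in the order a to i and that hop count equals the distance along that chain; the function name `logical_neighbors` and the data layout are hypothetical.

```python
def logical_neighbors(chain, group_of, camera):
    """Return the nearest same-group cameras on each side of `camera`.

    chain:    list of camera names in connection order (assumed linear).
    group_of: dict mapping each camera name to its gaze point group.
    """
    i = chain.index(camera)
    neighbors = []
    # Search toward the start of the chain, then toward the end.
    for indices in (range(i - 1, -1, -1), range(i + 1, len(chain))):
        for j in indices:
            if group_of[chain[j]] == group_of[camera]:
                neighbors.append(chain[j])
                break
    return neighbors


# FIG. 17 example: group A = a, c, e, g and group B = b, d, f, h, i.
chain = list("abcdefghi")
group_of = {c: "A" for c in "aceg"}
group_of.update({c: "B" for c in "bdfhi"})
```

Under these assumptions, `logical_neighbors(chain, group_of, "a")` yields the camera `c` and not the physically adjacent camera `b`, while the neighbors of `h` include `i`.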

物理的に隣接するカメラ１１２が論理的にも隣接しているか否かにより、カメラアダプタ１２０で異なる処理が行われる。以下で具体的な処理について説明する。 Depending on whether or not the physically adjacent cameras 112 are logically adjacent, different processes are performed in the camera adapter 120. Specific processing will be described below.

図１８は、バイパス伝送制御を説明する図である。バイパス伝送制御は、各カメラアダプタ１２０が属する注視点グループに応じて伝送データをバイパスする機能である。外部機器制御部６１４０、各画像処理部６１３０、伝送部６１２０、及びネットワークアダプタ６１１０を構成する機能部の記載は省略している。 FIG. 18 is a diagram for explaining bypass transmission control. The bypass transmission control is a function of bypassing transmission data according to a gaze point group to which each camera adapter 120 belongs. Descriptions of functional units constituting the external device control unit 6140, each image processing unit 6130, the transmission unit 6120, and the network adapter 6110 are omitted.

画像処理システム１００において、カメラアダプタ１２０の台数や、どのカメラアダプタ１２０がどの注視点グループに属するかの設定は変更可能である。図１８では、注視点グループＡにカメラアダプタ１２０ｇ、カメラアダプタ１２０ｈ及びカメラアダプタ１２０ｎが属し、注視点グループＢにカメラアダプタ１２０ｉが属していることとする。 In the image processing system 100, the number of camera adapters 120 and the setting of which camera adapter 120 belongs to which gaze point group can be changed. In FIG. 18, it is assumed that the camera adapters 120g, 120h, and 120n belong to gaze point group A, and that the camera adapter 120i belongs to gaze point group B.

ルート６４５０はカメラアダプタ１２０ｇが作成した前景画像の伝送ルートを示しており、前景画像は最終的にフロントエンドサーバ２３０へ伝送される。本図では、背景画像、三次元モデル情報、及び制御メッセージや、カメラアダプタ１２０ｈ、カメラアダプタ１２０ｉ及びカメラアダプタ１２０ｎが作成した前景画像の記載は省略している。 A route 6450 indicates a transmission route of the foreground image generated by the camera adapter 120g, and the foreground image is finally transmitted to the front end server 230. In the drawing, the description of the background image, the three-dimensional model information, the control message, and the foreground image created by the camera adapter 120h, the camera adapter 120i and the camera adapter 120n is omitted.

カメラアダプタ１２０ｈは、カメラアダプタ１２０ｇが作成した前景画像を、ネットワークアダプタ６１１０ｈを介して受信し、伝送部６１２０ｈによってルーティング先を決定する。伝送部６１２０ｈは、受信した前景画像の作成元のカメラアダプタ１２０ｇが同じ注視点グループ（ここではグループＡ）に属していると判断すると、受信した前景画像を画像処理部６１３０ｈへ転送する。画像処理部６１３０ｈにおいて、カメラアダプタ１２０ｇが作成し送信した前景画像に基づいて三次元モデル情報を生成されると、カメラアダプタ１２０ｇの前景画像は次のカメラアダプタ１２０ｉに転送される。 The camera adapter 120h receives the foreground image created by the camera adapter 120g via the network adapter 6110h, and the transmission unit 6120h determines the routing destination. When the transmission unit 6120h determines that the camera adapter 120g, which created the received foreground image, belongs to the same gaze point group (here, group A), it transfers the received foreground image to the image processing unit 6130h. Once the image processing unit 6130h has generated three-dimensional model information based on the foreground image created and transmitted by the camera adapter 120g, the foreground image of the camera adapter 120g is transferred to the next camera adapter 120i.

次にカメラアダプタ１２０ｉは、カメラアダプタ１２０ｈからカメラアダプタ１２０ｇが作成した前景画像を受信する。カメラアダプタ１２０ｉの伝送部６１２０ｉはカメラアダプタ１２０ｇと自身が属する注視点グループが異なる事を判断すると、画像処理部６１３０ｉには転送せず次のカメラアダプタ１２０に転送する。 Next, the camera adapter 120i receives, from the camera adapter 120h, the foreground image created by the camera adapter 120g. When the transmission unit 6120i of the camera adapter 120i determines that the camera adapter 120g belongs to a gaze point group different from its own, it transfers the foreground image to the next camera adapter 120 without transferring it to the image processing unit 6130i.

次にカメラアダプタ１２０ｎは、カメラアダプタ１２０ｇが作成した前景画像を、ネットワークアダプタ６１１０ｎを介して受信し、伝送部６１２０ｎによってルーティング先を決定する。伝送部６１２０ｎは、カメラアダプタ１２０ｎがカメラアダプタ１２０ｇと同じ注視点グループであると判断する。しかし、画像処理部６１３０ｎによりカメラアダプタ１２０ｇの前景画像が三次元モデル情報生成に必要な前景画像ではないと判断されると、前景画像はそのまま次のカメラアダプタ１２０にデイジーチェーンのネットワークを介して転送される。 Next, the camera adapter 120n receives the foreground image created by the camera adapter 120g via the network adapter 6110n, and the transmission unit 6120n determines the routing destination. The transmission unit 6120n determines that the camera adapter 120n belongs to the same gaze point group as the camera adapter 120g. However, if the image processing unit 6130n determines that the foreground image of the camera adapter 120g is not needed for generating three-dimensional model information, the foreground image is transferred as is to the next camera adapter 120 via the daisy chain network.

このように各カメラアダプタ１２０の伝送部６１２０は、受信したデータが画像処理部６１３０における画像処理による三次元モデル情報の作成に必要なデータか否かを判断する。画像処理に必要なデータではない、つまり自身のカメラアダプタ１２０との相関が低いデータであると判断すると、画像処理部６１３０へデータを転送することなく、次のカメラアダプタ１２０に伝送する。つまり、デイジーチェーン１７０を介したデータの伝送において、各カメラアダプタ１２０で必要なデータが選択されて逐次三次元モデル情報を生成する処理が実施される。これによりカメラアダプタ１２０内でデータ受信してから転送するまでのデータ転送に係わる処理負荷及び処理時間を短縮する事ができる。 As described above, the transmission unit 6120 of each camera adapter 120 determines whether the received data is data necessary for creating three-dimensional model information by image processing in the image processing unit 6130. If it is determined that the data is not data necessary for image processing, that is, data having a low correlation with its own camera adapter 120, the data is transmitted to the next camera adapter 120 without being transferred to the image processing unit 6130. That is, in the data transmission via the daisy chain 170, necessary data is selected in each camera adapter 120, and processing of sequentially generating three-dimensional model information is performed. As a result, processing load and processing time related to data transfer from data reception to transfer in the camera adapter 120 can be shortened.
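
The routing decision walked through for the camera adapters 120h, 120i, and 120n can be summarized in a minimal sketch. It merges the group check of the transmission unit and the necessity check of the image processing unit into one function for brevity, which is a simplification; the function name and destination labels are hypothetical.

```python
def route_foreground(own_group, source_group, needed_for_model):
    """Decide where a received foreground image is sent.

    own_group:        gaze point group of the receiving camera adapter.
    source_group:     gaze point group of the adapter that created the image.
    needed_for_model: True if the image is needed for 3D model generation.
    Returns a list of destination labels.
    """
    destinations = []
    if own_group == source_group and needed_for_model:
        # Same group and relevant: hand over to the image processing unit.
        destinations.append("image_processing")
    # In every case the image continues along the daisy chain.
    destinations.append("next_adapter")
    return destinations
```

With the FIG. 18 assignments, adapter 120h (group A, image needed) processes and forwards the image, whereas adapters 120i (group B) and 120n (group A, image not needed) only forward it.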

図１９は、カメラアダプタ１２０ｂのバイパス制御を説明する図である。なお外部機器制御部６１４０、各画像処理部６１３０、伝送部６１２０、及びネットワークアダプタ６１１０を構成する機能部の記載は省略している。 FIG. 19 is a diagram for explaining bypass control of the camera adapter 120b. Note that descriptions of functional units constituting the external device control unit 6140, the respective image processing units 6130, the transmission unit 6120, and the network adapter 6110 are omitted.

バイパス制御とは、カメラアダプタ１２０ｂが、カメラアダプタ１２０ｃから受信したデータを、伝送部６１２０のデータルーティング処理部６１２２によるルーティング制御を行わずに次のカメラアダプタ１２０ａへ転送する機能である。 Bypass control is a function by which the camera adapter 120b transfers data received from the camera adapter 120c to the next camera adapter 120a without routing control by the data routing processing unit 6122 of the transmission unit 6120.

例えばカメラアダプタ１２０ｂは、カメラ１１２ｂの状態が撮影停止中やキャリブレーション中、又はエラー処理中である場合に、ネットワークアダプタ６１１０に対してバイパス制御を起動させる。また例えば、伝送部６１２０または画像処理部６１３０などの動作不良などが発生した場合にも、バイパス制御を起動させる。また、ネットワークアダプタ６１１０が伝送部６１２０の状態を検知し、能動的にバイパス制御モードに遷移してもよい。なお、伝送部６１２０または画像処理部６１３０がエラー状態や停止状態にあることを検知するサブＣＰＵをカメラアダプタ１２０ｂに配備し、サブＣＰＵがエラー検知を行った場合にネットワークアダプタ６１１０をバイパス制御にする処理を加えてもよい。これにより、各機能ブロックのフォールト状態とバイパス制御を独立して制御できる効果がある。 For example, the camera adapter 120b activates bypass control for the network adapter 6110 when the camera 112b has stopped shooting, is being calibrated, or is handling an error. Bypass control is also activated when, for example, the transmission unit 6120 or the image processing unit 6130 malfunctions. The network adapter 6110 may also detect the state of the transmission unit 6120 and switch to the bypass control mode on its own initiative. In addition, a sub CPU that detects that the transmission unit 6120 or the image processing unit 6130 is in an error state or a stopped state may be provided in the camera adapter 120b, and processing may be added that places the network adapter 6110 in bypass control when the sub CPU detects an error. This has the effect that the fault state of each functional block and bypass control can be controlled independently.

また、カメラアダプタ１２０は、カメラ１１２の状態がキャリブレーション状態から撮影状態に遷移した場合や、伝送部６１２０などが動作不良から復旧した場合に、バイパス制御モードから通常の通信モードに遷移してもよい。 The camera adapter 120 may also transition from the bypass control mode to the normal communication mode when the state of the camera 112 changes from the calibration state to the shooting state, or when the transmission unit 6120 or the like recovers from a malfunction.

このバイパス制御機能により、カメラアダプタ１２０はデータ転送を高速に行う事ができ、また不慮の故障などが発生しデータルーティングに係わる判断ができない場合でも次のカメラアダプタ１２０ａへデータを転送する事ができる。 With this bypass control function, the camera adapter 120 can transfer data at high speed, and can transfer data to the next camera adapter 120a even when an unexpected failure or the like makes data routing decisions impossible.
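
The conditions for entering and leaving the bypass control mode can be expressed as a single predicate. This is a sketch under assumed state names (`stopped`, `calibrating`, `error`, `shooting`), which are hypothetical labels for the camera states named in the text.

```python
# Camera states in which routing is skipped (labels are assumptions).
BYPASS_CAMERA_STATES = {"stopped", "calibrating", "error"}


def should_bypass(camera_state, transmission_ok, image_processing_ok):
    """Return True if the network adapter should run in bypass control mode."""
    if camera_state in BYPASS_CAMERA_STATES:
        return True
    # A malfunction of either unit also triggers bypass control.
    return not (transmission_ok and image_processing_ok)
```

Evaluating the predicate again after a state change covers the return to the normal communication mode: once the camera is shooting and both units have recovered, it returns `False`.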

本システムにおいては、前景画像、背景画像、および三次元モデル情報が、デイジーチェーンで接続された複数のカメラアダプタ１２０間を伝送されてフロントエンドサーバ２３０へ入力される。ここで、撮影画像内で前景領域が極端に多くなるイベント、例えば全選手が集う開会式などが撮影される場合には、伝送される前景画像のデータ量が通常の競技を撮影する場合よりも膨大になる。そこで、デイジーチェーンで伝送されるデータ量が伝送帯域を超過しないように制御するための方法を以下に示す。 In this system, foreground images, background images, and three-dimensional model information are transmitted between the plurality of camera adapters 120 connected in a daisy chain and input to the front end server 230. When an event is shot in which the foreground area of the captured image becomes extremely large, for example an opening ceremony at which all players gather, the amount of foreground image data to be transmitted becomes far larger than when shooting a normal competition. A method for controlling the amount of data transmitted over the daisy chain so that it does not exceed the transmission band is therefore described below.

図２０および図２１を使用して、カメラアダプタ１２０において伝送部６１２０がデータを出力する処理のフローについて説明する。 The flow of processing in which the transmission unit 6120 of the camera adapter 120 outputs data will be described with reference to FIG. 20 and FIG. 21.

図２０は、カメラアダプタ１２０ａ、１２０ｂ及び１２０ｃ間のデータの流れを例示的に示す図である。カメラアダプタ１２０ａとカメラアダプタ１２０ｂ、及びカメラアダプタ１２０ｂとカメラアダプタ１２０ｃが其々接続されている。また、カメラアダプタ１２０ｂにはカメラ１１２ｂが接続されており、カメラアダプタ１２０ｃはフロントエンドサーバ２３０と接続されている。カメラアダプタ１２０ｂの伝送部６１２０のデータ出力処理フローについて説明する。 FIG. 20 shows, by way of example, the flow of data between the camera adapters 120a, 120b, and 120c. The camera adapter 120a and the camera adapter 120b are connected to each other, as are the camera adapter 120b and the camera adapter 120c. The camera 112b is connected to the camera adapter 120b, and the camera adapter 120c is connected to the front end server 230. The data output processing flow of the transmission unit 6120 of the camera adapter 120b will be described.

カメラアダプタ１２０ｂの伝送部６１２０には、カメラ１１２ｂから撮影データ６７２０が入力され、カメラアダプタ１２０ａから画像処理された入力データ６７２１及び６７２２が入力される。また伝送部６１２０は、入力されたデータに対して、画像処理部６１３０への出力、圧縮、フレームレートの設定、およびパケット化等の処理を行って、そのデータをネットワークアダプタ６１１０に出力している。 The transmission unit 6120 of the camera adapter 120b receives shooting data 6720 from the camera 112b and image-processed input data 6721 and 6722 from the camera adapter 120a. The transmission unit 6120 performs processing such as output to the image processing unit 6130, compression, frame rate setting, and packetization on the input data, and outputs the data to the network adapter 6110.

図２１は、伝送部６１２０における出力処理のフローチャートである。伝送部６１２０は、画像処理部６１３０からの入力データ６７２１及び６７２０の各々について画像処理結果のデータ量を取得するステップ（Ｓ６７０１）を実行する。 FIG. 21 is a flowchart of output processing in the transmission unit 6120. The transmission unit 6120 executes the step (S6701) of acquiring the data amount of the image processing result for each of the input data 6721 and 6720 from the image processing unit 6130.

次に、カメラアダプタ１２０ａからの入力データ６７２２のデータ量を取得するステップ（Ｓ６７０２）を実行する。次に、カメラアダプタ１２０ｃへの出力データ量導出について、入力データのデータ種別に応じて導出するステップ（Ｓ６７０３）を実行する。 Next, the step (S6702) of acquiring the data amount of the input data 6722 from the camera adapter 120a is executed. Next, the step (S6703) of deriving the amount of output data to the camera adapter 120c is performed according to the data type of the input data.

次に伝送部６１２０は、出力データ量と所定の伝送帯域制約量を比較し、伝送可能性を確認する。具体的には、ネットワークアダプタ６１１０へ出力するデータ量が予め指定された出力データ量の閾値を超えるか否かを判断する（Ｓ６７０４）。なお閾値はデータ種別（ここでは、前景画像、背景画像、全景フレームデータ、及び三次元モデル情報等があげられる）ごとに設けられてもよい。また出力するデータ量については、伝送部６１２０でデータを圧縮する場合は圧縮結果に基づいて導出される。なお出力データ量の閾値はパケット化する際のヘッダ情報やエラー訂正情報等のオーバヘッドを考慮して設定されることが望ましい。 Next, the transmission unit 6120 compares the amount of output data with a predetermined transmission band restriction amount to confirm the transmission possibility. Specifically, it is determined whether the amount of data to be output to the network adapter 6110 exceeds a threshold of the amount of output data specified in advance (S 6704). The threshold may be provided for each data type (here, a foreground image, a background image, whole scene frame data, three-dimensional model information, and the like can be mentioned). The amount of data to be output is derived based on the compression result when the transmission unit 6120 compresses the data. The threshold value of the amount of output data is preferably set in consideration of overhead such as header information and error correction information at the time of packetization.

伝送部６１２０が、出力データ量が閾値を超えないと判断した場合は、入力データをネットワークアダプタ６１１０へ出力する通常転送を行う（Ｓ６７１２）。出力データ量が閾値を超えたと判断された場合（Ｓ６７０４のＹｅｓ）、伝送部６１２０に入力されたデータが画像データの場合は出力データ量オーバ時のポリシーを取得する（Ｓ６７０５）。そして、取得したポリシーに基づいて、以下で説明する複数の処理（Ｓ６７０７〜Ｓ６７１１）の少なくとも何れかを選択して（Ｓ６７０６）実行する。なお伝送部６１２０は、画像データ以外の時刻補正に係わるデータや制御メッセージ係わるデータについては通常転送を行ってもよい。また、メッセージの種別や優先度に応じてメッセージをドロップしてもよい。出力データのデータ量を減らすことによってデータ転送のオーバーフローを抑止することができる。 If the transmission unit 6120 determines that the output data amount does not exceed the threshold, it performs normal transfer, in which the input data is output to the network adapter 6110 (S6712). If the output data amount is determined to exceed the threshold (Yes in S6704) and the data input to the transmission unit 6120 is image data, the transmission unit 6120 acquires the policy to apply when the output data amount is exceeded (S6705). Then, based on the acquired policy, it selects (S6706) and executes at least one of the processes described below (S6707 to S6711). For data other than image data, such as data related to time correction or control messages, the transmission unit 6120 may perform normal transfer. It may also drop messages according to their type and priority. Reducing the amount of output data makes it possible to prevent data transfer overflow.

伝送部６１２０が実行する処理の１つとして、画像データのフレームレートを落としてネットワークアダプタ６１１０へ出力する（Ｓ６７０７）。フレームを間引いて伝送することによりデータ量が削減される。ただし、動きの速いオブジェクトを追従する際には高フレームレートで出力する場合と比較し画質面で劣る虞があるため、対象となる撮影シーンに応じて本手法の適用可否が判断される。 As one of the processes executed by the transmission unit 6120, the frame rate of the image data is dropped and output to the network adapter 6110 (S6707). By decimating and transmitting frames, the amount of data is reduced. However, when following fast-moving objects, there is a possibility that the image quality may be inferior as compared with the case of outputting at a high frame rate, so the applicability of this method is determined according to the target shooting scene.

また別の処理として、伝送部６１２０は、画像データの解像度を落としてネットワークアダプタ６１１０へ出力する（Ｓ６７０８）。この処理は出力画像の画質に影響するため、エンドユーザ端末の種別に応じてポリシー設定がされる。例えば、スマートフォンへ出力する場合は解像度を大きく落としてデータ量削減を行い、高解像度ディスプレイ等へ出力する場合は解像度を小さく落とす等の適応的な解像度変換に関するポリシー設定がされる。 As another process, the transmission unit 6120 reduces the resolution of the image data and outputs it to the network adapter 6110 (S6708). Because this process affects the quality of the output image, the policy is set according to the type of end user terminal. For example, a policy for adaptive resolution conversion is set such that the resolution is reduced greatly to cut the data amount when outputting to a smartphone, and reduced only slightly when outputting to a high-resolution display or the like.

また別の処理として、伝送部６１２０は、画像データの圧縮率を上げてネットワークアダプタ６１１０へ出力する（Ｓ６７０９）。ここでは、入力画像データに対して、ロスレス圧縮、あるいはロッシー圧縮等の復元性能要求、つまり、画像品質の要求に応じてデータ量削減が図られる。 As another process, the transmission unit 6120 increases the compression rate of the image data and outputs the image data to the network adapter 6110 (S6709). Here, with respect to the input image data, data amount reduction is achieved according to the restoration performance request such as lossless compression or lossy compression, that is, the image quality request.

また別の処理として、伝送部６１２０は、画像処理部６１３０からの撮影データ６７２０の出力を停止する（Ｓ６７１０）。ここでは、画像処理を施した画像データの出力を停止してデータ量削減を図る。十分な台数のカメラ１１２が配備されている場合は、仮想視点画像の生成において、同一注視点グループのカメラ１１２がすべて必須ではない場合がある。例えばスタジアムのフィールド全体を撮影する上でカメラ１１２を削減しても死角が発生しないことを事前に把握できている場合に本制御が適用される。つまり、後工程での画像の破綻が起きないことを条件とし、画像データの送信を行わないカメラを選定することで伝送帯域を確保することができる。 As another process, the transmission unit 6120 stops the output of the shooting data 6720 from the image processing unit 6130 (S6710). Here, the output of the image data subjected to the image processing is stopped to reduce the data amount. When a sufficient number of cameras 112 are deployed, there may be cases where not all the cameras 112 of the same gaze point group are required in the generation of virtual viewpoint images. For example, the present control is applied when it is known in advance that no blind spot will occur even if the camera 112 is reduced in photographing the entire field of the stadium. That is, it is possible to secure a transmission band by selecting a camera that does not transmit image data on condition that failure of the image in a post process does not occur.

また別の処理として、伝送部６１２０は、画像処理部６１３０からの入力データ６７２１の出力を停止するかまたはそのうちの一部のカメラアダプタ１２０の画像の出力のみ停止する（Ｓ６７１１）。上記に加えて、他のカメラアダプタ１２０からの入力画像を利用して三次元モデル情報を生成できた場合は、他のカメラアダプタ１２０からの前景画像や背景画像の出力を停止して、三次元モデル情報のみを出力制御することでデータ量削減を図ってもよい。 As yet another process, the transmission unit 6120 stops the output of the input data 6721 from the image processing unit 6130, or stops the output of only the images of some of the camera adapters 120 (S6711). In addition, when three-dimensional model information has been generated using input images from other camera adapters 120, the output of the foreground images and background images from those camera adapters 120 may be stopped and only the three-dimensional model information output, thereby reducing the data amount.

出力データのデータ量を減らすのに使用された方法は後段のフロントエンドサーバ２３０を介してバックエンドサーバ２７０、仮想カメラ操作ＵＩ３３０、制御ステーション３１０へ通知される（Ｓ６７１３）。本実施形態では、ポリシーに応じてフレームレート、解像度、圧縮率、及びデータ停止等の制御処理の何れかが行われるようにフローが分岐しているがこれに限定されるものではない。これらの制御のうち複数を組み合わせで実行することでさらなるデータ量削減効果が得られることを明記しておく。また、Ｓ６７１３において本制御処理の通知が行われる。この通知により、仮想カメラ操作ＵＩ３３０において、例えば、圧縮率を上げた結果、画像品質面で十分な解像度が得られない場合はズーム操作に制約を設けることができる。さらに、伝送帯域制約量オーバ処理後も、逐次出力データ量の超過をチェックし、データ量が安定したら伝送処理のポリシーを元の設定値に戻すことができることをここに示す。 The method used to reduce the amount of output data is reported to the back end server 270, the virtual camera operation UI 330, and the control station 310 via the downstream front end server 230 (S6713). In this embodiment, the flow branches so that one of the control processes (frame rate, resolution, compression rate, data stop, and so on) is performed according to the policy, but the present invention is not limited to this. Executing several of these controls in combination yields a further reduction in the data amount. Notification of this control processing is performed in S6713. With this notification, the virtual camera operation UI 330 can, for example, restrict zoom operations when increasing the compression rate has left insufficient resolution in terms of image quality. Furthermore, even after the processing for exceeding the transmission band constraint, the output data amount continues to be checked for excess, and once the data amount has stabilized, the transmission processing policy can be returned to its original setting.
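
The branching of steps S6704 to S6713 can be sketched as follows. The policy names and the mapping table are hypothetical labels introduced for illustration, and the sketch assumes that non-image data is always transferred normally, which is only one of the options the text allows.

```python
# Hypothetical policy labels mapped to the steps of FIG. 21.
POLICY_STEPS = {
    "frame_rate": "S6707",     # thin out frames
    "resolution": "S6708",     # lower the resolution
    "compression": "S6709",    # raise the compression rate
    "stop_own": "S6710",       # stop output of own shooting data
    "stop_upstream": "S6711",  # stop output of upstream input data
}


def plan_transfer(output_amount, threshold, is_image, policy):
    """Choose the transfer step for one output decision.

    Returns (step, method_to_notify). The second element is None for
    normal transfer and the policy name when a reduction is reported (S6713).
    """
    if output_amount <= threshold or not is_image:   # S6704
        return "S6712", None                         # normal transfer
    if policy not in POLICY_STEPS:                   # S6705
        raise ValueError(f"unknown policy: {policy}")
    return POLICY_STEPS[policy], policy              # S6706, then S6713
```

Calling the function again for each subsequent output corresponds to the continued checking described above: once the amount falls back under the threshold, the result returns to normal transfer.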

このように、デイジーチェーンの伝送帯域を超過するという課題に対して、状態に応じた伝送制御処理を行うことで、伝送帯域制約を満たす伝送を実現できるという効果がある。 Thus, there is an effect that transmission satisfying the transmission band constraint can be realized by performing the transmission control processing according to the state with respect to the problem of exceeding the transmission band of the daisy chain.

図２３は、カメラアダプタ１２０の画像処理部６１３０における各種処理のフローチャートである。図２３（ａ）の処理に先だって、まずキャリブレーション制御部６１３３は、入力された画像に対して、カメラ毎の色のばらつきを抑えるための色補正処理や、カメラの振動に起因するブレに対して画像の位置を安定させるためのブレ補正処理などを行う。このブレ補正処理は例えば電子防振処理である。 FIG. 23 is a flowchart of various processes in the image processing unit 6130 of the camera adapter 120. Prior to the processing of FIG. 23(a), the calibration control unit 6133 first performs, on the input image, color correction processing for suppressing color variation between cameras, shake correction processing for stabilizing the image position against shake caused by camera vibration, and the like. The shake correction processing is, for example, electronic image stabilization.

色補正処理では、フロントエンドサーバ２３０から受信したパラメータに基づいて、入力画像の画素値にオフセット値を加算するなどの処理が行われる。またブレ補正処理では、カメラに内蔵された加速度センサあるいはジャイロセンサなどのセンサからの出力データに基づき画像のブレ量が推定される。そして推定されたブレ量に基づいて入力画像に対する画像位置のシフトや画像の回転処理が行われることで、フレーム画像間のブレが抑制される。なおブレ補正の手法としてはその他の方法を用いてもよい。例えば、時間的に連続した複数のフレーム画像を比較することで画像の移動量を推定し補正するような画像処理による方法や、レンズシフト方式及びセンサシフト方式などのカメラの内部で実現する方法等でもよい。 In the color correction processing, processing such as adding an offset value to the pixel values of the input image is performed based on parameters received from the front end server 230. In the shake correction processing, the amount of image shake is estimated based on output data from a sensor built into the camera, such as an acceleration sensor or a gyro sensor. The image position is then shifted and the image rotated relative to the input image based on the estimated shake amount, which suppresses shake between frame images. Other shake correction methods may also be used, for example an image processing method that estimates and corrects the amount of image movement by comparing a plurality of temporally consecutive frame images, or a method implemented inside the camera, such as a lens shift method or a sensor shift method.

背景更新部５００３は、入力画像と、メモリに保存されている背景画像とを用いて、背景画像５００２を更新する処理を行う。 The background updating unit 5003 performs processing of updating the background image 5002 using the input image and the background image stored in the memory.

図２２は、背景画像を説明する図である。図２２（ａ）は背景画像の一例を示す。更新処理は各画素に対して行われる。その処理フローを図２３（ａ）に示す。 FIG. 22 is a diagram for explaining a background image. FIG. 22(a) shows an example of a background image. The update processing is performed on each pixel. The processing flow is shown in FIG. 23(a).

まず背景更新部５００３は、Ｓ５００１で、入力画像の各画素に対して、背景画像内の対応する位置にある画素との差分を導出する。つぎに、Ｓ５００２で、差分が定められた閾値Ｋより小さいかどうか判定する。差分がＫより小さい場合にはその画素は背景であるという判断がされる（Ｓ５００２のＹＥＳ）。つぎに背景更新部５００３は、Ｓ５００３で、入力画像の画素値と背景画像の画素値とを一定の比率で混合した値を導出する。そしてＳ５００４で、背景画像内の画素値を導出した値で更新する。 First, at S5001, the background updating unit 5003 derives, with respect to each pixel of the input image, the difference between the pixel and the pixel at the corresponding position in the background image. Next, in S5002, it is determined whether the difference is smaller than a predetermined threshold K. If the difference is smaller than K, it is determined that the pixel is a background (YES in S5002). In step S5003, the background updating unit 5003 derives a value obtained by mixing the pixel value of the input image and the pixel value of the background image at a constant ratio. In step S5004, the pixel value in the background image is updated with the derived value.

一方、図２２（ｂ）は、背景画像である図２２（ａ）に対して人物が映っている画像を示す図である。このような場合には、人物が位置している画素に着目すると、背景に対して画素値の差分が大きくなり、Ｓ５００２において差分がＫ以上となる。その場合には画素値の変化が大きいので背景以外の何らかのオブジェクトが映っているという判断がされて、背景画像５００２の更新は行われない（Ｓ５００２のＮＯ）。なお背景更新処理については他にも様々な手法が考えられる。 On the other hand, FIG. 22 (b) is a view showing an image in which a person appears in FIG. 22 (a) which is a background image. In such a case, focusing on the pixel at which the person is located, the difference in pixel value with respect to the background is large, and the difference becomes K or more in S5002. In that case, the change of the pixel value is large, so it is determined that an object other than the background is displayed, and the background image 5002 is not updated (NO in S5002). Various other methods can be considered for the background updating process.
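
The per-pixel background update of steps S5001 to S5004 can be written as a short Python sketch for grayscale images held as nested lists. The mixing weight `alpha` is an assumed parameter, since the text specifies only "a constant ratio", and the function name is hypothetical.

```python
def update_background(background, frame, k, alpha):
    """Return an updated copy of a grayscale background image.

    background, frame: equally sized 2D lists of pixel values.
    k:     threshold K on the absolute pixel difference.
    alpha: assumed weight of the input pixel in the mix (0 to 1).
    """
    updated = []
    for bg_row, in_row in zip(background, frame):
        row = []
        for bg, px in zip(bg_row, in_row):
            if abs(px - bg) < k:                           # S5001, S5002
                row.append(alpha * px + (1 - alpha) * bg)  # S5003, S5004
            else:
                row.append(bg)  # difference >= K: an object, keep old value
        updated.append(row)
    return updated
```

A pixel that differs by less than K is blended toward the new frame, while a pixel covered by a person keeps its stored background value.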

次に背景切出部５００４は、背景画像５００２からその一部を読み出し、伝送部６１２０へ送信する。スタジアム等でサッカーなどの競技を撮影する際に、フィールド全体を死角なく撮影できるようカメラ１１２を複数配置した場合、カメラ１１２間で背景情報の大部分が重複するという特徴がある。背景情報は膨大なため、伝送帯域制約の面から重複した部分は削除して伝送することで伝送量を削減することができる。その処理の流れを図２３（ｄ）に示す。Ｓ５０１０で、背景切出部５００４は、例えば図２２（ｃ）に示した点線で囲まれた部分領域３４０１のように、背景画像の中央部分を設定する。つまり、本部分領域３４０１は自カメラ１１２が伝送を担当する背景領域であり、それ以外の背景領域は、他のカメラ１１２によって伝送を担当される。 Next, the background cutout unit 5004 reads out a part of the background image 5002 and transmits it to the transmission unit 6120. When a game such as soccer is shot in a stadium or the like with a plurality of cameras 112 arranged so that the entire field can be captured without blind spots, most of the background information overlaps between the cameras 112. Because the background information is enormous, the transmission amount can be reduced, in view of transmission band constraints, by removing the overlapping portions before transmission. The flow of this processing is shown in FIG. 23(d). In S5010, the background cutout unit 5004 sets the central portion of the background image, for example the partial region 3401 surrounded by the dotted line in FIG. 22(c). That is, the partial region 3401 is the background region whose transmission is handled by the own camera 112, and the remaining background regions are handled by the other cameras 112.

Ｓ５０１１で背景切出部５００４は、設定された背景画像の部分領域３４０１を読み出す。そしてＳ５０１２で伝送部６１２０へ出力する。出力された背景画像は画像コンピューティングサーバ２００に集められ、背景モデルのテクスチャとして利用される。各カメラアダプタ１２０において背景画像５００２を切出す位置は、背景モデルに対するテクスチャ情報が不足しないように、予め決められたパラメータ値に応じて設定されている。通常は伝送データ量をより少なくするため、切出す領域は必要最小限となるように設定される。これにより、膨大な背景情報の伝送量を削減できるという効果があり、高解像度化にも対応できるシステムにすることができる。 In step S5011, the background cutout unit 5004 reads out the partial area 3401 of the set background image. Then, in step S5012, the data is output to the transmission unit 6120. The output background image is collected by the image computing server 200 and used as a texture of the background model. The position from which the background image 5002 is cut out in each camera adapter 120 is set according to a predetermined parameter value so that there is no shortage of texture information for the background model. Usually, in order to reduce the amount of transmission data, the area to be cut out is set to be the minimum necessary. This has the effect of being able to reduce the amount of transmission of a vast amount of background information, and a system capable of coping with high resolution can be achieved.
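
The cutout of steps S5010 to S5012 amounts to reading a rectangular sub-region of the background image. The sketch below assumes the region is centered and specified by its height and width; in the described system the position follows predetermined parameter values, so `crop_center` and its arguments are illustrative only.

```python
def crop_center(image, height, width):
    """Return the centered height x width region of a 2D list image."""
    rows, cols = len(image), len(image[0])
    if height > rows or width > cols:
        raise ValueError("requested region is larger than the image")
    top = (rows - height) // 2     # S5010: set the partial region
    left = (cols - width) // 2
    # S5011: read out only the pixels inside the partial region.
    return [row[left:left + width] for row in image[top:top + height]]
```

Only the returned region would be passed on to the transmission unit (S5012), so the transmitted amount shrinks with the size of the cutout.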

次に前景分離部５００１では、前景領域（人物などのオブジェクト）を検出する処理が行われる。画素毎に実行される前景領域検出処理の流れを図２３（ｂ）に示す。前景の検出については、背景差分情報を用いる方法が用いられる。まずＳ５００５で、前景分離部５００１は、新たに入力された画像の各画素と、背景画像５００２内の対応する位置にある画素との差分を導出する。そしてＳ５００６で差分が閾値Ｌより大きいかどうか判定する。ここで、図２２（ａ）に示した背景画像５００２に対して、新たに入力された画像が例えば図２２（ｂ）のようになっているものとすると、人物が映っている領域の各画素においては差分が大きくなる。差分が閾値Ｌより大きい場合にはＳ５００７で、その画素が前景として設定される。なお背景差分情報を用いる前景の検出方法においては、前景をより高精度に検出するための様々な工夫が考えられている。また前景検出についてはその他にも、特徴量や機械学習を用いる手法などさまざまな手法がある。 Next, the foreground separation unit 5001 performs processing to detect foreground regions (objects such as people). The flow of the foreground region detection processing, which is executed for each pixel, is shown in FIG. 23(b). A method using background difference information is used for foreground detection. First, in S5005, the foreground separation unit 5001 derives the difference between each pixel of the newly input image and the pixel at the corresponding position in the background image 5002. In S5006, it determines whether the difference is larger than a threshold L. If the newly input image is, for example, as shown in FIG. 22(b) relative to the background image 5002 shown in FIG. 22(a), the difference is large at each pixel in the region where the person appears. If the difference is larger than the threshold L, the pixel is set as foreground in S5007. Various refinements have been devised for detecting the foreground more accurately with the background difference method. There are also various other foreground detection methods, such as methods using feature quantities or machine learning.
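
The background difference test of steps S5005 to S5007 can be sketched as a per-pixel mask computation on grayscale images held as nested lists. The function name and the Boolean mask representation are assumptions made for illustration.

```python
def foreground_mask(background, frame, threshold_l):
    """Mark each pixel whose difference from the background exceeds L.

    background, frame: equally sized 2D lists of pixel values.
    Returns a 2D list of booleans, True where the pixel is foreground.
    """
    return [
        [abs(px - bg) > threshold_l          # S5005, S5006
         for bg, px in zip(bg_row, in_row)]  # True means foreground (S5007)
        for bg_row, in_row in zip(background, frame)
    ]
```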

前景分離部５００１は、以上図２３（ｂ）で説明した処理を入力された画像の画素毎に実行した後、前景領域をブロックとして決定して出力する処理を行う。処理の流れを図２３（ｃ）に示す。Ｓ５００８においては、前景領域を検出した画像に対して、複数の画素が連結した前景領域を１つの前景画像として設定する。画素が連結した領域を検出する処理としては例えば領域成長法を用いる。領域成長法は公知のアルゴリズムであるため詳細な説明は省く。Ｓ５００８で前景領域がそれぞれ前景画像としてまとめられた後、Ｓ５００９で順次各前景画像が読み出されて伝送部６１２０へ出力される。 After executing the processing described above with reference to FIG. 23(b) for each pixel of the input image, the foreground separation unit 5001 performs processing to determine each foreground region as a block and output it. The flow of this processing is shown in FIG. 23(c). In S5008, for the image in which foreground regions have been detected, each foreground region made up of a plurality of connected pixels is set as one foreground image. A region growing method, for example, is used to detect regions of connected pixels. Because the region growing method is a known algorithm, a detailed description is omitted. After the foreground regions have each been grouped into foreground images in S5008, the foreground images are read out one by one in S5009 and output to the transmission unit 6120.
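
For reference, one common form of region growing is sketched below: starting from each unvisited foreground pixel, the region is grown by repeatedly adding connected foreground neighbors. The sketch assumes 4-connectivity and a Boolean mask as input; the text does not specify the connectivity or data layout, so these are illustrative choices.

```python
from collections import deque


def grow_regions(mask):
    """Group 4-connected True pixels of a 2D boolean mask into regions.

    Returns a list of regions, each a list of (row, col) coordinates.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Start a new region from this seed pixel and grow it outward.
            seen[r][c] = True
            queue = deque([(r, c)])
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions
```

Each returned region corresponds to one foreground image in the sense of S5008, and iterating over the list corresponds to the sequential output of S5009.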

Next, the three-dimensional model information generation unit 6132 generates three-dimensional model information using the foreground images. When the camera adapter receives a foreground image from an adjacent camera, that foreground image is input to the other-camera foreground reception unit 5006 via the transmission unit 6120. FIG. 23(e) shows the flow of the process executed by the three-dimensional model processing unit 5005 when a foreground image is input. If the image computing server 200 collects the captured image data of the cameras 112, starts image processing, and generates a virtual viewpoint image, the amount of calculation may be large and image generation may take a long time. The amount of calculation for three-dimensional model generation in particular may become very large. FIG. 23(e) therefore describes a method of generating three-dimensional model information successively while data is transmitted between the camera adapters 120 connected in a daisy chain, in order to reduce the processing load on the image computing server 200.

First, in S5013, the three-dimensional model information generation unit 6132 receives a foreground image captured by another camera 112. Next, in S5014, the three-dimensional model information generation unit 6132 checks whether the camera 112 that captured the received foreground image belongs to the same gaze point group as its own camera 112 and is an adjacent camera. If the result of S5014 is YES, the process proceeds to S5015. If NO, the unit determines that there is no correlation with the foreground image of the other camera 112 and ends without further processing. Although S5014 checks whether the camera is an adjacent camera, the method of determining the correlation between cameras 112 is not limited to this. For example, the same effect is obtained if the three-dimensional model information generation unit 6132 obtains and sets the camera numbers of the correlated cameras 112 in advance, and takes in and processes image data only when it is transmitted from one of those cameras 112.

Next, in S5015, the three-dimensional model information generation unit 6132 derives depth information for the foreground image. Specifically, it first associates the foreground image received from the foreground separation unit 5001 with the foreground image of the other camera 112, and then derives depth information for each pixel of each foreground image based on the coordinate values of the associated pixels and the camera parameters. A block matching method, for example, is used to associate the images. Block matching is a well-known method, so a detailed description is omitted. Various other association methods exist that improve performance by combining feature point detection, feature quantity calculation, matching processing, and the like, and any of them may be used.
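As an illustration of the block matching mentioned for S5015, the following sketch searches along one image row for the horizontal offset that minimizes the sum of absolute differences (SAD). The SAD cost, the horizontal-only search, and all function names are assumptions made for this example; the specification does not fix these details.

```python
def extract_block(image, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def match_block(image_a, image_b, top, left, size, max_offset):
    """Find the horizontal offset in image_b that best matches a block of image_a."""
    template = extract_block(image_a, top, left, size)
    width = len(image_b[0])
    best_offset, best_cost = None, None
    for offset in range(-max_offset, max_offset + 1):
        candidate_left = left + offset
        if candidate_left < 0 or candidate_left + size > width:
            continue  # Skip candidates that fall outside the image.
        cost = sad(template, extract_block(image_b, top, candidate_left, size))
        if best_cost is None or cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset, best_cost


# Hypothetical pair of images: the pattern appears 2 pixels further right in image_b.
image_a = [[0, 0, 9, 7, 0, 0, 0], [0, 0, 5, 3, 0, 0, 0]]
image_b = [[0, 0, 0, 0, 9, 7, 0], [0, 0, 0, 0, 5, 3, 0]]
offset, cost = match_block(image_a, image_b, top=0, left=2, size=2, max_offset=3)
```

The offset found for each block is the correspondence from which depth is then derived together with the camera parameters.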

Next, in S5016, the three-dimensional model information generation unit 6132 derives three-dimensional model information for the foreground image. Specifically, for each pixel of the foreground image, it derives the world coordinate value of the pixel based on the depth information derived in S5015 and the camera parameters stored in the camera parameter reception unit 5007. The world coordinate value and the pixel value are then combined as a set to form one point of a three-dimensional model configured as a point cloud. This processing yields the point cloud information for the part of the three-dimensional model obtained from the foreground image received from the foreground separation unit 5001, and the point cloud information for the part of the three-dimensional model obtained from the foreground image of the other camera 112. Then, in S5017, the three-dimensional model information generation unit 6132 adds a camera number and a frame number to the obtained three-dimensional model information as meta information (the meta information may instead be, for example, a time code or an absolute time) and outputs it to the transmission unit 6120.
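The derivation of a world coordinate value from depth and camera parameters in S5016 can be sketched with a standard pinhole camera model. The specification does not give the camera model, so the intrinsic parameters (`fx`, `fy`, `cx`, `cy`), the convention x_cam = R * x_world + t, and the dictionary layout of the model information are all assumptions for illustration.

```python
def pixel_to_world(u, v, depth, fx, fy, cx, cy, rotation, translation):
    """Back-project pixel (u, v) with the given depth into world coordinates.

    Assumes a pinhole camera with x_cam = rotation * x_world + translation,
    and depth measured along the camera's optical (Z) axis.
    """
    cam = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    diff = [cam[k] - translation[k] for k in range(3)]
    # Invert the rigid transform: x_world = rotation^T * (x_cam - translation).
    return tuple(
        sum(rotation[k][i] * diff[k] for k in range(3)) for i in range(3)
    )


# Hypothetical camera: identity rotation, shifted 2 units along Z.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
world = pixel_to_world(
    1160, 540, 5.0,
    fx=1000.0, fy=1000.0, cx=960.0, cy=540.0,
    rotation=identity, translation=(0.0, 0.0, 2.0),
)

# One point pairs the world coordinate value with the pixel value (S5016);
# camera and frame numbers are attached as meta information (S5017).
model_info = {
    "camera_number": 3,
    "frame_number": 120,
    "points": [{"xyz": world, "value": (200, 180, 160)}],
}
```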

これによって、カメラアダプタ１２０間がデイジーチェーンで接続され、複数の注視点が設定される場合でも三次元モデル情報を逐次生成することができる。すなわち、デイジーチェーンによってデータを伝送しながら、カメラ１１２間の相関に応じて画像処理を行い、三次元モデル情報を逐次生成することができる。その結果、処理が高速化される効果がある。 Thus, even when the camera adapters 120 are connected by a daisy chain and a plurality of fixation points are set, three-dimensional model information can be sequentially generated. That is, while data is transmitted by the daisy chain, image processing can be performed according to the correlation between the cameras 112 to sequentially generate three-dimensional model information. As a result, the processing can be speeded up.

In the present embodiment, the processes described above are executed by hardware such as an FPGA or ASIC mounted on the camera adapter 120, but they may instead be executed by software using, for example, a CPU, GPU, or DSP. Also, although the three-dimensional model information is generated in the camera adapter 120 in the present embodiment, it may instead be generated by the image computing server 200, where all the foreground images from the cameras 112 are collected.

図２４（ａ）は、ズレ検出報知部における処理のフローチャートである。図２４（ｂ）は、ステップＳ２０２の前景背景分離処理の詳細フローチャートである。ズレ検出報知部６１３４は、前景背景分離部６１３１で分離された背景画像から画像のズレを検出する機能を有する。 FIG. 24A is a flowchart of processing in the deviation detection notification unit. FIG. 24B is a detailed flowchart of foreground / background separation processing in step S202. The shift detection notification unit 6134 has a function of detecting a shift of an image from the background image separated by the foreground / background separation unit 6131.

First, FIG. 24(a) will be described. In step S201, an image is captured after the calibration performed at installation is complete. Next, in step S202, the foreground separation unit 5001 performs foreground/background separation processing on the image obtained in S201, separating the foreground and the background based on temporal correlation and spatial correlation. The details of this foreground/background separation processing are described later. The resulting background image 5002 is then stored in the reference background image storage unit 5008 as the reference image. If, for example, the image in S201 is captured before the stadium opens (when there are no players on the field and no spectators in the stands), the image obtained in S201 may be used as the reference image as it is.

Next, in step S203, the current image is captured at system startup or at periodic timing. Next, in step S204, the foreground separation unit 5001 performs the same foreground/background separation processing based on temporal correlation and spatial correlation on this image, and the resulting background image 5002 is stored in the comparison current background image storage unit 5009 as the current image for comparison. Next, in step S205, the shift detection unit 5010 detects a shift by comparing the reference image stored in the reference background image storage unit 5008 with the comparison current image stored in the comparison current background image storage unit 5009. The comparison may use vectors between corresponding feature points or any other well-known technique. If the shift is larger than a predetermined reference value, it is determined that the installation state of the camera has changed, an alarm is issued, and alarm information is transmitted to the transmission unit 6120.
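The comparison in S205 can be illustrated with displacement vectors between corresponding feature points. The sketch below assumes the feature points have already been matched between the reference image and the current image, and it uses the magnitude of the mean displacement vector as the shift measure. Both choices, and the name `detect_shift`, are assumptions; the specification only says that vectors between corresponding feature points may be used.

```python
import math


def detect_shift(reference_points, current_points, reference_value):
    """Return (shift magnitude, alarm flag) from matched feature point pairs."""
    if not reference_points or len(reference_points) != len(current_points):
        raise ValueError("matched, non-empty point lists are required")
    # Displacement vector of each feature point from reference to current image.
    vectors = [
        (cur_x - ref_x, cur_y - ref_y)
        for (ref_x, ref_y), (cur_x, cur_y) in zip(reference_points, current_points)
    ]
    mean_dx = sum(dx for dx, _ in vectors) / len(vectors)
    mean_dy = sum(dy for _, dy in vectors) / len(vectors)
    magnitude = math.hypot(mean_dx, mean_dy)
    # The installation state is judged to have changed when the shift
    # exceeds the predetermined reference value.
    return magnitude, magnitude > reference_value


# Hypothetical matched points: every point moved by (3, 4) pixels.
reference_points = [(100, 100), (400, 120), (250, 300)]
current_points = [(103, 104), (403, 124), (253, 304)]
magnitude, alarm = detect_shift(reference_points, current_points, reference_value=2.0)
```

A uniform displacement of all background feature points, as in this example, is what a small change in camera angle would be expected to produce for a distant background.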

Next, FIG. 24(b) will be described. The process of FIG. 24(b) is performed for each pixel of the input image. In step S20201, a temporal correlation difference and a spatial correlation difference from a background image are calculated for the target pixel of the input image. Specifically, for the temporal correlation difference, an image captured in the past by the same camera as the input image is used as the background image, and the difference between the value of the target pixel of the input image and the value of the corresponding pixel of the background image is calculated. For the spatial correlation difference, an image captured at substantially the same timing by a camera different from that of the input image is used as the background image, and the difference between the value of the target pixel of the input image and the value of the corresponding pixel of the background image is calculated. Next, in step S20202, the temporal correlation difference is evaluated: if the difference is smaller than a predetermined value L, the process proceeds to the next step; otherwise the process ends. Next, in step S20203, the spatial correlation difference is evaluated: if the difference is smaller than a predetermined value D, the process proceeds to the next step; otherwise the process ends. Next, in step S20204, the target pixel of the input image, for which both the temporal correlation difference and the spatial correlation difference have been determined to be within the reference range, is stored as a pixel of the separated background. Executing the process of FIG. 24(b) for all pixels of the input image separates the background from the input image based on spatial correlation and temporal correlation.
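Steps S20201 to S20204 can be sketched per pixel as follows. The example assumes single-channel images of equal size in which "corresponding pixels" share the same coordinates, which glosses over how pixels of a different camera are brought into correspondence. The function name and the use of `None` for non-background pixels are hypothetical.

```python
def separate_background(frame, past_frame, other_camera_frame, threshold_l, threshold_d):
    """Keep pixels whose temporal and spatial differences are both small.

    Returns an image in which background pixels keep their value and all
    other pixels are None.
    """
    background = []
    for row, past_row, other_row in zip(frame, past_frame, other_camera_frame):
        out_row = []
        for value, past, other in zip(row, past_row, other_row):
            temporal_diff = abs(value - past)   # S20201: same camera, earlier time
            spatial_diff = abs(value - other)   # S20201: other camera, same time
            # S20202 and S20203: both differences must be below L and D.
            if temporal_diff < threshold_l and spatial_diff < threshold_d:
                out_row.append(value)           # S20204: store as background pixel
            else:
                out_row.append(None)
        background.append(out_row)
    return background


# Hypothetical data: pixel (0, 2) changed over time (a moving person) and
# pixel (1, 1) differs between cameras (a static object close to the cameras).
frame = [[50, 52, 200], [51, 120, 53]]
past_frame = [[50, 50, 50], [50, 120, 50]]
other_camera_frame = [[51, 50, 198], [52, 60, 52]]
separated = separate_background(frame, past_frame, other_camera_frame, 10, 10)
```

Because the two tests are independent conditions joined by a logical AND, swapping their order gives the same result.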

The same applies to the foreground/background separation processing in step S204 of FIG. 24(a). At least one of the foreground/background separations in S202 and S204 may be based on temporal correlation alone or on spatial correlation alone. Difference detection by temporal correlation here may be, for example, a method of generating background data from image information over a fixed past period and detecting the difference from the current image. The detection method is not limited to this, and methods using feature quantities or machine learning may also be used. Foreground/background separation by spatial correlation, on the other hand, may be a method of extracting common objects and regions based on spatial correlation. One specific method holds the background in advance as a given 3D model with 3D information and compares it with the current image to extract only the matching background regions. A similar method uses means for measuring the distance to each object at the same time, and separates the nearby foreground from the distant background. Foreground/background separation by spatial correlation is not limited to these methods. This processing makes it possible to separate only the distant background from images captured at the same time.

FIG. 25 illustrates foreground/background separation processing using temporal correlation and spatial correlation. FIG. 25(a) shows the camera 110 capturing the field of the stadium and the direction of the spectator seats. K indicates the approximate imaging range of the camera 110. FIGS. 25(b1) to 25(b4) illustrate how the image changes when foreground/background separation processing is applied to the image captured in this imaging range. FIG. 25(b1) shows the image to be subjected to foreground/background separation. The persons in this image change position over time, whereas the other structures do not change over time. Therefore, if the difference between the image to be separated shown in FIG. 25(b1) and an image captured by the same camera at a different time is calculated and foreground/background separation by temporal correlation is performed, the moving persons are separated as foreground and the structures are separated as background, as shown in FIG. 25(b2).

Among the structures, however, the goal in the foreground is relatively close to the camera, while the floor, beams, and columns of the spectator seats are far from the camera. When images captured by a plurality of cameras installed next to each other are compared, the positional difference of an object close to the cameras is large, whereas that of an object far from the cameras is relatively small. Therefore, by calculating the difference between a plurality of images captured by a plurality of cameras at substantially the same timing, the distant spectator seats can be separated from the goal and persons in front. As described above, this foreground/background separation by spatial correlation may use a background held as a 3D model with 3D information, or distance information, among other methods. If foreground/background separation by spatial correlation is performed on the image of FIG. 25(b1), the goal and players in front are separated as foreground and the spectator seats are separated as background, as shown in FIG. 25(b3).

Furthermore, if foreground/background separation by spatial correlation is applied to the image of FIG. 25(b2), which was separated from FIG. 25(b1) by temporal correlation, the goal is separated and a background consisting only of structures such as the floor, beams, and columns of the spectator seats is obtained, as shown in FIG. 25(b4). Conversely, if foreground/background separation by temporal correlation is applied to the image of FIG. 25(b3), which was separated from FIG. 25(b1) by spatial correlation, the result is also as shown in FIG. 25(b4). That is, the moving persons in the spectator seats are separated, and a background consisting only of structures such as the floor, beams, and columns of the spectator seats is obtained.

つまり、元画像（図２５（ｂ１））に対し、時間的相関による前景背景分離処理と空間的相関による前景背景分離処理を行うことにより、その分離処理の順序に関わらず、目的とする背景画像（図２５（ｂ４））を得ることができる。つまり、図２４（ｂ）のフローチャートでは、時間的相関による前景背景分離処理の後に空間的相関による前景背景分離処理を行うフローチャートを示したが、この処理の順序は逆にしてもよい。更には、このズレ検出報知部においては、時間的相関による前景背景分離処理と空間的相関による前景背景分離処理を同時に行ってもよい。 That is, by performing foreground / background separation processing based on temporal correlation and foreground / background separation processing based on spatial correlation on the original image (FIG. 25 (b1)), the target background image is obtained regardless of the order of the separation processing. (FIG. 25 (b4)) can be obtained. That is, although the flowchart of FIG. 24B shows the foreground / background separation processing by spatial correlation after the foreground / background separation processing by temporal correlation, the order of the processing may be reversed. Furthermore, in the deviation detection and notification unit, foreground / background separation processing by temporal correlation and foreground / background separation processing by spatial correlation may be performed simultaneously.

As a result, the foreground/background separation by temporal correlation and by spatial correlation in the shift detection notification unit 5009 yields a background consisting only of structures, for example the floor, beams, and columns of the spectator seats, as shown in FIG. 25(b4). This is a stable background image with little temporal or spatial change, well suited to detecting a shift in the camera installation state as shown in FIG. 24(a).

Therefore, when the installation and adjustment of the plurality of cameras in the stadium is finished, a suitable reference image can be obtained by the image processing described above even if players or venue staff are on the field. That is, it is no longer necessary to wait until the field is empty to obtain a reference image, as was done conventionally, and installation work becomes more efficient. In addition, by comparing the background of the current image obtained by the image processing described above with the background of the reference image, a camera shift can be detected stably even if moving objects are present in the imaging range when the current image is captured.

The background image obtained by the image processing described above is a distant background far from the camera. Consequently, the effect of a change in camera angle can be detected accurately as an image shift, under conditions where displacement of the camera in the translational direction relative to its optical axis (parallax) has little effect. The farther away the subject is, the less its size as an imaged object matters and the more it can be treated as a point with position only, so the effect of an angular shift of the camera can be detected more accurately.

As described above, according to the first embodiment, a camera shift is detected by using the background image, obtained by applying foreground/background separation processing to a captured image, as the current image for comparison with the reference image. Because the background image is not affected by the highly variable foreground, a change in the camera installation state (position, orientation, and so on) can be detected more accurately.

(Second Embodiment)
The second embodiment describes a form in which the current image is captured at a different timing. Specifically, whereas the first embodiment captures the current image periodically, the second embodiment captures the current image when vibration at or above a reference value occurs at the camera. That is, the current image is captured at a timing when the camera installation state is likely to have changed. Everything other than the capture timing is the same as in the first embodiment, so its description is omitted.

FIG. 26 is a flowchart of the processing in the shift detection notification unit in the second embodiment. First, in step S3801, an image is captured after the calibration performed at initial camera setup is complete. Next, in step S3802, the foreground separation unit 5001 performs foreground/background separation processing on this image, separating the foreground and the background based on temporal correlation and spatial correlation. The details of this foreground/background separation processing are the same as in the first embodiment. The resulting background image 5002 is then stored in the reference background image storage unit 5008 as the reference image.

次に、ステップＳ３８０３で、カメラに加えられた振動レベルが基準値以上で有るかどうかを検知する。この振動を検知するセンサは、前述の各センサシステム１１０を構成する外部センサ１１４であり、前述の通り、例えばジャイロセンサにより振動に関する情報を得ることができる。もちろん、外部センサ１１４は、振動を検出するもので有れば、加速度センサ等他のセンサにより構成されてもよい。この基準値は、予め、画像に影響を与えるズレの閾値を導出しておき、この値を採用するとよい。ここで、基準値以上の振動が検知された場合には、ステップＳ３８０４で、現画像を撮影する。ステップＳ３８０５で、この画像に対して前景分離部５００１において上述と同様の時間的相関、及び、空間的相関から前景と背景を分離する前景背景分離処理を行い、背景画像５００２を現画像として比較現背景画像記憶部５００９で保存する。 Next, in step S3803, it is detected whether the vibration level applied to the camera is equal to or higher than a reference value. The sensor that detects this vibration is the external sensor 114 that configures each of the above-described sensor systems 110, and as described above, for example, information regarding the vibration can be obtained by a gyro sensor. Of course, the external sensor 114 may be configured of another sensor such as an acceleration sensor as long as it detects vibration. As this reference value, it is preferable to derive in advance a threshold for deviation that affects the image, and to adopt this value. Here, when a vibration equal to or greater than the reference value is detected, the current image is captured in step S3804. In step S3805, this image is subjected to foreground / background separation processing for separating the foreground and the background from the temporal correlation and the spatial correlation as described above in the foreground separation unit 5001, and the background image 5002 is compared as the current image. It is stored in the background image storage unit 5009.

次に、ステップＳ３８０６で、ズレ検出部５０１０で基準背景画像記憶部５００８の基準画像と比較現背景画像記憶部５００９の比較現画像とを比較してズレを検出する。比較する手法は、対応する特徴点間のベクトルを用いる方法、その他、周知の技法を用いる。ズレが所定の基準値より大きい場合には、カメラの設置状態が変化したと判断し、警報を報知するため警報情報を伝送部６１２０に伝送する。 Next, in step S3806, the shift detection unit 5010 compares the reference image of the reference background image storage unit 5008 with the comparison current image of the comparison current background image storage unit 5009 to detect a shift. The comparison method uses a method using vectors between corresponding feature points, and other known techniques. If the deviation is larger than a predetermined reference value, it is determined that the installation state of the camera has changed, and alarm information is transmitted to the transmission unit 6120 to notify an alarm.

As described above, according to the second embodiment, the current image is captured when vibration at or above the reference value occurs at the camera. With this configuration, whether the camera installation state has changed can be determined at a more appropriate timing. In addition, a change in the camera installation state can be reported externally by an alarm.
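The control flow of the second embodiment (S3803 to S3806) can be sketched as a loop over vibration readings. In this illustration, `measure_shift` is a hypothetical callback standing in for capturing the current image, separating its background, and comparing it with the reference image. The sample readings and thresholds are made up.

```python
def run_vibration_triggered_checks(vibration_levels, vibration_reference,
                                   measure_shift, shift_reference):
    """Return a list of (sample index, shift) for which an alarm is raised."""
    alarms = []
    for index, level in enumerate(vibration_levels):
        if level < vibration_reference:
            continue  # S3803: vibration below the reference value, keep waiting.
        # S3804 to S3806: capture, separate the background, compare with reference.
        shift = measure_shift(index)
        if shift > shift_reference:
            alarms.append((index, shift))  # Installation state judged to have changed.
    return alarms


# Hypothetical sensor readings and precomputed shifts for the triggered samples.
vibration_levels = [0.1, 0.9, 0.2, 1.5]
measured_shifts = {1: 0.5, 3: 4.0}
alarms = run_vibration_triggered_checks(
    vibration_levels, vibration_reference=0.8,
    measure_shift=measured_shifts.__getitem__, shift_reference=2.0,
)
```

Samples 1 and 3 trigger a check, but only sample 3 produces a shift large enough to raise an alarm.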

(Third Embodiment)
The third embodiment describes a form in which the reference image is updated. Specifically, whereas the first embodiment continues to use a reference image based on the image captured during installation, the third embodiment updates the reference image when a luminance difference (change in brightness) at or above a reference value is detected in the image comparison. Everything other than the update of the reference image is the same as in the first embodiment, so its description is omitted.

FIG. 27 is a flowchart of the processing in the shift detection notification unit in the third embodiment. First, in step S3901, an image is captured after the calibration performed at initial camera setup is complete. Next, in step S3902, the foreground separation unit 5001 performs foreground/background separation processing on this image, separating the foreground and the background based on temporal correlation and spatial correlation. The details of this foreground/background separation processing are the same as in the first embodiment. The resulting background image 5002 is then stored in the background image storage unit 5008 as the reference image. Next, in step S3903, the current image is captured. This may be at startup or at periodic timing as in the first embodiment, or at any timing at which an operator notices a change in the environment and performs an operation.

Next, in step S3904, the reference image and the current image are compared to detect whether the change (difference) in luminance is at or above a reference value. Although this flowchart shows a configuration that detects the luminance difference from the images by comparing the images captured in step S3903 and step S3904, any other means of detecting a change in luminance may be used. For example, a luminance sensor may be used as the external sensor 114 of each sensor system 110 to obtain information on luminance changes. The reference value for the luminance change is set to a limit value, derived in advance, at which a luminance change begins to affect the image. If a luminance change at or above the reference value is detected, then in step S3905 foreground/background separation processing is performed on the current image, and the resulting background image replaces the past reference image stored in the reference background image storage unit 5008 as the new reference image. The process then returns to step S3903. If no luminance change at or above the reference value is detected, processing is the same as in the embodiments described above, so its description is omitted.
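The update rule of S3904 and S3905 can be sketched as follows. The example assumes that the luminance change is measured as the difference of mean pixel values, which the specification does not specify. `separate` is a hypothetical callback standing in for the foreground/background separation processing.

```python
def mean_luminance(image):
    """Mean pixel value of a single-channel image given as nested lists."""
    pixels = [value for row in image for value in row]
    return sum(pixels) / len(pixels)


def update_reference_if_needed(reference, current, luminance_reference, separate):
    """Return (reference image to use from now on, whether it was updated)."""
    change = abs(mean_luminance(current) - mean_luminance(reference))
    if change >= luminance_reference:
        # S3905: the background of the current image becomes the new reference.
        return separate(current), True
    return reference, False


# Hypothetical images: the scene became 40 levels brighter (for example, at noon).
reference = [[100, 100], [100, 100]]
brighter = [[140, 140], [140, 140]]
slightly_brighter = [[105, 105], [105, 105]]


def identity_separation(image):
    """Placeholder: assumes the whole image is already background."""
    return image


new_reference, updated = update_reference_if_needed(
    reference, brighter, luminance_reference=30, separate=identity_separation)
kept_reference, kept_updated = update_reference_if_needed(
    reference, slightly_brighter, luminance_reference=30, separate=identity_separation)
```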

以上説明したとおり第３実施形態によれば、基準画像と現画像との比較において基準値以上の輝度変化を検知した場合に基準画像を更新する。この構成により、画像に影響のある輝度変化（屋外における時間帯の違いによる明るさの変化など）がある場合であっても、適切な画像比較が可能となり、カメラ設置状態の変化を好適に検出可能となる。 As described above, according to the third embodiment, the reference image is updated when a change in luminance above the reference value is detected in the comparison between the reference image and the current image. With this configuration, even when there is a luminance change (such as a change in brightness due to a difference in time zone outdoors) that affects the image, an appropriate image comparison becomes possible, and a change in the camera installation state is suitably detected. It becomes possible.

(Fourth Embodiment)
The fourth embodiment describes a configuration in which the gaze point of the camera is not fixed to a single point. Everything other than the change of gaze point is the same as in the first embodiment, and its description is omitted.

FIG. 28 is a flowchart of the processing in the shift detection notification unit in the fourth embodiment. First, in step S4001, information on a plurality of gaze points assumed in advance is input. Specifically, this information consists of the pan and tilt angles of the camera platform, the shooting conditions of the camera (zoom value, aperture, shutter, gain, and so on), or information on the target gaze-point object. Next, in step S4002, calibration is performed for each of the gaze points set above, and an image is then captured at each gaze point as the image at the time of initial setting.
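
The gaze-point information input in step S4001 could be held in a structure like the one below. This is a hypothetical sketch: the field names, units, and the `build_gaze_table` helper are assumptions made for illustration and do not appear in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GazePointSetting:
    """Settings for one predefined gaze point (all field names hypothetical)."""
    gaze_id: str
    pan_deg: float            # pan angle of the camera platform
    tilt_deg: float           # tilt angle of the camera platform
    zoom: float               # zoom value
    aperture_f: float         # aperture as an f-number
    shutter_s: float          # shutter time in seconds
    gain_db: float            # gain in dB
    target_object: Optional[str] = None  # optional target gaze-point object


def build_gaze_table(settings):
    """Index the settings by gaze point ID, rejecting duplicate IDs."""
    table = {}
    for setting in settings:
        if setting.gaze_id in table:
            raise ValueError(f"duplicate gaze point: {setting.gaze_id}")
        table[setting.gaze_id] = setting
    return table
```

Keying the table by a gaze point ID makes it straightforward to calibrate and capture one initial image per entry, as step S4002 requires.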

In step S4003, the foreground separation unit 5001 performs foreground/background separation processing on the images captured at camera initial setting, separating the foreground from the background based on temporal correlation and spatial correlation. The details of this separation processing are the same as in the first embodiment. The resulting background images 5002, one per gaze point, are then stored in the reference background image storage unit 5008 as the reference images for the respective gaze points.

Next, in step S4004, the current image is captured at one of the predetermined gaze points. Then, in step S4005, the foreground separation unit 5001 performs the same foreground/background separation processing based on temporal correlation and spatial correlation on this image, and the resulting background image 5002 is stored in the comparison current background image storage unit 5009 as the current image at that gaze point.

In step S4006, the reference image corresponding to the same gaze point as the current capture is selected. In step S4007, the shift detection unit 5010 detects a shift by comparing the selected reference image for that gaze point, held in the reference background image storage unit 5008, with the comparison current image held in the comparison current background image storage unit 5009. The subsequent processing is the same as in the embodiments described above, and its description is omitted.
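
Steps S4003 to S4007 amount to keeping one reference background per gaze point and comparing the current background only against the matching reference. The following is a minimal sketch under stated assumptions: the `ReferenceStore` class, the `detect_shift` function, and the gaze point IDs are hypothetical, and the mean absolute pixel difference is merely a stand-in for whatever comparison the shift detection unit 5010 actually performs.

```python
def mean_abs_diff(image_a, image_b):
    """Mean absolute pixel difference of two equally sized grayscale images."""
    total = 0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


class ReferenceStore:
    """Holds one reference background image per gaze point (S4003 analogue)."""

    def __init__(self):
        self._by_gaze = {}

    def save(self, gaze_id, background):
        self._by_gaze[gaze_id] = background

    def select(self, gaze_id):
        # S4006 analogue: pick the reference for the current gaze point.
        return self._by_gaze[gaze_id]


def detect_shift(store, gaze_id, current_background, threshold):
    """S4007 analogue: compare the current background with its reference.

    Assumption: a shift is reported when the mean absolute difference
    exceeds `threshold`; the source does not fix the comparison metric.
    """
    reference = store.select(gaze_id)
    return mean_abs_diff(reference, current_background) > threshold
```

Selecting the reference by gaze point before comparing prevents an intentional change of gaze point from being mistaken for a change in the installation state.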

As described above, according to the fourth embodiment, the same detection of changes in the camera installation state as in the embodiments described above can also be applied to a system that has a plurality of gaze points. In addition, a virtual viewpoint image can be generated easily regardless of the scale of the devices constituting the system, such as the number of cameras 112, and regardless of the output resolution, output frame rate, and so on of the captured images.

(Other Embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

110a sensor system; 111a microphone; 112a camera; 113a camera platform; 120a camera adapter; 180 switching hub; 190 end user terminal; 230 front end server; 250 database; 270 back end server; 290 time server; 310 control station; 330 virtual camera operation UI

Claims

An image processing apparatus comprising:
receiving means for receiving a captured image captured by an imaging device;
separating means for separating a background image from the captured image received by the receiving means;
first storage means for storing a reference image based on a first captured image captured by the imaging device at a first time;
second storage means for storing a current image based on a second captured image captured by the imaging device at a second time different from the first time; and
detection means for detecting a change in the state of the imaging device by comparing the reference image with the current image,
wherein the current image is a background image separated from the second captured image by the separating means.

The image processing apparatus according to claim 1, wherein the reference image is a background image separated from the first captured image by the separating means.

The image processing apparatus according to claim 1, wherein the detection means includes output means for outputting an alarm when the change in the state of the imaging device is larger than a predetermined reference value.

The image processing apparatus according to any one of claims 1 to 3, wherein the separating means separates the background image from a separation-target captured image based on a temporal correlation, obtained by comparing the separation-target captured image with a first reference background image generated from one or more images captured by the imaging device before the separation-target captured image, and a spatial correlation, obtained by comparing the separation-target captured image with a second reference background image generated from one or more images captured by an imaging device other than the imaging device.

The image processing apparatus according to any one of claims 1 to 4, further comprising vibration detection means for detecting vibration of the imaging device,
wherein the second time is determined based on detection, by the vibration detection means, of vibration equal to or greater than a predetermined reference value.

The image processing apparatus according to any one of claims 1 to 5, further comprising:
luminance difference detection means for detecting a luminance difference between the reference image and the current image; and
updating means for updating the reference image based on the current image when a luminance difference equal to or greater than a predetermined threshold is detected by the luminance difference detection means.

The image processing apparatus according to any one of claims 1 to 6, wherein
the imaging device is configured so that its gaze point can be switched,
the first storage means stores a reference image for each gaze point, and
when the gaze point of the imaging device is changed, the detection means reads the reference image corresponding to the changed gaze point from the first storage means and compares it with the current image.

The image processing apparatus according to any one of claims 1 to 7, wherein the imaging device is one of a plurality of imaging devices that capture a plurality of viewpoint images used to generate virtual viewpoint content.

A control method of an image processing apparatus for detecting a change in an installation state of an imaging device, the control method comprising:
a receiving step of receiving a captured image captured by the imaging device;
a first storage step of storing, in a first storage unit, a reference image based on a first captured image captured by the imaging device at a first time;
a second storage step of separating a background image from a second captured image captured by the imaging device at a second time different from the first time and storing the background image in a second storage unit as a current image; and
a detection step of detecting a change in the state of the imaging device by comparing the reference image with the current image.

An image processing system for generating a virtual viewpoint image, comprising:
a plurality of imaging devices that capture images from a plurality of different directions;
a plurality of image processing devices, each corresponding to one of the plurality of imaging devices, that process a captured image captured by the corresponding imaging device and generate an image of a predetermined format; and
an image generation device that generates a virtual viewpoint image based on a plurality of images of the predetermined format generated by the plurality of image processing devices,
wherein at least one of the plurality of image processing devices includes:
receiving means for receiving a captured image captured by the corresponding imaging device;
separating means for separating a background image from the captured image received by the receiving means;
first storage means for storing a reference image based on a first captured image captured by the imaging device at a first time;
second storage means for storing a current image based on a second captured image captured by the imaging device at a second time different from the first time; and
detection means for detecting a change in the state of the imaging device by comparing the reference image with the current image,
and wherein the current image is a background image separated from the second captured image by the separating means.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 8.