JP7042571B2

JP7042571B2 - Image processing device and its control method, program

Info

Publication number: JP7042571B2
Application number: JP2017155883A
Authority: JP
Inventors: 肇佐藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-08-10
Filing date: 2017-08-10
Publication date: 2022-03-28
Anticipated expiration: 2037-08-10
Also published as: JP2019036791A

Description

The present invention relates to an image processing system including a plurality of cameras for photographing a subject from a plurality of directions.

Recently, a technique has been attracting attention in which a plurality of cameras are installed at different positions to perform synchronized shooting from multiple viewpoints, and virtual viewpoint content is generated using the multi-viewpoint images obtained by the shooting. With such a technique, for example, highlight scenes of soccer or basketball can be viewed from various angles, which can give the user a greater sense of presence than ordinary images.

In recent years, the amount of data output by cameras has been increasing as camera performance such as resolution and frame rate improves. Images captured at high resolution and a high frame rate require a large data transmission capacity, so the transmission speed is affected by the available bandwidth.

Conventional techniques include a technique for compressing and encoding captured images and a technique for controlling the frame rate based on the data volume. For example, Patent Document 1 discloses a configuration in which, in order to transmit a plurality of image signals from a plurality of camera systems over a limited transmission band, an important area determination unit weights the frame rate of the image signal of each camera and thereby controls the frame rate as a whole.

Japanese Unexamined Patent Publication No. 2007-221367

There is a demand to use virtual viewpoint content to generate slow-motion replay images, for example for judging sports competitions. Meeting this demand requires high-volume data at a high frame rate, but as the number of cameras increases, it becomes difficult to transmit the images of all of the cameras at a high frame rate within a practical time.
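The scale of the problem can be illustrated with rough arithmetic. The following is a minimal sketch, not part of the patent: the 26 cameras and the 60 and 240 frames-per-second rates appear later in the description as examples, while the 3840x2160 resolution, the 2 bytes per pixel, and the function name `raw_rate_gbps` are assumptions chosen only for illustration.

```python
def raw_rate_gbps(width, height, bytes_per_pixel, fps, cameras):
    """Uncompressed aggregate data rate in gigabits per second."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps * cameras / 1e9


# Assumed values: 4K frames (3840x2160) at 2 bytes per pixel, 26 cameras.
normal = raw_rate_gbps(3840, 2160, 2, fps=60, cameras=26)
high = raw_rate_gbps(3840, 2160, 2, fps=240, cameras=26)

print(f"all cameras at 60 fps:  {normal:.1f} Gbit/s")
print(f"all cameras at 240 fps: {high:.1f} Gbit/s")
print(f"ratio: {high / normal:.1f}x")
```

Under these assumptions, raising every camera from the normal to the high frame rate multiplies the aggregate rate by four, which is why limiting the high frame rate to a subset of cameras is attractive.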

The present invention has been made in view of the above problem, and its object is to make it possible to appropriately set the frame rates of captured images captured by a plurality of imaging devices that photograph a subject from different directions.

The image processing apparatus according to one aspect of the present invention has the following configuration. That is, it comprises:
specifying means for specifying the position of an object of interest photographed by a plurality of imaging devices; and
control means for controlling the plurality of imaging devices such that, when the specifying means specifies that the position of the object of interest is included in a specific area, the frame rate of image data transmitted by one or more imaging devices, among the plurality of imaging devices, that have a predetermined positional relationship with the specific area is higher than the frame rate of image data transmitted by the other imaging devices among the plurality of imaging devices.
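The control rule in this configuration can be sketched in a few lines. This is an illustrative sketch only, not the claimed implementation: the rectangular `Region`, the function `assign_frame_rates`, and the representation of the "predetermined positional relationship" as a precomputed set of camera identifiers are all assumptions, as is the choice to leave every camera at the normal rate when the object is outside the area. The 60 and 240 frames-per-second values are the examples given later in the description.

```python
from dataclasses import dataclass

NORMAL_FPS = 60   # example normal frame rate from the description
HIGH_FPS = 240    # example high frame rate from the description


@dataclass(frozen=True)
class Region:
    """Hypothetical axis-aligned specific area on the field."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, point):
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def assign_frame_rates(object_pos, region, cameras_for_region, all_cameras):
    """Return {camera_id: fps} for the current object position.

    cameras_for_region stands in for the cameras that have the
    predetermined positional relationship with the specific area.
    """
    if not region.contains(object_pos):
        # Assumption: outside the specific area, every camera stays normal.
        return {cam: NORMAL_FPS for cam in all_cameras}
    return {
        cam: HIGH_FPS if cam in cameras_for_region else NORMAL_FPS
        for cam in all_cameras
    }


goal_area = Region(0.0, 20.0, 16.5, 48.0)
rates = assign_frame_rates((5.0, 30.0), goal_area, {"112a", "112b"},
                           ["112a", "112b", "112c"])
print(rates)
```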

According to the present invention, it becomes possible to appropriately set the frame rates of captured images captured by a plurality of imaging devices that photograph a subject from different directions.

A diagram for explaining the configuration of the image processing system 100.
A block diagram for explaining the functional configuration of the camera adapter 120.
A block diagram for explaining the functional configuration of the back-end server 270.
A diagram for explaining the operation of the camera selection unit 3015.
A diagram for explaining the selected cameras.
A diagram for explaining the virtual camera 8001.
A block diagram for explaining the functional configuration of the virtual camera operation UI 330.
A flowchart showing the processing of the camera selection unit 3015.
A flowchart showing the processing of the virtual viewpoint foreground image generation unit 3005.
A block diagram showing the hardware configuration of the camera adapter 120.
A block diagram for explaining the functional configuration of the back-end server 270.
A diagram for explaining the operation of the camera selection unit 3016.
A diagram for explaining the selected cameras.

Before describing the embodiments of the present invention in detail, the problem that arises when generating slow-motion replay images, for example for judging sports competitions, in a system that generates virtual viewpoint images using a plurality of cameras will be explained further.

In generating virtual viewpoint content based on multi-viewpoint images, a virtual viewpoint image is generated. That is, images captured by a plurality of cameras are aggregated in an image processing unit such as a server, and the image processing unit performs processing such as three-dimensional model generation and rendering to generate a virtual viewpoint image. The virtual viewpoint image is then transmitted to a user terminal so that the user can view the virtual viewpoint content.

Here, as described above, generating a replay image requires high-volume data at a high frame rate, but as the number of cameras increases, it becomes difficult to transmit the images of all of the cameras at a high frame rate within a practical time.

The technique disclosed in Patent Document 1 is suitable for a configuration that stitches together the images of a plurality of cameras to generate a wide-area image, but it cannot be applied to the present image processing system, which photographs the same subject with a plurality of cameras and generates a virtual viewpoint image from those images. That is, in the technique of Patent Document 1, in which the frame rate is changed for each camera, the images are stitched together to generate a wide-area image, and the control of each image is independent. Therefore, an image can be generated even if some pixels are defective or missing when the images are stitched together, and differing frame rates pose no problem. In contrast, the present system generates a virtual viewpoint image from images of the same subject captured by a plurality of cameras at the same time. Therefore, if the images of all the cameras used for virtual viewpoint image generation were not captured at the same time with high accuracy, the generated virtual viewpoint image is degraded.

In addition, the present system preferably generates virtual viewpoint images that mix live images, for live broadcasts of sports competitions and the like, with replay images including slow-motion playback. Replay images therefore need to be generated at a higher frame rate without lowering the frame rate of the live images.

To solve the above problems, the embodiments of the present invention provide a technique that, in a system that generates virtual viewpoint images using a plurality of cameras, can transmit at a high frame rate the plurality of captured images needed to generate a high-quality virtual viewpoint image.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the drawings. The configurations shown in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations.
<Embodiment 1>
A system in which a plurality of cameras and microphones are installed in a facility such as a stadium or a concert hall to capture images and collect sound will be described with reference to FIG. 1.

<Description of the image processing system 100>
FIG. 1 is a diagram for explaining the configuration of the image processing system 100. The image processing system 100 includes sensor systems 110a, ..., 110z, an image computing server 200, a controller 300, a switching hub 180, and an end user terminal 190.

The controller 300 includes a control station 310 and a virtual camera operation UI 330. The control station 310 manages the operating state of, and controls the parameter settings of, each block constituting the image processing system 100 through the networks 310a, 310b, 310c, 180a, 180b, and 170a, ..., 170y.

<Description of the sensor system 110>
First, the operation of transmitting the 26 sets of images and audio of the sensor systems 110a, ..., 110z from the sensor system 110z to the image computing server 200 will be described.

In the image processing system 100, the sensor systems 110a, ..., 110z are connected in a daisy chain. In the present embodiment, unless otherwise specified, the 26 sets of systems from the sensor system 110a to the sensor system 110z are referred to as the sensor system 110 without distinction. Similarly, the devices in each sensor system 110 are not distinguished unless otherwise specified, and are referred to as the microphone 111, the camera 112, the pan head 113, the external sensor 114, and the camera adapter 120. Although the number of sensor systems is described as 26 sets, this is merely an example, and the number is not limited to this. In the present embodiment, unless otherwise noted, the term "image" covers both moving images and still images. That is, the image processing system 100 of the present embodiment can process both still images and moving images. The present embodiment is described mainly using an example in which the virtual viewpoint content provided by the image processing system 100 includes a virtual viewpoint image and virtual viewpoint audio, but the content is not limited to this. For example, the virtual viewpoint content need not include audio. Alternatively, for example, the audio included in the virtual viewpoint content may be the audio collected by the microphone closest to the virtual viewpoint. In the present embodiment, descriptions of audio are partly omitted to simplify the explanation, but images and audio are basically processed together.

The sensor systems 110a, ..., 110z each include one camera 112a, ..., 112z. That is, the image processing system 100 has a plurality of cameras for photographing the same subject from a plurality of directions. The optical axis of every camera 112 is fixed, and the cameras do not move horizontally or vertically as in panning or tilting. The plurality of sensor systems 110 are connected to one another in a daisy chain. As the volume of image data grows with higher resolutions such as 4K and 8K and with higher frame rates, this connection form has the effect of reducing the number of connection cables and saving labor in wiring work.

The sensor system 110 includes a microphone 111, a camera 112, a pan head 113, an external sensor 114, and a camera adapter 120, but is not limited to this configuration.

The audio collected by the microphone 111a and the image captured by the camera 112a are subjected to the image processing described later in the camera adapter 120a and are then transmitted through the daisy chain 170a to the camera adapter 120b of the sensor system 110b. Similarly, the sensor system 110b transmits its collected audio and captured image to the sensor system 110c together with the image and audio acquired from the sensor system 110a.

By continuing the above operation, the images and audio acquired by the sensor systems 110a, ..., 110z are passed from the sensor system 110z to the switching hub 180 over the network 180b and are then transmitted to the image computing server 200.
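The relay behavior described above, in which each sensor system forwards its own data together with everything received from upstream, can be sketched as follows. This is a simplified illustration with hypothetical function names (`relay_through_chain`, `link_loads`); it ignores compression, packetization, and timing.

```python
from itertools import accumulate


def relay_through_chain(captures):
    """Simulate the relay: captures is ordered from 110a to 110z.

    Each hop appends its own payload to what it received upstream, so
    the last sensor system ends up holding every payload.
    """
    forwarded = []
    for payload in captures:
        forwarded = forwarded + [payload]
    return forwarded


def link_loads(sizes):
    """Data volume carried on each successive daisy-chain link.

    The link leaving the n-th sensor system carries the data of the
    first n sensor systems, so the load grows toward the switching hub.
    """
    return list(accumulate(sizes))


print(relay_through_chain(["a", "b", "c"]))
print(link_loads([10, 10, 40, 10]))
```

The cumulative load on the links nearest the switching hub is one reason the per-camera frame rate matters for the chain as a whole.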

In the present embodiment, the cameras 112a, ..., 112z and the camera adapters 120a, ..., 120z are separate, but they may be integrated in the same housing. In that case, the microphones 111a, ..., 111z may be built into the integrated camera 112 or may be connected externally to the camera 112.

<Description of the image computing server 200>
Next, the configuration and operation of the image computing server 200 will be described. The image computing server 200 processes the data acquired from the sensor system 110z.

The image computing server 200 includes a front-end server 230, a database 250 (hereinafter sometimes referred to as the DB), a back-end server 270, and a time server 290.

The time server 290 has a function of distributing the time and a synchronization signal, and distributes them to the sensor systems 110a, ..., 110z via the switching hub 180. The camera adapters 120a, ..., 120z that have received the time and the synchronization signal genlock the cameras 112a, ..., 112z based on the time and the synchronization signal to synchronize the image frames. That is, the time server 290 synchronizes the shooting timings of the plurality of cameras 112.

The front-end server 230 reconstructs the segmented transmission packets of the images and audio acquired from the sensor system 110z and converts the data format. It then writes the data to the database 250 according to the camera identifier, the data type indicating image or audio, and the frame number.

The front-end server 230 temporarily stores the data acquired from the camera adapters 120 in DRAM and buffers it until the foreground image, background image, audio data, and three-dimensional model data are all available. Hereinafter, the foreground image, background image, audio data, and three-dimensional model data are collectively referred to as shooting data. Meta information such as routing information, time code information (time information), and a camera identifier is attached to the shooting data, and the front-end server 230 checks the attributes of the data based on this meta information. In this way, the front-end server 230 determines, for example, that pieces of data belong to the same time and confirms that the data is complete. This is necessary because the reception order of network packets is not guaranteed for the data transferred from each camera adapter 120 over the network, so the data must be buffered until everything needed for file generation is available.

The background image may be captured at a frame rate different from that of the foreground image. For example, when the frame rate of the background image is 1 fps, one background image is acquired every second, so for times at which no background image is acquired, the data may be regarded as complete without a background image. If the data is still incomplete after a predetermined time has elapsed, the front-end server 230 notifies the database 250 of information indicating that the data could not be assembled.
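The buffering and timeout behavior of the front-end server can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the class `FrameBuffer`, its methods, and the per-camera completeness criterion are hypothetical, and the caller passes the current time explicitly so that the example is deterministic.

```python
class FrameBuffer:
    """Buffer out-of-order pieces until every expected camera has reported."""

    def __init__(self, expected_cameras, timeout_s=1.0):
        self.expected = frozenset(expected_cameras)
        self.timeout_s = timeout_s
        self.pending = {}     # timecode -> {camera_id: data}
        self.first_seen = {}  # timecode -> arrival time of the first piece

    def add(self, timecode, camera_id, data, now):
        """Store one piece; return the full set once it is complete."""
        self.first_seen.setdefault(timecode, now)
        pieces = self.pending.setdefault(timecode, {})
        pieces[camera_id] = data
        if self.expected.issubset(pieces):
            self.first_seen.pop(timecode)
            return self.pending.pop(timecode)
        return None

    def expired(self, now):
        """Timecodes still incomplete after the timeout (to report to the DB)."""
        return sorted(tc for tc, t0 in self.first_seen.items()
                      if now - t0 > self.timeout_s)


buf = FrameBuffer(["112a", "112b"], timeout_s=1.0)
buf.add(10, "112b", "fg-b-10", now=0.0)   # arrives first, out of order
buf.add(11, "112a", "fg-a-11", now=0.1)
complete = buf.add(10, "112a", "fg-a-10", now=0.2)
print(complete)
print(buf.expired(now=2.0))
```

Keying the buffer by time code rather than by arrival order is what lets the sketch tolerate the unordered packet delivery described above.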

The database 250 uses a state management table to manage the reception status, obtained from the front-end server 230, of each frame and of the image data from each camera adapter 120. For example, this can be handled by setting, for each time and each camera, a flag of 0 if the image data has not arrived and 1 if it has arrived. Alternatively, for each predetermined period (for example, one second), the status can be managed by setting 1 if all the data has arrived and, if not, setting the same kind of flag for each time and each camera within that period.
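The two flag schemes described above can be sketched with plain dictionaries. The function names `build_status_table` and `summarize_period` and the nested-dictionary layout are assumptions made for illustration; the description does not specify how the state management table is stored.

```python
def build_status_table(timecodes, camera_ids, received):
    """Per-time, per-camera flags: 1 if image data arrived, otherwise 0.

    received is a set of (timecode, camera_id) pairs.
    """
    return {
        tc: {cam: int((tc, cam) in received) for cam in camera_ids}
        for tc in timecodes
    }


def summarize_period(table):
    """Return 1 if everything in the period arrived, else the detailed flags."""
    if all(flag for row in table.values() for flag in row.values()):
        return 1
    return table


received = {(0, "112a"), (0, "112b"), (1, "112a")}
table = build_status_table([0, 1], ["112a", "112b"], received)
print(table)
print(summarize_period(table))
```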

The back-end server 270 accepts the designation of a virtual viewpoint from the virtual camera operation UI 330, reads the corresponding image and audio data from the database 250 based on the accepted viewpoint, and performs rendering processing to generate a virtual viewpoint image. At this time, in response to a read request from the back-end server 270, the database 250 provides data to the back-end server 270 in accordance with the reception status in the state management table.

The configuration of the image computing server 200 is not limited to this. For example, at least two of the front-end server 230, the database 250, and the back-end server 270 may be configured as a single unit. There may also be a plurality of at least one of the front-end server 230, the database 250, and the back-end server 270. Devices other than those described above may be included at any position in the image computing server 200. Furthermore, the end user terminal 190 or the virtual camera operation UI 330 may have at least some of the functions of the image computing server 200.

The rendered virtual viewpoint image is transmitted from the back-end server 270 to the end user terminal 190, and the user operating the end user terminal 190 can view the image and listen to the audio corresponding to the designated viewpoint. That is, the back-end server 270 generates virtual viewpoint content based on the captured images (multi-viewpoint images) captured by the plurality of cameras 112 and on viewpoint information. More specifically, the back-end server 270 generates the virtual viewpoint content based on, for example, image data of predetermined areas that the plurality of camera adapters 120 have cut out from the images captured by the plurality of cameras 112, and on a viewpoint designated by a user operation. The back-end server 270 then provides the generated virtual viewpoint content to the end user terminal 190.

The virtual viewpoint content in the present embodiment is content that includes a virtual viewpoint image, which is the image that would be obtained if the subject were photographed from a virtual viewpoint. In other words, the virtual viewpoint image represents the view from the designated viewpoint. The virtual viewpoint may be designated by the user or may be designated automatically based on, for example, the result of image analysis. That is, virtual viewpoint images include arbitrary viewpoint images (free viewpoint images) corresponding to a viewpoint freely designated by the user. Images corresponding to a viewpoint that the user selects from a plurality of candidates, and images corresponding to a viewpoint designated automatically by the device, are also included in virtual viewpoint images. The present embodiment is described mainly using an example in which the virtual viewpoint content includes audio data, but audio data need not be included.

The virtual camera operation UI 330 accesses the database 250 via the back-end server 270. The back-end server 270 performs the common processing related to image generation, and the virtual camera operation UI 330 handles the application-specific parts related to the operation UI.

As described above, in the image processing system 100, the back-end server 270 generates a virtual viewpoint image based on the images captured by the plurality of cameras 112 that photograph a subject from a plurality of directions. The image processing system 100 in the present embodiment is not limited to the physical configuration described above and may be configured logically.

<Description of the functional block diagrams>
Next, functional block diagrams of the camera adapter 120, the back-end server 270, and the virtual camera operation UI 330 in the system shown in FIG. 1 will be described.

FIG. 2 is a block diagram for explaining the functional configuration of the camera adapter 120. The camera adapter 120 includes a network adapter 6110, a transmission unit 6120, an image processing unit 6130, and an external device control unit 6140. The network adapter 6110 includes a data transmission/reception unit 6111 and a time control unit 6112.

The data transmission/reception unit 6111 performs data communication with other camera adapters 120, the front-end server 230, the time server 290, and the control station 310 via the daisy chain 170, the network 291, and the network 310a. For example, the data transmission/reception unit 6111 outputs, to another camera adapter 120, the foreground image and the background image that the foreground/background separation unit 6131 has separated from the image captured by the camera 112. The destination camera adapter 120 is the next camera adapter 120, among the camera adapters 120 of the image processing system 100, in an order determined in advance according to the processing of the data routing processing unit 6122. Because each camera adapter 120 outputs a foreground image and a background image, a virtual viewpoint image is generated based on foreground images and background images captured from a plurality of viewpoints. There may also be camera adapters 120 that output the foreground image separated from the captured image but do not output the background image.

The time control unit 6112 conforms to, for example, the Ordinary Clock of the IEEE 1588 standard, and has a function of storing the time stamps of data exchanged with the time server 290 and a function of performing time synchronization with the time server 290. Time synchronization is not limited to IEEE 1588 and may be realized by another standard such as EtherAVB or by a proprietary protocol.

The transmission unit 6120 includes a data compression/decompression unit 6121, a data routing processing unit 6122, a time synchronization control unit 6123, an image/audio transmission processing unit 6124, a data routing information holding unit 6125, and a frame rate changing unit 6126.

The data compression/decompression unit 6121 has a function of compressing data sent and received via the data transmission/reception unit 6111 by applying a predetermined compression method, compression rate, and frame rate, and a function of decompressing compressed data.

The data routing processing unit 6122 uses the data held by the data routing information holding unit 6125, described later, to determine the routing destination of the data received by the data transmission/reception unit 6111 and of the data processed by the image processing unit 6130. It also has a function of transmitting the data to the determined routing destination. The routing destination is preferably the camera adapter 120 corresponding to a camera 112 focused on the same gazing point, because the image frames of such cameras 112 are highly correlated, which is advantageous for image processing. The order of the camera adapters 120 that output foreground images and background images in relay fashion in the image processing system 100 is determined by the decisions of the data routing processing units 6122 of the respective camera adapters 120.

The time synchronization control unit 6123 conforms to the PTP (Precision Time Protocol) of the IEEE 1588 standard and has a function of performing processing related to time synchronization with the time server 290. Time synchronization is not limited to PTP and may be performed using another similar protocol.
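For reference, the standard PTP delay request-response exchange estimates the clock offset from four time stamps. The sketch below shows that general computation; it is not taken from the patent, the function name `ptp_offset_and_delay` is hypothetical, and the formula assumes that the network delay is the same in both directions.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate slave clock offset and one-way path delay.

    t1: master sends Sync            (master clock)
    t2: slave receives Sync          (slave clock)
    t3: slave sends Delay_Req        (slave clock)
    t4: master receives Delay_Req    (master clock)
    Assumes a symmetric path delay.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay


# Example: slave clock runs 5 units ahead and the path delay is 2 units.
offset, delay = ptp_offset_and_delay(t1=100, t2=107, t3=110, t4=107)
print(offset, delay)
```

In this arrangement the camera adapter would play the slave role and the time server 290 the master role; subtracting the estimated offset from the local clock aligns the two.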

The image/audio transmission processing unit 6124 has a function of creating messages for transferring image data or audio data to another camera adapter 120 or to the front-end server 230 via the data transmission/reception unit 6111. A message contains image data or audio data and the meta information of that data. In the present embodiment, the meta information includes the time code or sequence number at the time the image was captured or the audio was sampled, the data type, and an identifier indicating the individual camera 112 or microphone 111. The image data or audio data to be transmitted may be compressed by the data compression/decompression unit 6121. The image/audio transmission processing unit 6124 also receives messages from other camera adapters 120 via the data transmission/reception unit 6111. Then, according to the data type contained in the message, it restores the data, which has been fragmented into the packet size specified by the transmission protocol, to image data or audio data. If the restored data is compressed, the data compression/decompression unit 6121 decompresses it.
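The message handling described above, in which data carrying meta information is split into protocol-sized fragments and later restored, can be sketched as follows. The `Fragment` fields, the `index`/`total` bookkeeping, and both function names are assumptions for illustration; the description does not define the actual message layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    camera_id: str    # identifier of the camera or microphone
    data_type: str    # "image" or "audio"
    timecode: int     # time code or sequence number at capture
    index: int        # position of this fragment (hypothetical field)
    total: int        # number of fragments in the message (hypothetical field)
    payload: bytes


def fragment(camera_id, data_type, timecode, data, max_payload):
    """Split data into fragments no larger than max_payload bytes."""
    chunks = [data[i:i + max_payload]
              for i in range(0, len(data), max_payload)] or [b""]
    return [Fragment(camera_id, data_type, timecode, i, len(chunks), chunk)
            for i, chunk in enumerate(chunks)]


def reassemble(fragments):
    """Restore the original data from fragments received in any order."""
    ordered = sorted(fragments, key=lambda f: f.index)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("missing fragments")
    return b"".join(f.payload for f in ordered)


frags = fragment("112a", "image", 42, b"abcdefgh", max_payload=3)
restored = reassemble(list(reversed(frags)))
print(len(frags), restored)
```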

The data routing information holding unit 6125 holds address information for determining the destination of data transmitted and received by the data transmission/reception unit 6111. For example, it holds the installation information of the plurality of cameras 112 focused on the same gazing point, and the identifiers of adjacent cameras derived from that installation information. The installation information is transmitted from the control station 310 to each camera adapter 120 and stored in the data routing information holding unit 6125. For example, in the configuration of FIG. 1, in which data is transmitted between the daisy-chained camera adapters 120, the transmission order may be determined in advance. Specifically, the transmission route of the image data is determined, based on the installation information and the transmission order information sent from the control station 310, so that data is transmitted toward the switching hub 180 in two separate directions, starting from the two cameras 112 located far from the switching hub 180. With this configuration, data that has been image-processed using the correlation between the data of the camera adapters 120 can be transmitted on to yet another camera adapter 120.

The frame rate changing unit 6126 switches the frame rate at which image data is transferred to another camera adapter 120 or to the front-end server 230 via the data transmission/reception unit 6111. It also receives the frame rate change message transmitted from the back-end server 270. The frame rate change message contains a camera identifier and meta information indicating the frame rate type, that is, whether the normal frame rate or the high frame rate applies. The frame rate of the image data is switched in response to this message. The switching sets the frame rate to either a normal frame rate that is sufficient for live images (for example, 60 frames per second) or a high frame rate for replay images that also allows slow-motion playback (for example, 240 frames per second). The frame rate is not limited to two levels and may be set as appropriate according to the data transmission volume, which depends on factors such as resolution, and the image specifications.

The image/audio transmission processing unit 6124 holds the frame data of the image data at the maximum frame rate and, based on the frame rate change message, extracts and transmits only the corresponding frames when the normal frame rate is set.
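As a hedged illustration of this extraction, the sketch below keeps frames buffered at the maximum frame rate and picks every fourth frame when the normal frame rate is requested, using the example rates of 240 and 60 frames per second from the text. The `FrameRateChangeMessage` class and the `"normal"`/`"high"` strings are hypothetical names, not a format defined by the embodiment.

```python
from dataclasses import dataclass

NORMAL_FPS = 60   # example normal frame rate given in the text
HIGH_FPS = 240    # example high frame rate given in the text


@dataclass(frozen=True)
class FrameRateChangeMessage:
    camera_id: str   # identifier of the target camera
    rate_type: str   # "normal" or "high" (hypothetical encoding)


def frames_to_send(buffered_frames, message, max_fps=HIGH_FPS):
    """Pick the frames to transmit from a buffer held at max_fps."""
    target_fps = {"normal": NORMAL_FPS, "high": HIGH_FPS}[message.rate_type]
    if max_fps % target_fps != 0:
        raise ValueError("target rate must divide the buffered rate")
    step = max_fps // target_fps   # 4 for 240 -> 60, 1 for 240 -> 240
    return buffered_frames[::step]
```

Because the buffer always holds every frame, switching to the high frame rate takes effect without reconfiguring the camera, which matches the preference stated in the text for buffering at the maximum frame rate.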

Alternatively, the camera control unit 6141 may switch the output frame rate of the camera 112 itself in response to the frame rate change message. This reduces the memory needed to hold image data temporarily. In view of transmission speed, however, it is preferable to store the frames temporarily at the maximum frame rate and change the transmitted frame rate in response to the frame rate change message.

The image processing unit 6130 includes a foreground/background separation unit 6131, a three-dimensional model information generation unit 6132, and a calibration control unit 6133. The image processing unit 6130 processes the image data captured by the camera 112 under the control of the camera control unit 6141 and the image data received from other camera adapters 120.

The foreground/background separation unit 6131 separates the image data captured by the camera 112 into a foreground image and a background image. That is, the foreground/background separation unit 6131 of each of the plurality of camera adapters 120 extracts a predetermined region from the image captured by the corresponding camera 112. The predetermined region is, for example, a region in which an object has been detected by object detection processing applied to the captured image. The foreground/background separation unit 6131 treats the image inside the predetermined region as the foreground image and the image of the remaining region (that is, everything other than the foreground image) as the background image, and thereby separates the captured image into a foreground image and a background image. The object is, for example, a person. It is not limited to this, however: the object may be a specific person (a player, a manager, and/or a referee, for example) or an object with a predetermined image pattern, such as a ball. A moving body may also be detected as the object. Processing the foreground image, which contains important objects such as people, separately from the background image, which contains no such objects, improves the image quality of the portions of the virtual viewpoint image generated by the image processing system 100 that correspond to those objects. In addition, because each of the plurality of camera adapters 120 separates its own foreground and background images, the load in the image processing system 100, which has a plurality of cameras 112, is distributed.

The three-dimensional model information generation unit 6132 generates image information related to a three-dimensional model, for example by using the principle of a stereo camera, from the foreground image separated by the foreground/background separation unit 6131 and the foreground images received from other camera adapters 120.

The calibration control unit 6133 applies to the input image color correction processing that suppresses color variation between cameras, as well as image alignment and image cropping processing that compensate for blur caused by camera vibration.

The calibration control unit 6133 acquires the image data required for calibration from the camera 112 via the camera control unit 6141 and transmits it to the front-end server 230, which performs the arithmetic processing related to calibration. Although the front-end server 230 performs this arithmetic processing in the present embodiment, the node that performs it is not limited to the front-end server 230. For example, the processing may be performed by another node such as the control station 310 or a camera adapter 120 (including another camera adapter 120). The calibration control unit 6133 also performs calibration during shooting (dynamic calibration) on the image data acquired from the camera 112 via the camera control unit 6141, according to preset parameters.

The external device control unit 6140 controls the devices connected to the camera adapter 120 and includes a camera control unit 6141, a microphone control unit 6142, a pan head control unit 6143, and a sensor control unit 6144.

The camera control unit 6141 is connected to the camera 112 and controls the camera 112, acquires captured images, provides a synchronization signal, and sets the time. Control of the camera 112 includes, for example, setting and reading shooting parameters (such as the number of pixels, color depth, frame rate, and white balance), acquiring the state of the camera 112 (such as shooting, stopped, synchronizing, and error), starting and stopping shooting, and adjusting focus. The synchronization signal is provided by supplying the shooting timing (control clock) to the camera 112, using the time that the time synchronization control unit 6123 has synchronized with the time server 290. The time is set by supplying the time that the time synchronization control unit 6123 has synchronized with the time server 290 as a time code conforming to, for example, the SMPTE 12M format. As a result, the supplied time code is attached to the image data received from the camera 112.

The microphone control unit 6142, the sensor control unit 6144, and the pan head control unit 6143 control the microphone 111, the external sensor 114, and the pan head 113 to which they are respectively connected.

＜Description of the back-end server 270＞
FIG. 3 is a block diagram illustrating the functional configuration of the back-end server 270. The back-end server 270 includes a data receiving unit 3001, a background texture pasting unit 3002, a foreground texture determination unit 3003, a foreground texture boundary color matching unit 3004, a virtual viewpoint foreground image generation unit 3005, and a rendering unit 3006. It further includes a virtual viewpoint audio generation unit 3007, a synthesis unit 3008, a virtual viewpoint content output unit 3009, a foreground object determination unit 3010, a request list generation unit 3011, and a request data output unit 3012. It also includes a background mesh model management unit 3013, a rendering mode management unit 3014, and a camera selection unit 3015.

The data receiving unit 3001 receives data transmitted from the database 250 and the controller 300. From the database 250 it receives three-dimensional data representing the shape of the stadium (the background mesh model), foreground images, background images, three-dimensional models of the foreground images (hereinafter referred to as foreground three-dimensional models), and audio.

The data receiving unit 3001 also receives the virtual camera parameters output from the controller 300, which specifies the viewpoint for generating the virtual viewpoint image. The virtual camera parameters are data representing, for example, the position and orientation of the virtual viewpoint; a matrix of extrinsic parameters and a matrix of intrinsic parameters are used, for example.
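To make the role of the two matrices concrete, the following minimal sketch projects a world-coordinate point into the image of a virtual camera with the standard pinhole model. The convention that the extrinsic parameters are a rotation `R` and translation `t` with camera coordinates given by `R * Xw + t`, and the example intrinsic values, are assumptions; the embodiment states only that an extrinsic matrix and an intrinsic matrix are used.

```python
def project(point_w, R, t, K):
    """Project a 3D world point to pixel coordinates (u, v).

    R (3x3) and t (3,) are assumed extrinsic parameters mapping world
    coordinates to camera coordinates; K (3x3) is the intrinsic matrix.
    """
    # World -> camera coordinates: Xc = R * Xw + t
    xc = [sum(R[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the virtual camera")
    # Camera -> pixel coordinates (pinhole model, including the skew term)
    u = (K[0][0] * xc[0] + K[0][1] * xc[1]) / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return u, v


# Example: a camera at the world origin looking along +Z, with a
# hypothetical focal length of 1000 px and a 1920x1080 image center.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
K = [[1000, 0, 960], [0, 1000, 540], [0, 0, 1]]
print(project((1.0, 0.5, 10.0), identity, (0, 0, 0), K))  # (1060.0, 590.0)
```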

Furthermore, the data receiving unit 3001 may acquire information about the plurality of cameras 112 from an external device such as the database 250 or the controller 300. This information is, for example, information about the number of cameras 112 or about their operating states. The operating state of a camera 112 includes, for example, at least one of a normal state, a failure state, a standby state, a starting state, and a restarting state. The data receiving unit 3001 also acquires information indicating that the data for a given time could not be fully collected because the data output from the camera adapters 120 have different frame rates.

The background texture pasting unit 3002 pastes the background image as a texture onto the three-dimensional spatial shape represented by the background mesh model acquired from the background mesh model management unit 3013, and thereby generates a textured background mesh model. A mesh model is data, such as CAD data, that represents a three-dimensional spatial shape as a set of faces. A texture is an image pasted onto an object to express the appearance of its surface.

The foreground texture determination unit 3003 determines the texture information of the foreground three-dimensional models from the foreground images and the group of foreground three-dimensional models. The foreground texture boundary color matching unit 3004 matches the colors at the texture boundaries, based on the texture information of each foreground three-dimensional model and on each group of three-dimensional models, and generates a group of colored foreground three-dimensional models for each foreground object.

The virtual viewpoint foreground image generation unit 3005 applies a perspective transformation to the group of foreground images, based on the virtual camera parameters, so that they appear as seen from the virtual viewpoint. The rendering unit 3006 renders the background image and the foreground images, using the generation method for virtual viewpoint images determined by the rendering mode management unit 3014, and generates a virtual viewpoint image of the entire scene.

In the present embodiment, two rendering modes are used as generation methods for the virtual viewpoint image: model-based rendering (MBR) and image-based rendering (IBR).

MBR is a method of generating a virtual viewpoint image using a three-dimensional model generated from a plurality of images of a subject captured from a plurality of directions. When the rendering mode is MBR, a model of the entire scene is generated by combining the background mesh model with the group of foreground three-dimensional models generated by the foreground texture boundary color matching unit 3004, and the virtual viewpoint image is generated from that model.

IBR is a technique for generating a virtual viewpoint image that reproduces the view from a virtual viewpoint by deforming and combining a group of input images of the target scene captured from a plurality of viewpoints. In the present embodiment, when IBR is used, the virtual viewpoint image is generated from one or more captured images, fewer than the plurality of captured images used to generate a three-dimensional model with MBR. When the rendering mode is IBR, a background image as seen from the virtual viewpoint is generated from the background texture model, and the virtual viewpoint image is generated by compositing onto it the foreground image generated by the virtual viewpoint foreground image generation unit 3005. The rendering unit 3006 may use a rendering method other than MBR and IBR.

The rendering mode management unit 3014 determines the rendering mode, that is, the generation method used for generating the virtual viewpoint image, and holds the result. In the present embodiment, the rendering mode management unit 3014 selects the rendering mode to use from among a plurality of rendering modes. The selection is based on the information acquired by the data receiving unit 3001. For example, when the number of cameras identified from the acquired information is equal to or less than a threshold, the rendering mode management unit 3014 selects IBR as the generation method for the virtual viewpoint image.

When the number of cameras exceeds the threshold, on the other hand, MBR is selected as the generation method. Thus, when there are many cameras, generating the virtual viewpoint image with MBR widens the range over which the viewpoint can be specified. When there are few cameras, using IBR avoids the loss of virtual viewpoint image quality that would result from the reduced accuracy of the three-dimensional model under MBR. The generation method may also be selected based on, for example, the permissible processing delay from shooting to image output: MBR is used when freedom of viewpoint takes priority even at the cost of a long delay, and IBR is used when a short delay is required. As a further example, when the data receiving unit 3001 acquires information indicating that the controller 300 or the end-user terminal 190 can specify the height of the virtual viewpoint, MBR is selected as the generation method. This prevents a user's request to change the height of the virtual viewpoint from being rejected because the generation method is IBR. By selecting the generation method according to the situation in this way, the virtual viewpoint image can be generated with an appropriately chosen method. Moreover, making the rendering modes switchable on request allows the system to be configured flexibly, so that the present embodiment can also be applied to subjects other than stadiums.
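The selection criteria described above can be summarized in a short sketch. The text presents the camera count, the permissible delay, and the ability to specify viewpoint height as separate examples; the order in which this sketch checks them, and all function and parameter names, are assumptions made for illustration.

```python
from enum import Enum


class RenderingMode(Enum):
    MBR = "model-based rendering"
    IBR = "image-based rendering"


def choose_rendering_mode(camera_count, camera_threshold,
                          height_selectable=False,
                          low_latency_required=False):
    """Choose a rendering mode from the criteria named in the text.

    The priority among the criteria is an assumption of this sketch.
    """
    if height_selectable:
        # The viewpoint height can be specified, so MBR is selected.
        return RenderingMode.MBR
    if low_latency_required:
        # A short delay from shooting to output is required.
        return RenderingMode.IBR
    # Camera count at or below the threshold -> IBR, otherwise MBR.
    if camera_count <= camera_threshold:
        return RenderingMode.IBR
    return RenderingMode.MBR
```

For example, `choose_rendering_mode(30, 20)` selects MBR because the camera count exceeds the threshold, whereas `choose_rendering_mode(10, 20)` selects IBR.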

When information indicating that data could not be collected is present in place of image data, the missing image can be filled in by interpolating from the surrounding images, so that no image is lost.

The virtual viewpoint audio generation unit 3007 generates the audio (group of sounds) that would be heard at the virtual viewpoint, based on the virtual camera parameters. The synthesis unit 3008 combines the group of images generated by the rendering unit 3006 with the audio generated by the virtual viewpoint audio generation unit 3007 to generate virtual viewpoint content.

The virtual viewpoint content output unit 3009 outputs the virtual viewpoint content to the controller 300 and the end-user terminal 190 using Ethernet (registered trademark). The means of external transmission is not limited to Ethernet (registered trademark), however; signal transmission means such as SDI, DisplayPort, and HDMI (registered trademark) may also be used. The back-end server 270 may also output the virtual viewpoint image generated by the rendering unit 3006 without audio.

The foreground object determination unit 3010 determines the group of foreground objects to be displayed, based on the virtual camera parameters and on the position information that indicates where in space the foreground objects contained in the foreground three-dimensional models are located, and outputs a foreground object list. In other words, the foreground object determination unit 3010 maps the image information of the virtual viewpoint onto the physical cameras 112.

The request list generation unit 3011 generates a request list for requesting from the database 250 the group of foreground images and the group of foreground three-dimensional models corresponding to the foreground object list for the specified time, together with the background image and the audio data. For foreground objects, only data selected in consideration of the virtual viewpoint is requested from the database 250, whereas for the background image and the audio data, all data relating to the frame is requested. After the back-end server 270 starts up, a request list for the background mesh model is generated until the background mesh model has been acquired.

The request data output unit 3012 outputs data request commands to the database 250 based on the input request list. The background mesh model management unit 3013 stores the background mesh model received from the database 250.

The camera selection unit 3015 decides, from the background image and the foreground objects contained in the foreground three-dimensional models, whether camera frame rates should be set, and selects the identifiers of the cameras whose frame rates are to be set. It also generates the frame rate change message and transmits it to the frame rate changing unit 6126 in the camera adapter 120. The frame rate change message contains a camera identifier and meta information indicating the frame rate type, that is, the normal frame rate or the high frame rate.

FIG. 4 is a diagram illustrating the operation of the camera selection unit 3015. In FIG. 4, a plurality of cameras 9100a, ..., 9100g are installed to capture the subject from different directions. The camera selection unit 3015 according to the present embodiment selects, for example, a camera whose field of view allows a line-crossing decision in a ball game to be made as a camera to be set to the high frame rate. That is, when an object of interest (hereinafter referred to as the "object of interest") enters a region of interest, the camera selection unit 3015 selects a camera that has a predetermined positional relationship with the region of interest as a camera to be set to the high frame rate. Suppose, for example, that the subject is a soccer game. When the ball, which is the object of interest, enters the region near a field line (the region of interest), a camera located close to that region, for example a camera located on the extension of the field line, is selected as a high-frame-rate camera.

With reference to FIG. 4, consider specifically the case in which the object of interest 9101 (the ball in this example) moves from point 9101a to point 9101b to point 9101c and enters the region 9103 near the field line 9102. In this case, the camera selection unit 3015 selects, from the plurality of cameras 9100, the camera 9100f that is closest to the extension of the field line 9102. To generate a virtual viewpoint image of higher quality, the adjacent cameras 9100e and 9100g may also be selected. That is, the camera selection unit 3015 selects the cameras in the selection range 9104 in response to the object of interest 9101 entering the region 9103 near the field line 9102.

Cameras are selected at the time the object enters the region of interest. For example, the region 9103 near the field line 9102 in the background image is set in advance as the region of interest. The camera selection unit 3015 then compares the coordinates of the object of interest in the world coordinate system with the coordinates of the region of interest and, when it determines that the object of interest has entered the region of interest, executes the camera selection processing.
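One way to realize this coordinate comparison is sketched below, under the assumption that the region of interest is the set of ground-plane points within a fixed margin of the field line segment. The embodiment does not define the shape of the region; the function names, the margin, and the pitch dimensions in the example are hypothetical.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b (2D world coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the closest point on the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_region_of_interest(object_xy, line_start, line_end, margin):
    """True when the object lies within margin of the field line."""
    return distance_to_segment(object_xy, line_start, line_end) <= margin


# Example: a goal line from (0, 0) to (0, 68) metres with a 2 m margin.
print(in_region_of_interest((1.5, 30.0), (0.0, 0.0), (0.0, 68.0), 2.0))  # True
print(in_region_of_interest((5.0, 30.0), (0.0, 0.0), (0.0, 68.0), 2.0))  # False
```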

The region of interest is set as appropriate according to the scenes and locations that attract attention in the subject being shot. For soccer, for example, it is set based on distance from the lines and goalposts; for baseball, it is set based on the vicinity of each base, the mound, the foul poles, and so on. Setting the region in this way makes it possible to use the generated replay images for officiating decisions in sports, and to play back in slow motion the areas where important plays are likely to occur.

FIG. 5 is a diagram illustrating the selected cameras. In FIG. 5, among the cameras 9200a, ..., 9200z, 9200A, ..., 9200D installed around the field 9205 of a stadium, the cameras drawn in solid black are high-frame-rate cameras and the cameras drawn in outline are normal-frame-rate cameras.

The camera selection unit 3015 selects cameras whose field of view allows it to be determined whether the object of interest 9201 has crossed the field line 9202. That is, when the object of interest 9201 enters the region near the field line 9202, the cameras installed close to the extension line 9203 of the field line 9202 are selected. Specifically, these are the cameras 9200i and 9200j, which capture the object of interest 9201 from a first direction, and the cameras 9200p and 9200q, which capture it from a second direction opposite to the first. The number of cameras selected may be set as appropriate according to the situation.

In addition, the camera selection unit 3015 selects the cameras installed close to the line 9204, which is obtained by rotating the extension line 9203 by 90 degrees about the coordinate position of the object of interest 9201, that is, the line orthogonal to the extension line 9203 at the position of the object of interest 9201. Specifically, these are the cameras 9200m and 9200n and the cameras 9200A and 9200B. This accommodates the demand to view the scene from various directions when the virtual viewpoint image is viewed as a replay image.

Furthermore, from the remaining cameras, the camera selection unit 3015 selects high-frame-rate cameras alternately, for example every other camera. Specifically, these are the cameras 9200D, 9200b, 9200d, 9200f, 9200h, 9200l, 9200r, 9200t, 9200v, 9200x, and 9200z. As above, this serves the generation of the virtual viewpoint image: it allows the MBR described earlier to generate a three-dimensional model from a plurality of images captured from a plurality of directions.
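The three selection steps (cameras near the extension of the field line, cameras near the orthogonal line through the object of interest, and every other remaining camera) can be sketched as follows. This is a simplified illustration under stated assumptions: cameras are given as 2D positions in installation order, "close to a line" is measured by perpendicular distance, and the names and the `per_line` parameter are hypothetical. It does not reproduce the exact camera set of FIG. 5.

```python
import math


def distance_to_line(p, origin, direction):
    """Perpendicular distance from p to the infinite line through origin."""
    dx, dy = direction
    cross = (p[0] - origin[0]) * dy - (p[1] - origin[1]) * dx
    return abs(cross) / math.hypot(dx, dy)


def select_high_frame_rate_cameras(cameras, line_point, line_dir,
                                   object_xy, per_line=4):
    """cameras: list of (camera_id, (x, y)) in installation order."""
    ids = [cid for cid, _ in cameras]
    pos = dict(cameras)
    # Step 1: cameras closest to the extension of the field line.
    along = sorted(ids, key=lambda c: distance_to_line(
        pos[c], line_point, line_dir))[:per_line]
    # Step 2: cameras closest to the line orthogonal to it through the object.
    ortho_dir = (-line_dir[1], line_dir[0])
    across = sorted(ids, key=lambda c: distance_to_line(
        pos[c], object_xy, ortho_dir))[:per_line]
    selected = set(along) | set(across)
    # Step 3: every other camera among those not yet selected.
    rest = [c for c in ids if c not in selected]
    selected.update(rest[::2])
    return selected
```

Cameras not returned by this function would receive a frame rate change message specifying the normal frame rate.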

カメラ選択部３０１５は、最終的に選択されたカメラについて、フレームレートをハイフレームレートとするフレームレート変更メッセージを生成する。また、選択されなかったカメラについて、フレームレートをノーマルフレームレートとするフレームレート変更メッセージを生成する。生成されたフレームレート変更メッセージは、ネットワーク２７０ａを経由して各センサシステム１１０に入力される。各センサシステム１１０のカメラアダプタ１２０はこれを受信し、フレームレート変更部６１２６（図２参照）に入力し、メッセージの内容に応じてフレームレート変更部６１２６がフレームレートを変更する。 The camera selection unit 3015 generates a frame rate change message with the frame rate as the high frame rate for the finally selected camera. It also generates a frame rate change message with the frame rate as the normal frame rate for the unselected cameras. The generated frame rate change message is input to each sensor system 110 via the network 270a. The camera adapter 120 of each sensor system 110 receives this and inputs it to the frame rate changing unit 6126 (see FIG. 2), and the frame rate changing unit 6126 changes the frame rate according to the content of the message.
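The selection of every other remaining camera and the generation of one frame rate change message per camera can be sketched as follows. This is a minimal illustration, not the patented implementation: the message fields, the `HIGH`/`NORMAL` labels, the `offset` parameter, and the installation order of the cameras (9200A to 9200D followed by 9200a to 9200z) are all assumptions introduced here.

```python
from dataclasses import dataclass

HIGH = "high"      # hypothetical labels; the source names no concrete rates
NORMAL = "normal"


@dataclass(frozen=True)
class FrameRateChangeMessage:
    """Hypothetical message layout: one target camera and its new rate."""
    camera_id: str
    frame_rate: str


def select_every_other(ring, already_selected, offset=0):
    """Pick every other camera among those not yet selected."""
    remaining = [cam for cam in ring if cam not in already_selected]
    return remaining[offset::2]


def build_messages(ring, selected):
    """Selected cameras get the high rate, all others the normal rate."""
    chosen = set(selected)
    return [FrameRateChangeMessage(cam, HIGH if cam in chosen else NORMAL)
            for cam in ring]


# Assumed installation order, for illustration only.
ring = list("ABCD") + [chr(c) for c in range(ord("a"), ord("z") + 1)]
near_lines = list("ijpqmnAB")          # cameras chosen near lines 9203 and 9204
alternate = select_every_other(ring, near_lines, offset=1)
messages = build_messages(ring, near_lines + alternate)
```

Under the assumed ordering and with `offset=1`, `alternate` contains the suffixes D, b, d, f, h, l, r, t, v, x, z, which matches the cameras listed in the text.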

なお、ノーマルフレームレートに設定されたカメラが撮影した画像データの時刻においては、ハイフレームレートに設定されたカメラで同一時刻に撮影した画像データがデータベース２５０に存在することになる。すなわち、ノーマルフレームレートとハイフレームレートの全てのカメラで撮影された同一時刻の画像データがデータベース２５０には集結している。しかしながら、ノーマルフレームレートに設定されたカメラが撮影する時刻以外のフレームにおいては、ハイフレームレートに設定されたカメラの画像データのみがデータベース２５０に存在することになる。したがって、ノーマルフレームレートに設定されたカメラが撮影する時刻以外では、全てのカメラの画像データがデータベース２５０には集結できていないことになる。 At the time of the image data taken by the camera set to the normal frame rate, the image data taken by the camera set to the high frame rate at the same time exists in the database 250. That is, image data at the same time taken by all cameras having a normal frame rate and a high frame rate are collected in the database 250. However, in the frame other than the time taken by the camera set to the normal frame rate, only the image data of the camera set to the high frame rate exists in the database 250. Therefore, the image data of all the cameras cannot be collected in the database 250 except the time when the camera set to the normal frame rate shoots.

データベース２５０に画像データが集結できている場合は、先述したようにバックエンドサーバ２７０でレンダリング処理されて仮想視点画像を生成することができる。しかしながら、データベース２５０にある時刻における画像データが集結できていない場合は、バックエンドサーバ２７０で仮想視点画像を生成することができない。 When the image data is collected in the database 250, the back-end server 270 can perform rendering processing to generate a virtual viewpoint image as described above. However, if the image data at the time in the database 250 cannot be collected, the back-end server 270 cannot generate the virtual viewpoint image.

そこで、本実施の形態では、カメラ選択部３０１５は、上述のように、少なくとも１台おきにノーマルフレームレートとハイフレームレートのカメラとして選択する。すなわち、同一時刻に、隣り合うカメラのうちの少なくとも一つのカメラの画像データが存在するように設定される。したがって、当該時刻に画像データが無いカメラの画像データは、隣接するカメラの画像データから当該カメラの画像データを生成することが可能となる。 Therefore, in the present embodiment, as described above, the camera selection unit 3015 sets the cameras to the normal frame rate and the high frame rate at least every other camera. That is, the setting ensures that, at a given time, image data exists for at least one of the cameras adjacent to each camera. Consequently, for a camera that has no image data at that time, its image data can be generated from the image data of an adjacent camera.
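A small sketch may help to show why an alternating assignment leaves every missing frame with a usable neighbor. It assumes, purely for illustration, that the cameras form a closed ring, that time is counted in high frame rate ticks, and that a normal frame rate camera shoots only on every `normal_interval`-th tick; none of these specifics come from the source.

```python
def has_frame(tick, is_high, normal_interval=2):
    """A high-rate camera shoots on every tick, a normal-rate one on a subset."""
    return is_high or tick % normal_interval == 0


def donors_for_missing_cameras(ring_is_high, tick, normal_interval=2):
    """Map each camera index lacking a frame at `tick` to the adjacent
    camera indices that do have a frame at that tick."""
    n = len(ring_is_high)
    result = {}
    for i, is_high in enumerate(ring_is_high):
        if has_frame(tick, is_high, normal_interval):
            continue
        neighbors = [(i - 1) % n, (i + 1) % n]
        result[i] = [j for j in neighbors
                     if has_frame(tick, ring_is_high[j], normal_interval)]
    return result
```

An empty donor list for any camera would indicate an assignment in which interpolation from adjacent cameras is not possible at that tick.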

仮想視点前景画像生成部３００５において、当該画像データの生成は、例えば、射影変換を用いる。射影変換は、一般的な画像処理なのでここで詳細は説明しないが、隣接するカメラの平面の画像データを用いて、当該カメラの平面に射影する変換である。この変換によって、フレームデータの無い当該カメラの画像データを生成することができる。また、生成したい時刻の前後におけるノーマルフレームレートの画像データのピクセルデータの動き情報から新たなピクセルデータを計算し画像データを生成してもよい。 In the virtual viewpoint foreground image generation unit 3005, for example, a projective transformation is used to generate the image data. Since the projective transformation is a general image process, details will not be described here, but it is a transformation that projects onto the plane of the camera by using the image data of the plane of the adjacent camera. By this conversion, it is possible to generate image data of the camera without frame data. Further, new pixel data may be calculated from the motion information of the pixel data of the image data of the normal frame rate before and after the time to be generated, and the image data may be generated.
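As a rough illustration of the projective transformation mentioned above, the sketch below warps an image with a 3x3 homography using inverse mapping and nearest-neighbor sampling. The function names are hypothetical, the homography is assumed to be already known (for example, from camera calibration), and a single homography is exact only for a planar scene, so this is a simplification rather than the method of the embodiment.

```python
def apply_homography(h, x, y):
    """Map the point (x, y) through the 3x3 homography h."""
    xs = h[0][0] * x + h[0][1] * y + h[0][2]
    ys = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return xs / w, ys / w


def warp_image(src, h_dst_to_src, out_width, out_height, fill=0):
    """Build the target camera's image by looking up, for every target
    pixel, the corresponding pixel in the adjacent camera's image."""
    src_height, src_width = len(src), len(src[0])
    out = [[fill] * out_width for _ in range(out_height)]
    for v in range(out_height):
        for u in range(out_width):
            x, y = apply_homography(h_dst_to_src, u, v)
            xi, yi = round(x), round(y)
            if 0 <= xi < src_width and 0 <= yi < src_height:
                out[v][u] = src[yi][xi]
    return out
```

Inverse mapping is used so that every pixel of the generated image receives a value; pixels that fall outside the adjacent camera's image keep the `fill` value.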

なお、上述した画像データが集結できている場合と画像データが集結できていない場合の各処理の実行は、バックエンドサーバ２７０が読み出すデータベース２５０が保持する状態管理テーブルのデータ集結状態に基づく。ところで、上記の構成では、少なくとも１台おきにノーマルフレームレートとハイフレームレートにカメラが設定されるとした。しかしながら、この構成に限定されるわけではなく、画像データが集結できていない場合の射影変換による画像データ生成が問題なく可能であれば必ずしも１台おきである必要はない。 The execution of each process when the above-mentioned image data can be collected and when the image data cannot be collected is based on the data collection state of the state management table held by the database 250 read by the back-end server 270. By the way, in the above configuration, it is assumed that the cameras are set to the normal frame rate and the high frame rate at least every other camera. However, the configuration is not limited to this, and it is not always necessary to use every other unit if it is possible to generate image data by projective transformation when the image data cannot be collected.

図６は、仮想カメラ８００１について説明するための図である。図６（ａ）において、仮想カメラ８００１は、設置されたどのカメラ１１２とも異なる視点において撮影を行うことができる仮想的なカメラである。すなわち、画像処理システム１００において生成される仮想視点画像が、仮想カメラ８００１による撮影画像である。図６（ａ）において、円周上に配置された複数のセンサシステム１１０それぞれがカメラ１１２を有している。例えば、仮想視点画像を生成することにより、あたかもサッカーゴールの近くの仮想カメラ８００１で撮影されたかのような画像を生成することができる。仮想カメラ８００１の撮影画像である仮想視点画像は、設置された複数のカメラ１１２の画像を画像処理することで生成される。オペレータ（ユーザ）は仮想カメラ８００１の位置等操作することで、自由な視点（視線方向）からの撮影画像を得ることができる。図６（ｂ）において、仮想カメラパス８００２は、仮想カメラ８００１の１フレームごとの位置や姿勢を表す情報の列を示している。詳細は後述する。 FIG. 6 is a diagram for explaining the virtual camera 8001. In FIG. 6A, the virtual camera 8001 is a virtual camera capable of taking a picture from a viewpoint different from that of any installed camera 112. That is, the virtual viewpoint image generated by the image processing system 100 is an image taken by the virtual camera 8001. In FIG. 6A, each of the plurality of sensor systems 110 arranged on the circumference has a camera 112. For example, by generating a virtual viewpoint image, it is possible to generate an image as if it was taken by a virtual camera 8001 near a soccer goal. The virtual viewpoint image, which is an image captured by the virtual camera 8001, is generated by image processing the images of a plurality of installed cameras 112. The operator (user) can obtain a captured image from a free viewpoint (line-of-sight direction) by operating the position of the virtual camera 8001 and the like. In FIG. 6B, the virtual camera path 8002 shows a sequence of information representing the position and posture of the virtual camera 8001 for each frame. Details will be described later.

図７は、仮想カメラ操作ＵＩ３３０の機能構成を説明するためのブロック図である。仮想カメラ操作ＵＩ３３０は、仮想カメラ管理部８１３０および操作ＵＩ部８１２０を有する。これらは同一機器上に実装されてもよいし、それぞれサーバとなる装置とクライアントとなる装置に別々に実装されてもよい。 FIG. 7 is a block diagram for explaining the functional configuration of the virtual camera operation UI 330. The virtual camera operation UI 330 includes a virtual camera management unit 8130 and an operation UI unit 8120. These may be mounted on the same device, or may be mounted separately on the server device and the client device, respectively.

仮想カメラ操作部８１０１は、オペレータの仮想カメラ８００１に対する操作、すなわち仮想視点画像の生成に係る視点を指定するためのユーザによる指示を受け付ける。オペレータの操作内容は、例えば、仮想カメラ８００１の位置の変更（移動）、姿勢の変更（回転）、及びズーム倍率の変更などである。また、仮想カメラ操作部８１０１は、ライブ画像およびリプレイ画像の生成に利用される。リプレイ画像を生成する際は、カメラの位置及び姿勢の他に時間を指定する操作が行われる。リプレイ画像では、例えば、時間を止めて仮想カメラ８００１を移動させることも可能である。 The virtual camera operation unit 8101 receives an operation on the virtual camera 8001 of the operator, that is, an instruction by the user for designating a viewpoint related to the generation of the virtual viewpoint image. The operator's operation content is, for example, a change in the position (movement) of the virtual camera 8001, a change in the posture (rotation), a change in the zoom magnification, and the like. Further, the virtual camera operation unit 8101 is used to generate a live image and a replay image. When generating a replay image, an operation of specifying a time in addition to the position and posture of the camera is performed. In the replay image, for example, it is possible to stop the time and move the virtual camera 8001.

仮想カメラパラメータ導出部８１０２は、仮想カメラ８００１の位置や姿勢などを表す仮想カメラパラメータを導出する。仮想パラメータは、演算によって導出されてもよいし、ルックアップテーブルの参照などによって導出されてもよい。仮想カメラパラメータとして、例えば、外部パラメータを表す行列と内部パラメータを表す行列が用いられる。ここで、仮想カメラ８００１の位置と姿勢は外部パラメータに含まれ、ズーム値は内部パラメータに含まれる。 The virtual camera parameter derivation unit 8102 derives virtual camera parameters representing the position, orientation, and the like of the virtual camera 8001. The virtual camera parameters may be derived by calculation or by, for example, referring to a look-up table. As the virtual camera parameters, for example, a matrix representing external parameters and a matrix representing internal parameters are used. Here, the position and orientation of the virtual camera 8001 are included in the external parameters, and the zoom value is included in the internal parameters.
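The split between external and internal parameters can be illustrated with a standard pinhole projection. In this sketch, which is an assumption-laden illustration rather than the parameterization of the embodiment, the rotation and translation stand in for the external parameters (orientation and position), and the focal length in pixels together with the principal point stands in for the internal parameters, with zoom modeled as a change in focal length.

```python
def project(point_world, rotation, translation, focal_px, cx, cy):
    """Project a 3D world point to pixel coordinates.

    rotation (3x3) and translation (3-vector) play the role of the
    external parameters; focal_px, cx, cy play the role of the
    internal parameters.
    """
    cam = [sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
           for r in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the virtual camera")
    u = focal_px * cam[0] / cam[2] + cx
    v = focal_px * cam[1] / cam[2] + cy
    return u, v
```

Doubling `focal_px` while keeping the rotation and translation fixed doubles the offset of the projected point from the principal point, which is how a zoom change would appear under this model.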

仮想カメラ制約管理部８１０３は、仮想カメラ操作部８１０１により受け付けられる指示に基づく視点の指定が制限される制限領域を特定するための情報を取得し管理する。この情報は、例えば、仮想カメラ８００１の位置や姿勢、ズーム値などに関する制約である。 The virtual camera constraint management unit 8103 acquires and manages information for specifying a restricted area in which the designation of the viewpoint based on the instruction received by the virtual camera operation unit 8101 is restricted. This information is, for example, a constraint regarding the position, posture, zoom value, etc. of the virtual camera 8001.

衝突判定部８１０４は、仮想カメラパラメータ導出部８１０２で導出された仮想カメラパラメータが仮想カメラ制約を満たしているかを判定する。制約を満たしていない場合は、例えば、オペレータによる操作入力をキャンセルし、制約を満たす位置から仮想カメラ８００１が動かないよう制御したり、制約を満たす位置に仮想カメラ８００１を戻したりする。 The collision determination unit 8104 determines whether the virtual camera parameter derived by the virtual camera parameter derivation unit 8102 satisfies the virtual camera constraint. If the constraint is not satisfied, for example, the operation input by the operator is canceled, the virtual camera 8001 is controlled not to move from the position satisfying the constraint, or the virtual camera 8001 is returned to the position satisfying the constraint.
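One way to picture the constraint check is sketched below. The constraint is modeled, as an assumption for illustration only, as an axis-aligned box for the position plus a zoom range; the class and function names are hypothetical. A rejected operation leaves the virtual camera where it was, and the returned flag could be used to notify the operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualCameraState:
    position: tuple   # (x, y, z) in world coordinates
    zoom: float


@dataclass(frozen=True)
class VirtualCameraConstraint:
    min_position: tuple
    max_position: tuple
    min_zoom: float
    max_zoom: float


def satisfies(state, constraint):
    """True if the state lies inside the allowed box and zoom range."""
    in_box = all(lo <= value <= hi for value, lo, hi in
                 zip(state.position, constraint.min_position, constraint.max_position))
    return in_box and constraint.min_zoom <= state.zoom <= constraint.max_zoom


def apply_operation(current, requested, constraint):
    """Accept the requested state if it satisfies the constraint;
    otherwise cancel the operation and keep the current state."""
    if satisfies(requested, constraint):
        return requested, True
    return current, False
```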

フィードバック出力部８１０５は、衝突判定部８１０４の判定結果をオペレータにフィードバックする。例えば、オペレータの操作により、仮想カメラ制約が満たされなくなる場合に、そのことをオペレータに通知する。 The feedback output unit 8105 feeds back the determination result of the collision determination unit 8104 to the operator. For example, when the virtual camera constraint is not satisfied by the operator's operation, the operator is notified.

仮想カメラパス管理部８１０６は、オペレータの操作に応じた仮想カメラ８００１のパス（仮想カメラパス８００２）を管理する。図６（ｂ）において示したように、仮想カメラパス８００２とは、仮想カメラ８００１の１フレームごとの位置や姿勢を表す情報の列である。例えば、仮想カメラ８００１の位置や姿勢を表す情報として仮想カメラパラメータが用いられる。例えば、６０フレーム／秒のフレームレートの設定における１秒分の情報は、６０個の仮想カメラパラメータの列となる。仮想カメラパス管理部８１０６は、衝突判定部８１０４で判定済みの仮想カメラパラメータを、バックエンドサーバ２７０に送信する。バックエンドサーバ２７０は、受信した仮想カメラパラメータを用いて、仮想視点画像及び仮想視点音声を生成する。また、仮想カメラパス管理部８１０６は、仮想カメラパラメータを仮想カメラパス８００２に付加して保持する機能も有する。例えば、仮想カメラ操作ＵＩ３３０を用いて、１時間分の仮想視点画像及び仮想視点音声を生成した場合、１時間分の仮想カメラパラメータが仮想カメラパス８００２として保存される。 The virtual camera path management unit 8106 manages the path (virtual camera path 8002) of the virtual camera 8001 according to the operation of the operator. As shown in FIG. 6B, the virtual camera path 8002 is a sequence of information representing the position and orientation of the virtual camera 8001 for each frame. For example, virtual camera parameters are used as information representing the position and posture of the virtual camera 8001. For example, the information for one second in the setting of the frame rate of 60 frames / second is a sequence of 60 virtual camera parameters. The virtual camera path management unit 8106 transmits the virtual camera parameters determined by the collision determination unit 8104 to the back-end server 270. The back-end server 270 uses the received virtual camera parameters to generate a virtual viewpoint image and a virtual viewpoint sound. Further, the virtual camera path management unit 8106 also has a function of adding and holding virtual camera parameters to the virtual camera path 8002. For example, when the virtual camera operation UI 330 is used to generate the virtual viewpoint image and the virtual viewpoint sound for one hour, the virtual camera parameters for one hour are saved as the virtual camera path 8002.
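The size of a virtual camera path follows directly from the frame rate. The sketch below models the path as a plain list of per-frame parameters; the class name and the representation of a parameter are assumptions made for illustration.

```python
def parameter_count(duration_seconds, frames_per_second=60):
    """Number of virtual camera parameters needed for the given duration."""
    return duration_seconds * frames_per_second


class VirtualCameraPath:
    """Hypothetical container: one virtual camera parameter per frame."""

    def __init__(self):
        self._parameters = []

    def append(self, parameter):
        self._parameters.append(parameter)

    def __len__(self):
        return len(self._parameters)
```

At 60 frames per second, one second corresponds to 60 parameters, and one hour would correspond to 216,000 parameters if the same frame rate is assumed throughout.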

オーサリング部８１０７は、オペレータがリプレイ画像を生成する際の編集機能を提供する。オーサリング部８１０７は、ユーザ操作に応じて、リプレイ画像用の仮想カメラパス８００２の初期値として、仮想カメラパス管理部８１０６が保持する仮想カメラパス８００２の一部を取り出す。仮想カメラ画像・音声出力部８１０８は、バックエンドサーバ２７０から受け取った仮想カメラ画像・音声を出力する。オペレータは出力された画像及び音声を確認しながら仮想カメラ８００１を操作する。 The authoring unit 8107 provides an editing function when the operator generates a replay image. The authoring unit 8107 takes out a part of the virtual camera path 8002 held by the virtual camera path management unit 8106 as the initial value of the virtual camera path 8002 for the replay image according to the user operation. The virtual camera image / sound output unit 8108 outputs the virtual camera image / sound received from the back-end server 270. The operator operates the virtual camera 8001 while checking the output image and sound.

＜カメラ選択部３０１５の処理の説明＞ <Explanation of processing of camera selection unit 3015>
続いて、バックエンドサーバ２７０のカメラ選択部３０１５の処理について説明する。図８は、カメラ選択部３０１５の処理を示すフローチャートである。 Subsequently, the processing of the camera selection unit 3015 of the back-end server 270 will be described. FIG. 8 is a flowchart showing the processing of the camera selection unit 3015.

ステップＳ１０１において、カメラ選択部３０１５は、前景画像に注目オブジェクトがあるかどうかの判断を実行する。この処理は、撮影シーンに応じて適宜設定される注目オブジェクト（例えば、ボールなど）が前景画像に含まれているかどうかの判断を実行する。この判断は、例えば、画像データから特徴量を抽出し、学習アルゴリズムによって得られる学習結果データに照合して認識結果を獲得することで実行する。また、動体検出または画像認識など、所定の画像処理により注目オブジェクトを検出してもよい。これに限定されず、ユーザが制御ステーション３１０などから手動で指定しても構わない。前景画像に注目オブジェクトがあると判断した場合（ステップＳ１０１において、ＹＥＳ）、処理はステップＳ１０２に進む。そうでない場合（ステップＳ１０１において、ＮＯ）、処理はステップＳ１１０に進む。 In step S101, the camera selection unit 3015 determines whether or not there is an object of interest in the foreground image. This process executes determination of whether or not the foreground image contains an object of interest (for example, a ball) that is appropriately set according to the shooting scene. This determination is performed, for example, by extracting a feature amount from image data and collating it with the learning result data obtained by a learning algorithm to obtain a recognition result. Further, the object of interest may be detected by a predetermined image process such as motion detection or image recognition. The present invention is not limited to this, and the user may manually specify it from the control station 310 or the like. If it is determined that the foreground image has an object of interest (YES in step S101), the process proceeds to step S102. If not (NO in step S101), the process proceeds to step S110.

ステップＳ１０２において、カメラ選択部３０１５は、背景画像に注目するライン（注目ライン）があるかどうかの判断を実行する。この処理は、データベース２５０から出力される背景画像データに基づく。注目ラインは、あらかじめエンドユーザ端末１９０を用いて、ユーザにより設定されている。フィールドのラインなど撮影シーンに応じて適宜設定された注目される範囲が背景画像データに含まれているかを確認する処理を行う。背景画像に注目ラインがあると判断した場合（ステップＳ１０２において、ＹＥＳ）、処理はステップＳ１０３に進む。そうでない場合（ステップＳ１０２において、ＮＯ）、処理はステップＳ１１０に進む。なお、注目ラインは、例えば、競技を行うフィールド上のラインであってよい。また、注目ラインは、注目オブジェクトの位置を通る直線であってもよい。 In step S102, the camera selection unit 3015 determines whether the background image contains a line of interest (attention line). This process is based on the background image data output from the database 250. The line of interest is set in advance by the user using the end user terminal 190. The process checks whether the background image data includes a range of interest, such as a field line, that has been set as appropriate for the shooting scene. If it is determined that the background image contains a line of interest (YES in step S102), the process proceeds to step S103. If not (NO in step S102), the process proceeds to step S110. The line of interest may be, for example, a line on the field where the competition is played. The line of interest may also be a straight line passing through the position of the object of interest.

ステップＳ１０３において、カメラ選択部３０１５は、注目オブジェクトと注目ラインの座標位置を比較する処理を実行する。この処理は、ステップＳ１０１とステップＳ１０２で抽出されたオブジェクトとラインなどの世界座標系上の座標情報を用いて比較処理を行い、差分情報などの比較処理結果を一時的に保存する。 In step S103, the camera selection unit 3015 executes a process of comparing the coordinate positions of the object of interest and the line of interest. In this process, the comparison process is performed using the coordinate information on the world coordinate system such as the object and the line extracted in step S101 and step S102, and the comparison process result such as the difference information is temporarily saved.

ステップＳ１０４において、カメラ選択部３０１５は、注目オブジェクトが注目領域に存在するかどうかの判断を実行する。例えば、図５においては、カメラ選択部３０１５は、注目オブジェクト９２０１がフィールドライン９２０２の近傍領域に存在するかどうか判断する。注目オブジェクトが注目領域内と判断された場合（ステップＳ１０４において、ＹＥＳ）、処理はステップＳ１０５に進む。そうでない場合（ステップＳ１０４において、ＮＯ）、処理はステップＳ１１０に進む。なお、注目領域は、例えば、競技を行うフィールド上のセンターエリア、ペナルティエリア、キックマークの近傍、サイドラインの近傍、ゴールラインの近傍またはゴールエリアであってもよい。 In step S104, the camera selection unit 3015 determines whether or not the object of interest exists in the region of interest. For example, in FIG. 5, the camera selection unit 3015 determines whether or not the object of interest 9201 exists in the vicinity of the field line 9202. If it is determined that the object of interest is within the region of interest (YES in step S104), the process proceeds to step S105. If not (NO in step S104), the process proceeds to step S110. The area of interest may be, for example, a center area on the field where the competition is played, a penalty area, a vicinity of a kick mark, a vicinity of a side line, a vicinity of a goal line, or a goal area.

ステップＳ１０５において、カメラ選択部３０１５は、ステップＳ１０２で獲得した注目ラインの座標値データに基づき、注目ラインの延直線上に近い位置にあるカメラを選択する。例えば、図５においては、フィールドライン９２０２の延直線９２０３に近い位置にあるカメラ９２００ｉ，９２００ｊ、及びカメラ９２００ｑ，９２００ｐが選択される。なお、カメラの設置情報は、あらかじめ世界座標系の座標値として位置づけられており、適宜設置情報を読み出し、平面の座標系に射影変換するなどして、対象となるカメラを特定する。 In step S105, the camera selection unit 3015 selects a camera located near the extension line of the line of interest based on the coordinate value data of the line of interest acquired in step S102. For example, in FIG. 5, the cameras 9200i, 9200j and the cameras 9200q, 9200p located near the straight line 9203 of the field line 9202 are selected. The camera installation information is positioned in advance as the coordinate values of the world coordinate system, and the target camera is specified by appropriately reading the installation information and projecting and transforming it into a plane coordinate system.
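Selecting cameras close to the extension of the line of interest can be sketched with a point-to-line distance on the ground plane. The sketch assumes that the camera positions and the line have already been projected to 2D plane coordinates, and it uses a distance threshold (`max_distance`) as the selection criterion; the source does not specify how closeness is decided, so both the threshold and the function names are hypothetical.

```python
import math


def distance_to_line(point, a, b):
    """Perpendicular distance from `point` to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("a and b must be distinct points")
    return abs(dx * (point[1] - a[1]) - dy * (point[0] - a[0])) / length


def cameras_near_line(camera_positions, a, b, max_distance):
    """Return camera ids within max_distance of the line, nearest first."""
    distances = {cam: distance_to_line(pos, a, b)
                 for cam, pos in camera_positions.items()}
    near = [cam for cam, d in distances.items() if d <= max_distance]
    return sorted(near, key=lambda cam: distances[cam])
```

Because the distance is measured to the infinite line rather than to the segment, cameras beyond both ends of the line are found, which corresponds to shooting the object from two opposing directions.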

ステップＳ１０６において、カメラ選択部３０１５は、ステップＳ１０２で獲得した注目ラインの座標を注目オブジェクトの座標位置を基準に９０度回転させ、回転させた注目ラインの延直線に近い位置にあるカメラを選択する。例えば、図５においては、延直線９２０３を９０度回転させて得られた延直線９２０４に近いカメラ９２００Ａ，９２００Ｂ、及びカメラ９２００ｍ，９２００ｎが選択される。なお、ステップＳ１０２で獲得した注目ラインの座標を注目オブジェクトの座標位置を基準に９０度で回転ではなく、任意の角度で回転させてよい。また、注目ラインの座標を注目オブジェクトの座標位置を基準に、例えば、６０度～１２０度など、幅がある角度帯で回転させてもよい。 In step S106, the camera selection unit 3015 rotates the coordinates of the line of interest acquired in step S102 by 90 degrees about the coordinate position of the object of interest, and selects cameras located close to the extension line of the rotated line of interest. For example, in FIG. 5, the cameras 9200A and 9200B and the cameras 9200m and 9200n, which are close to the extension line 9204 obtained by rotating the extension line 9203 by 90 degrees, are selected. The coordinates of the line of interest acquired in step S102 may instead be rotated about the coordinate position of the object of interest by an arbitrary angle rather than 90 degrees. The rotation may also cover an angular range of some width, for example, 60 to 120 degrees.
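The rotation of the line of interest about the object position is an ordinary 2D rotation. The following sketch assumes plane coordinates and represents the line by two of its points; the function names are introduced here for illustration only.

```python
import math


def rotate_about(point, pivot, degrees):
    """Rotate `point` counterclockwise about `pivot` by the given angle."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + cos_t * dx - sin_t * dy,
            pivot[1] + sin_t * dx + cos_t * dy)


def rotate_line_about(line, pivot, degrees=90.0):
    """Rotate both defining points of a line about the object position."""
    a, b = line
    return rotate_about(a, pivot, degrees), rotate_about(b, pivot, degrees)
```

To cover an angular range such as 60 to 120 degrees, the same function could be called for several angles within the range and the resulting camera sets merged.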

ステップＳ１０７において、カメラ選択部３０１５は、ステップＳ１０５とステップＳ１０６で選択したカメラを除くカメラについて、例えば、１台おきにカメラを選択する処理を実行する。ステップＳ１０８において、カメラ選択部３０１５は、選択されたカメラに対してフレームレートを変更する情報を含めたフレームレート変更メッセージを作成する。 In step S107, the camera selection unit 3015 executes, for example, a process of selecting every other camera for the cameras other than the cameras selected in steps S105 and S106. In step S108, the camera selection unit 3015 creates a frame rate change message including information for changing the frame rate for the selected camera.

ステップＳ１０９において、カメラ選択部３０１５は、カメラアダプタ１２０のフレームレート変更部６１２６に対してフレームレート変更メッセージを送信し、処理を終了する。ステップＳ１１０において、カメラ選択部３０１５は、判断結果に基づいたメタ情報を生成し、エンドユーザ端末１９０に出力して処理を終了する。 In step S109, the camera selection unit 3015 transmits a frame rate change message to the frame rate change unit 6126 of the camera adapter 120, and ends the process. In step S110, the camera selection unit 3015 generates meta information based on the determination result, outputs the meta information to the end user terminal 190, and ends the process.

上述した一連の処理とすることで、複数のカメラを備えた画像処理システムにおいて、注目オブジェクトを撮影するカメラを選択的にハイフレームレートにするメッセージを生成および出力することが可能となる。 By performing the above-mentioned series of processing, in an image processing system including a plurality of cameras, it is possible to generate and output a message that selectively sets the camera that shoots the object of interest to a high frame rate.
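The branching of steps S101 to S110 described above can be summarized in a short sketch. The three Boolean inputs stand in for the results of the determinations in steps S101, S102, and S104; how those determinations are actually made is outside this sketch, and the function name is hypothetical.

```python
def trace_selection_flow(has_object, has_line, in_region):
    """Return the sequence of steps visited for the given determinations."""
    steps = ["S101"]
    if not has_object:                 # NO in S101
        return steps + ["S110"]
    steps.append("S102")
    if not has_line:                   # NO in S102
        return steps + ["S110"]
    steps += ["S103", "S104"]          # compare coordinates, then test region
    if not in_region:                  # NO in S104
        return steps + ["S110"]
    # Select cameras (S105 to S107), then create and send messages (S108, S109).
    return steps + ["S105", "S106", "S107", "S108", "S109"]
```

Every path that fails one of the three determinations ends in step S110, where meta information is output instead of a frame rate change message.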

＜仮想視点前景画像生成部３００５の処理の説明＞ <Explanation of processing of virtual viewpoint foreground image generation unit 3005>
図９は、バックエンドサーバ２７０の仮想視点前景画像生成部３００５の処理を示すフローチャートである。本システムでは、ノーマルフレームレートとハイフレームレートのカメラが設定されることから、ある時刻においてデータベース２５０に画像データが集結できている場合と画像データが集結できていない場合がある。仮想視点前景画像生成部３００５は、画像データが集結できていない場合は、画像データを生成する処理を行う。仮想視点前景画像生成部３００５の処理の後、レンダリング部３００６において、仮想視点画像が生成される。 FIG. 9 is a flowchart showing the processing of the virtual viewpoint foreground image generation unit 3005 of the back-end server 270. In this system, since cameras with a normal frame rate and a high frame rate are set, image data may or may not be collected in the database 250 at a certain time. The virtual viewpoint foreground image generation unit 3005 performs a process of generating image data when the image data cannot be collected. After the processing of the virtual viewpoint foreground image generation unit 3005, the rendering unit 3006 generates the virtual viewpoint image.

ステップＳ２０１において、仮想視点前景画像生成部３００５は、データベース２５０からデータ集結状態を示す状態管理テーブルを読み出す。ステップＳ２０２において、仮想視点前景画像生成部３００５は、仮想カメラパラメータに基づき、ある時刻において、仮想視点画像生成に必要な対象となるカメラの画像データが全て存在しているかどうかを読み出した状態管理テーブルから判断する。対象となるカメラの画像データが全て存在していると判断した場合（ステップＳ２０２において、ＹＥＳ）、処理は終了する。対象となるカメラの画像データのうち存在していないものがあると判断した場合（ステップＳ２０２において、ＮＯ）、処理はステップＳ２０３に進む。 In step S201, the virtual viewpoint foreground image generation unit 3005 reads out the state management table indicating the data collection state from the database 250. In step S202, the virtual viewpoint foreground image generation unit 3005 reads out whether or not all the image data of the target camera necessary for virtual viewpoint image generation exists at a certain time based on the virtual camera parameter. Judge from. When it is determined that all the image data of the target camera exists (YES in step S202), the process ends. If it is determined that some of the image data of the target camera does not exist (NO in step S202), the process proceeds to step S203.

ステップＳ２０３において、仮想視点前景画像生成部３００５は、対象となるカメラのカメラ識別子のうち、例えば、一番数値の小さいものを初期値としてセットする。ステップＳ２０４において、仮想視点前景画像生成部３００５は、セットされたカメラ識別子の画像データがあるかどうかを、状態管理テーブルに基づき判断する。画像データが無いと判断した場合（ステップＳ２０４において、ＮＯ）、処理はステップＳ２０５に進む。画像データがあると判断した場合（ステップＳ２０４において、ＹＥＳ）、処理はステップＳ２０７に進む。 In step S203, the virtual viewpoint foreground image generation unit 3005 sets, for example, the camera identifier of the target camera with the smallest numerical value as an initial value. In step S204, the virtual viewpoint foreground image generation unit 3005 determines whether or not there is image data of the set camera identifier based on the state management table. If it is determined that there is no image data (NO in step S204), the process proceeds to step S205. If it is determined that there is image data (YES in step S204), the process proceeds to step S207.

ステップＳ２０５において、仮想視点前景画像生成部３００５は、画像データを生成するために、画像補間に使用するためのデータをデータベース２５０から取得する。なお、ここで、データベース２５０から取得するデータとしては、例えば、当該カメラに隣接するカメラの画像データである。 In step S205, the virtual viewpoint foreground image generation unit 3005 acquires data for use in image interpolation from the database 250 in order to generate image data. Here, the data acquired from the database 250 is, for example, image data of a camera adjacent to the camera.

ステップＳ２０６において、仮想視点前景画像生成部３００５は、ステップＳ２０５で取得した画像データを射影変換するなどして、対象となるカメラの補間画像を生成する処理を実行する。 In step S206, the virtual viewpoint foreground image generation unit 3005 executes a process of generating an interpolated image of the target camera by performing a projective transformation of the image data acquired in step S205.

ステップＳ２０７では、仮想視点前景画像生成部３００５は、対象となるカメラの画像データが全て存在するかどうかのチェックが終了したか判断する。チェックが終了していないと判断した場合（ステップＳ２０７において、ＮＯ）、処理はステップＳ２０４に戻り、対象となるカメラのカメラ識別子のうち、現在のカメラ識別子の次に数値の小さいものについてチェックする。一方、対象となるカメラの画像データが全て存在するかどうかのチェックが終了したと判断した場合（ステップＳ２０７において、ＹＥＳ）、仮想視点前景画像生成部３００５は、処理を終了する。 In step S207, the virtual viewpoint foreground image generation unit 3005 determines whether the check for the existence of image data has been completed for all the target cameras. If it is determined that the check has not been completed (NO in step S207), the process returns to step S204, and the camera identifier with the next smallest numerical value after the current one is checked. On the other hand, if it is determined that the check has been completed for all the target cameras (YES in step S207), the virtual viewpoint foreground image generation unit 3005 ends the process.
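The loop of steps S202 to S207 can be sketched as follows. The state management table is modeled, as an assumption for illustration, by a dictionary that contains an entry only for cameras whose image data exists at the time in question, and `interpolate` is a hypothetical callable standing in for steps S205 and S206.

```python
def fill_missing_frames(target_ids, available, interpolate):
    """Generate image data for target cameras that have none.

    target_ids:  camera identifiers needed for the virtual viewpoint image
    available:   dict of camera identifier -> existing image data
    interpolate: callable producing image data for a missing camera
    """
    if all(cam in available for cam in target_ids):   # YES in S202
        return {}
    generated = {}
    for cam in sorted(target_ids):        # S203: start from the smallest id
        if cam in available:              # YES in S204: nothing to generate
            continue
        generated[cam] = interpolate(cam) # S205 and S206
    return generated                      # YES in S207: all ids checked
```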

なお、本実施形態において、ハイフレームレートのカメラ台数は、あらかじめ全体のカメラ台数とデータ容量で決定されているが、この構成に限定されるわけではない。例えば、フロントエンドサーバ２３０内に帯域監視部を構成してもよい。この場合、帯域監視部は、帯域を伝送するデータ容量に応じてカメラ台数を計算し、カメラ選択部３０１５にフィードバック制御することになる。 In the present embodiment, the number of high frame rate cameras is determined in advance from the total number of cameras and the data volume, but the configuration is not limited to this. For example, a band monitoring unit may be provided in the front-end server 230. In this case, the band monitoring unit calculates the number of cameras according to the amount of data transmitted over the band and feeds the result back to the camera selection unit 3015.
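How a band monitoring unit might derive the number of high frame rate cameras is not specified in the source. One plausible calculation, sketched below under the assumption that every camera produces a fixed data rate per frame rate setting, is to give all cameras the normal rate first and spend the remaining bandwidth on upgrades. All names and the example figures are hypothetical.

```python
def max_high_rate_cameras(total_cameras, bandwidth, normal_rate, high_rate):
    """Largest n such that
    n * high_rate + (total_cameras - n) * normal_rate <= bandwidth.

    All rates share one unit (for example Mbps) and are integers here.
    """
    if high_rate <= normal_rate:
        raise ValueError("high_rate must exceed normal_rate")
    spare = bandwidth - total_cameras * normal_rate
    if spare < 0:
        return 0   # even the normal rate does not fit; no upgrades possible
    return min(total_cameras, spare // (high_rate - normal_rate))
```

For instance, with 30 cameras, 100 units per normal-rate camera, 200 units per high-rate camera, and a budget of 5000 units, the sketch yields 20 high frame rate cameras.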

＜ハードウェア構成について＞ <About hardware configuration>
続いて、本実施形態を構成する各装置のハードウェア構成について説明する。上述の通り、本実施形態では、カメラアダプタ１２０がＦＰＧＡ及び／又はＡＳＩＣなどのハードウェアを実装し、これらのハードウェアによって、上述した各処理を実行する場合の例を中心に説明した。それはセンサシステム１１０内の各種装置や、フロントエンドサーバ２３０、データベース２５０、バックエンドサーバ２７０、及びコントローラ３００についても同様である。しかしながら、上記装置のうち、少なくとも何れかが、例えばＣＰＵ、ＧＰＵ、ＤＳＰなどを用い、ソフトウェア処理によって本実施形態の処理を実行するようにしても良い。 Subsequently, the hardware configuration of each device constituting the present embodiment will be described. As described above, in the present embodiment, an example in which the camera adapter 120 implements hardware such as FPGA and / or ASIC and executes each of the above-mentioned processes by these hardware has been mainly described. The same applies to the various devices in the sensor system 110, the front-end server 230, the database 250, the back-end server 270, and the controller 300. However, at least one of the above devices may use, for example, a CPU, GPU, DSP, or the like to execute the processing of the present embodiment by software processing.

図１０は、カメラアダプタ１２０のハードウェア構成を示すブロック図である。なお、フロントエンドサーバ２３０、データベース２５０、バックエンドサーバ２７０、制御ステーション３１０、仮想カメラ操作ＵＩ３３０、及びエンドユーザ端末１９０などの装置も、図１０のハードウェア構成となりうる。カメラアダプタ１２０は、ＣＰＵ１２０１と、ＲＯＭ１２０２と、ＲＡＭ１２０３と、補助記憶装置１２０４と、表示部１２０５と、操作部１２０６と、通信部１２０７と、バス１２０８とを有する。 FIG. 10 is a block diagram showing a hardware configuration of the camera adapter 120. Devices such as the front-end server 230, the database 250, the back-end server 270, the control station 310, the virtual camera operation UI 330, and the end-user terminal 190 may also have the hardware configuration shown in FIG. The camera adapter 120 includes a CPU 1201, a ROM 1202, a RAM 1203, an auxiliary storage device 1204, a display unit 1205, an operation unit 1206, a communication unit 1207, and a bus 1208.

ＣＰＵ１２０１は、ＲＯＭ１２０２やＲＡＭ１２０３に格納されているコンピュータプログラムやデータを用いてカメラアダプタ１２０の全体を制御する。ＲＯＭ１２０２は、変更を必要としないプログラムやパラメータを格納する。ＲＡＭ１２０３は、補助記憶装置１２０４から供給されるプログラムやデータ、及び通信部１２０７を介して外部から供給されるデータなどを一時記憶する。補助記憶装置１２０４は、例えばハードディスクドライブ等で構成され、静止画や動画などのコンテンツデータを記憶する。 The CPU 1201 controls the entire camera adapter 120 by using computer programs and data stored in the ROM 1202 and the RAM 1203. ROM 1202 stores programs and parameters that do not need to be changed. The RAM 1203 temporarily stores programs and data supplied from the auxiliary storage device 1204, data supplied from the outside via the communication unit 1207, and the like. The auxiliary storage device 1204 is composed of, for example, a hard disk drive or the like, and stores content data such as still images and moving images.

表示部１２０５は、例えば、液晶ディスプレイ等で構成され、ユーザがカメラアダプタ１２０を操作するためのＧＵＩ（ＧｒａｐｈｉｃａｌＵｓｅｒＩｎｔｅｒｆａｃｅ）などを表示する。操作部１２０６は、例えば、キーボードやマウス等で構成され、ユーザによる操作を受けて各種の指示をＣＰＵ１２０１に入力する。通信部１２０７は、カメラ１１２やフロントエンドサーバ２３０などの外部の装置と通信を行う。バス１２０８は、カメラアダプタ１２０の各部を繋いで情報を伝達する。 The display unit 1205 is composed of, for example, a liquid crystal display or the like, and displays a GUI (Graphical User Interface) or the like for the user to operate the camera adapter 120. The operation unit 1206 is composed of, for example, a keyboard, a mouse, or the like, and inputs various instructions to the CPU 1201 in response to an operation by the user. The communication unit 1207 communicates with an external device such as a camera 112 or a front-end server 230. Bus 1208 connects each part of the camera adapter 120 to transmit information.

上述の実施形態は、画像処理システム１００が競技場やコンサートホールなどの施設に設置される場合の例を中心に説明した。施設の他の例としては、例えば、遊園地、公園、競馬場、競輪場、カジノ、プール、スケートリンク、スキー場、ライブハウスなどがある。また、各種施設で行われるイベントは、屋内で行われるものであっても屋外で行われるものであっても良い。また、本実施形態における施設は、一時的に（期間限定で）建設される施設も含む。 The above-described embodiment has mainly described an example in which the image processing system 100 is installed in a facility such as a stadium or a concert hall. Other examples of facilities include, for example, amusement parks, parks, racetracks, keirin racetracks, casinos, pools, skating rinks, ski resorts, live houses, and the like. In addition, the events held at various facilities may be held indoors or outdoors. In addition, the facilities in this embodiment include facilities that are temporarily constructed (for a limited time).

＜実施の形態２＞ <Embodiment 2>
実施の形態１では、カメラ選択部は、オブジェクトが撮像領域の所定範囲に入ったときに、所定の手続きに基づきハイフレームレートにするカメラを選択していた。本実施の形態では、オブジェクトと仮想視点の位置関係に基づき、ハイフレームレートのカメラを選択する点が実施の形態１と異なる。 In the first embodiment, the camera selection unit selects a camera having a high frame rate based on a predetermined procedure when the object enters a predetermined range of the imaging region. The present embodiment differs from the first embodiment in that a camera having a high frame rate is selected based on the positional relationship between the object and the virtual viewpoint.

図１１は、実施の形態２に係るバックエンドサーバ２７０の機能構成を説明するためのブロック図である。実施の形態１に係るバックエンドサーバ２７０を説明した図３と同じ符号については、上述した通りであるので詳細な説明は省略する。本実施の形態に係るカメラ選択部３０１６は、コントローラ３００から仮想カメラパラメータを取得する点が、実施の形態１に係るカメラ選択部３０１５と異なる。カメラ選択部３０１６は、この情報を用いて、カメラのフレームレートを設定するための判断およびカメラ識別子の選択、フレームレート変更メッセージの生成および送信を行う。 FIG. 11 is a block diagram for explaining the functional configuration of the back-end server 270 according to the second embodiment. Since the same reference numerals as those in FIG. 3 explaining the back-end server 270 according to the first embodiment are as described above, detailed description thereof will be omitted. The camera selection unit 3016 according to the present embodiment is different from the camera selection unit 3015 according to the first embodiment in that the virtual camera parameters are acquired from the controller 300. The camera selection unit 3016 uses this information to make a determination for setting the frame rate of the camera, select a camera identifier, and generate and transmit a frame rate change message.

図１２は、カメラ選択部３０１６の動作について説明するための図である。図１２（ａ）は、注目オブジェクト９３０１を撮影するカメラＡ～Ｇ及び仮想視点９３０２を示し、図１２（ｂ）は、カメラＡ～Ｇの撮影タイミングの遷移を示している。 FIG. 12 is a diagram for explaining the operation of the camera selection unit 3016. 12 (a) shows cameras A to G and virtual viewpoints 9302 that shoot the object of interest 9301, and FIG. 12 (b) shows the transition of the shooting timing of the cameras A to G.

本実施の形態におけるカメラ選択部３０１６は、仮想視点の視線方向と撮影方向が近いカメラをハイフレームレートに設定するカメラとして選択する。例えば、撮影対象がサッカーゲームとする。注目オブジェクトであるボールと連動して仮想視点が移動しているときに、仮想視点の視線方向と撮影方向が近いカメラ、すなわち、注目オブジェクトと仮想視点とを通る直線上に位置するカメラをハイフレームレートのカメラとして選択する。 The camera selection unit 3016 in the present embodiment selects, as a camera to be set to the high frame rate, a camera whose shooting direction is close to the line-of-sight direction of the virtual viewpoint. For example, suppose that the shooting target is a soccer game. When the virtual viewpoint moves in conjunction with the ball, which is the object of interest, a camera whose shooting direction is close to the line-of-sight direction of the virtual viewpoint, that is, a camera located on the straight line passing through the object of interest and the virtual viewpoint, is selected as a high frame rate camera.

図１２（ａ）を参照して、具体的に、注目オブジェクト９３０１が、地点９３０１ａ→地点９３０１ｂ→地点９３０１ｃと移動している場合について説明する。注目オブジェクト９３０１の移動に連動して、仮想視点９３０２は、地点９３０２ａ→地点９３０２ｂ→地点９３０２ｃと移動する。ここで、世界座標系上で注目オブジェクト９３０１と仮想視点９３０２とを通る直線に一番近いカメラＢを選択する。なお、より画質の高い仮想視点画像を生成するために、その隣り合うカメラＡ及びカメラＣも選択するようにしてもよい。カメラ選択部３０１６は、注目オブジェクト９３０１及び仮想視点９３０２の移動に連動して、選択範囲９３０３ａ→選択範囲９３０３ｂ→選択範囲９３０３ｃのカメラを選択する。 With reference to FIG. 12A, a case where the object of interest 9301 moves in the order of point 9301a → point 9301b → point 9301c will be described specifically. In conjunction with the movement of the object of interest 9301, the virtual viewpoint 9302 moves in the order of point 9302a → point 9302b → point 9302c. Here, the camera B, which is closest to the straight line passing through the object of interest 9301 and the virtual viewpoint 9302 in the world coordinate system, is selected. To generate a virtual viewpoint image with higher image quality, the adjacent cameras A and C may also be selected. The camera selection unit 3016 selects the cameras in the selection range 9303a → the selection range 9303b → the selection range 9303c in conjunction with the movement of the object of interest 9301 and the virtual viewpoint 9302.
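
The selection rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes 2D world coordinates, a hypothetical row of cameras A to G along y = 0, and made-up positions for the object of interest and the virtual viewpoint. The function names `distance_to_line` and `select_high_rate_cameras` are introduced here for illustration only.

```python
import math

def distance_to_line(point, a, b):
    """Perpendicular distance from `point` to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("object of interest and virtual viewpoint coincide")
    return abs(dx * (point[1] - a[1]) - dy * (point[0] - a[0])) / length

def select_high_rate_cameras(cameras, obj, viewpoint, include_neighbors=True):
    """cameras: list of (name, (x, y)) in installation order (hypothetical layout)."""
    distances = [distance_to_line(pos, obj, viewpoint) for _, pos in cameras]
    nearest = distances.index(min(distances))
    if include_neighbors:
        # Optionally add the adjacent cameras, clipped at the ends of the row.
        lo, hi = max(nearest - 1, 0), min(nearest + 1, len(cameras) - 1)
    else:
        lo = hi = nearest
    return [name for name, _ in cameras[lo:hi + 1]]

# Hypothetical layout: cameras A to G in a row along y = 0.
CAMERAS = [(name, (float(i), 0.0)) for i, name in enumerate("ABCDEFG")]
selected = select_high_rate_cameras(CAMERAS, obj=(2.0, 10.0), viewpoint=(1.5, 5.0))
print(selected)
```

With these assumed coordinates the line through the object and the viewpoint passes through camera B, so B and its neighbors A and C are returned. Re-running the function as the object and viewpoint move corresponds to the selection range shifting over time.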

図１２（ｂ）を参照して、図１２（ａ）で説明したカメラＡ～Ｇの撮影タイミングの遷移について説明する。図１２（ｂ）では、カメラごとの撮影タイミングがプロットされている。図中の黒丸は、その時刻において撮影するカメラ、白丸は、その時刻において撮影しないカメラである。また、ノーマルフレームレートの時間軸情報をtNn、ハイフレームレートの時間軸情報をtHnとする。時刻tN0，tN1，tN2，…のタイミングで撮影された撮影画像に基づいて、注目オブジェクトの仮想視点画像がライブ画像として生成される。時刻tH0，tH1，tH2,…のタイミングで撮影された撮影画像に基づいて、オブジェクトの仮想視点画像がリプレイ画像として生成される。 With reference to FIG. 12B, the transition of the shooting timing of the cameras A to G described for FIG. 12A will now be explained. FIG. 12B plots the shooting timing of each camera. A black circle indicates a camera that shoots at that time, and a white circle indicates a camera that does not shoot at that time. The time axis of the normal frame rate is denoted tNn, and the time axis of the high frame rate is denoted tHn. A virtual viewpoint image of the object of interest is generated as a live image based on the images captured at times tN0, tN1, tN2, .... A virtual viewpoint image of the object is generated as a replay image based on the images captured at times tH0, tH1, tH2, ....

カメラの選択のタイミングは、ノーマルフレームレートのタイミングである。例えば、時刻tN0で取得した仮想視点と注目オブジェクトの位置から、時刻tH0，tH1，tH2,tH3において、選択範囲９３０３ａのカメラがハイフレームレートのカメラとして選択される。範囲９３０４a、９３０４ｂ、９３０４ｃは、それぞれ時刻tN0，tN1，tN2で選択されたハイフレームカメラに対応している。ただし、この例では、選択するカメラの決定からフレームレートの設定までの遅延時間は考慮していない。 Cameras are selected at the timing of the normal frame rate. For example, based on the positions of the virtual viewpoint and the object of interest acquired at time tN0, the cameras in the selection range 9303a are selected as the high frame rate cameras at times tH0, tH1, tH2, and tH3. The ranges 9304a, 9304b, and 9304c correspond to the high frame rate cameras selected at times tN0, tN1, and tN2, respectively. Note that this example does not take into account the delay between determining the cameras to be selected and setting the frame rate.
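
The timing relationship can be illustrated with a short sketch. It assumes, based only on the example above (tH0 to tH3 belonging to tN0), that four high frame rate ticks fall within each normal frame period. The ratio, the function name `high_rate_schedule`, and the use of selection range labels as values are all illustrative assumptions. As in the text, the delay between the decision and the frame rate setting is ignored.

```python
def high_rate_schedule(selections, ticks_per_normal_frame=4):
    """Map each high-rate tick index to the selection decided at its normal-rate tick.

    selections[n] is the selection made at time tNn; it is held for the
    high-rate ticks tH(4n) .. tH(4n+3) under the assumed 4:1 ratio.
    """
    schedule = {}
    for n, selected in enumerate(selections):
        for k in range(ticks_per_normal_frame):
            schedule[n * ticks_per_normal_frame + k] = selected
    return schedule

# Selections decided at tN0, tN1, tN2 (labels taken from the selection ranges).
schedule = high_rate_schedule(["9303a", "9303b", "9303c"])
for tick in sorted(schedule):
    print(f"tH{tick}: {schedule[tick]}")
```

The selection changes only at normal frame rate boundaries, so the set of high frame rate cameras stays fixed for all high-rate ticks in between.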

また、カメラ選択部３０１６は、上述したように、注目オブジェクトに対する仮想視点の視線方向に基づいて選択されるカメラ以外のカメラを、例えば、一台おきにハイフレームレートのカメラとして選択する。具体的には、カメラＡ，Ｃ，Ｅ，Ｇである。これは、仮想視点画像を生成するためであって、前述したＭＢＲにより、複数の方向から撮影した複数の撮影画像に基づいて三次元モデルを生成する。ある時刻において、存在しないハイフレームレートの画像データは、隣り合うカメラの画像データを用いて補間して生成すればよい。補間画像の生成処理は、前述したように、バックエンドサーバ２７０の仮想視点前景画像生成部３００５において実行する。 Further, the camera selection unit 3016 selects, from the cameras other than those selected based on the line-of-sight direction of the virtual viewpoint with respect to the object of interest as described above, for example every other camera as a high frame rate camera. Specifically, these are the cameras A, C, E, and G. This is done to generate the virtual viewpoint image: with the MBR described above, a three-dimensional model is generated based on a plurality of captured images taken from a plurality of directions. High frame rate image data that does not exist at a given time may be generated by interpolation using the image data of adjacent cameras. As described above, the interpolated image generation process is executed by the virtual viewpoint foreground image generation unit 3005 of the back-end server 270.
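
As a rough sketch of the every-other-camera selection, the following picks alternate cameras from an ordered list and, for each camera that does not capture at a high-rate tick, lists the adjacent capturing cameras whose images could serve as interpolation sources. The interpolation itself is not shown. The helper names `every_other` and `interpolation_sources` are hypothetical and not taken from the source.

```python
def every_other(cameras, start=0):
    """Pick every other camera from an ordered list of camera names."""
    return cameras[start::2]

def interpolation_sources(cameras, capturing):
    """Map each non-capturing camera to its adjacent cameras that do capture."""
    sources = {}
    for i, name in enumerate(cameras):
        if name in capturing:
            continue
        neighbors = [cameras[j] for j in (i - 1, i + 1) if 0 <= j < len(cameras)]
        sources[name] = [n for n in neighbors if n in capturing]
    return sources

cameras = list("ABCDEFG")
high_rate = every_other(cameras)                      # alternate cameras
sources = interpolation_sources(cameras, set(high_rate))
print(high_rate)
print(sources)
```

For cameras A to G this yields A, C, E, and G as high frame rate cameras, and each of B, D, and F has a capturing neighbor on both sides.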

図１３は、選択されるカメラを説明するための図である。図１３において、競技場（スタジアム）のフィールド９２０５周辺に設置されたカメラ９２００ａ，…，９２００ｚ，９２００Ａ，…，９２００Ｄのうち、黒塗りのカメラはハイフレームレートのカメラ、白抜きのカメラはノーマルフレームレートのカメラである。 FIG. 13 is a diagram for explaining the selected cameras. In FIG. 13, among the cameras 9200a, ..., 9200z, 9200A, ..., 9200D installed around the field 9205 of the stadium, the cameras drawn in solid black are high frame rate cameras and the cameras drawn in outline are normal frame rate cameras.

カメラ選択部３０１６は、仮想視点と同じ視線、および逆方向からの仮想視点コンテンツを閲覧できるようにするため、注目オブジェクト９４０２と仮想視点９４０１を通る直線９４０３に近いカメラを選択する。具体的には、カメラ９２００ｕ及びカメラ９２００ｊである。 So that virtual viewpoint content can be viewed both along the same line of sight as the virtual viewpoint and from the opposite direction, the camera selection unit 3016 selects the cameras close to the straight line 9403 passing through the object of interest 9402 and the virtual viewpoint 9401. Specifically, these are the cameras 9200u and 9200j.

加えて、注目オブジェクト９４０２の座標位置を基準として、直線９４０３を９０度回転させた直線９４０４に近いカメラを選択する。具体的には、カメラ９２００ｃ及び９２００ｐである。 In addition, the cameras close to the straight line 9404, obtained by rotating the straight line 9403 by 90 degrees about the coordinate position of the object of interest 9402, are selected. Specifically, these are the cameras 9200c and 9200p.

さらに、カメラ選択部３０１６は、それ以外のカメラを、例えば、１台おきに選択する。具体的には、カメラ９２００ｂ，９２００ｄ，９２００ｆ，９２００ｈ，９２００ｌ，９２００ｎ，９２００ｒ，９２００ｔ，９２００ｖ，９２００ｘ，９２００ｚ，９２００Ｂ，９２００Ｄである。 Further, the camera selection unit 3016 selects other cameras, for example, every other camera. Specifically, the cameras are 9200b, 9200d, 9200f, 9200h, 9200l, 9200n, 9200r, 9200t, 9200v, 9200x, 9200z, 9200B, 9200D.
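
The three-stage selection for FIG. 13 might be sketched as below. The assumptions are: cameras sit on a ring around the field with known 2D positions, "close to a line" is approximated by the smallest angular gap between a camera's bearing from the object of interest and the line's direction, and "every other" is applied to the remaining cameras in ring order. The source does not specify these details, so the result is not expected to reproduce the camera list given above. All names are hypothetical.

```python
import math

def bearing(origin, point):
    """Direction from origin to point, in radians."""
    return math.atan2(point[1] - origin[1], point[0] - origin[0])

def angular_gap(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def select_ring_cameras(cameras, obj, viewpoint):
    """cameras: dict of name -> (x, y), inserted in ring order."""
    view_dir = bearing(obj, viewpoint)
    directional = []
    # Same side as the viewpoint, opposite side, then the two perpendicular sides.
    for offset in (0.0, math.pi, math.pi / 2, -math.pi / 2):
        target = view_dir + offset
        best = min(cameras, key=lambda n: angular_gap(bearing(obj, cameras[n]), target))
        if best not in directional:
            directional.append(best)
    rest = [name for name in cameras if name not in directional]
    return directional, rest[::2]   # every other one of the remaining cameras

# Hypothetical ring of 8 cameras at 45-degree steps, radius 10.
ring = {f"c{i}": (10 * math.cos(math.radians(45 * i)), 10 * math.sin(math.radians(45 * i)))
        for i in range(8)}
directional, extra = select_ring_cameras(ring, obj=(0.0, 0.0), viewpoint=(5.0, 0.0))
print(directional, extra)
```

All cameras returned in either list would then be set to the high frame rate. Separating the directional cameras from the every-other cameras keeps the two selection criteria visible.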

カメラ選択部３０１６は、最終的に選択されたカメラについて、先述した実施形態の構成のようにフレームレートをハイフレームレートとするようフレームレート変更メッセージを生成する。生成されたフレームレート変更メッセージは、ネットワーク２７０ａを経由して各センサシステム１１０に入力される。各センサシステム１１０のカメラアダプタ１２０はこれを受信し、フレームレート変更部６１２６（図２参照）に入力し、メッセージの内容に応じて、フレームレート変更部６１２６がフレームレートを変更する。仮想視点画像の生成も実施の形態１と同様の構成と動作で行われる。 The camera selection unit 3016 generates a frame rate change message for the finally selected camera so that the frame rate is set to a high frame rate as in the configuration of the above-described embodiment. The generated frame rate change message is input to each sensor system 110 via the network 270a. The camera adapter 120 of each sensor system 110 receives this and inputs it to the frame rate changing unit 6126 (see FIG. 2), and the frame rate changing unit 6126 changes the frame rate according to the content of the message. The generation of the virtual viewpoint image is also performed with the same configuration and operation as in the first embodiment.
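
The message flow described above can be pictured with the following sketch. The source does not define the message format, so the `FrameRateChangeMessage` fields, the frame rate values, and the `CameraAdapter` class here are purely illustrative assumptions. Network delivery is replaced by a direct method call.

```python
from dataclasses import dataclass

NORMAL_FPS = 60    # assumed value, not from the source
HIGH_FPS = 240     # assumed value, not from the source

@dataclass(frozen=True)
class FrameRateChangeMessage:
    """Hypothetical message: which camera should switch to which frame rate."""
    camera_id: str
    frame_rate: int

class CameraAdapter:
    """Stand-in for a camera adapter and its frame rate changing unit."""
    def __init__(self, camera_id):
        self.camera_id = camera_id
        self.frame_rate = NORMAL_FPS

    def receive(self, message):
        # Apply the message only if it is addressed to this adapter's camera.
        if message.camera_id == self.camera_id:
            self.frame_rate = message.frame_rate

def build_messages(selected_ids, frame_rate=HIGH_FPS):
    """Create one frame rate change message per selected camera."""
    return [FrameRateChangeMessage(cid, frame_rate) for cid in selected_ids]

adapters = {cid: CameraAdapter(cid) for cid in "ABC"}
for message in build_messages(["B"]):
    for adapter in adapters.values():   # stands in for delivery over the network
        adapter.receive(message)
print({cid: a.frame_rate for cid, a in adapters.items()})
```

Only the adapter of the selected camera changes its frame rate; the others stay at the assumed normal rate.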

このように構成することで、複数のカメラを備えた画像処理システムにおいて、所定の方向を撮影するカメラを選択的にハイフレームレートにする。これにより、ノーマルフレームレートの伝送帯域を変更せず、画素の欠陥、欠落の無い仮想視点コンテンツを生成することが可能となる。しかも、リプレイ時にはライブ時と同じ視点での閲覧、さらには、ライブ時とは異なる反対方向からの仮想視点コンテンツを閲覧することが可能となる。 With this configuration, in an image processing system including a plurality of cameras, the cameras that shoot in a predetermined direction are selectively set to a high frame rate. This makes it possible to generate virtual viewpoint content without pixel defects or omissions without changing the transmission band of the normal frame rate. Moreover, at the time of replay, it is possible to browse from the same viewpoint as at the time of live, and further, to browse the virtual viewpoint content from the opposite direction different from that at the time of live.
＜他の実施の形態＞ <Other embodiments>
先述したカメラ選択部は、バックエンドサーバ２７０内に構成したが、当該構成に限定されるものではない。例えば、フロントエンドサーバ２３０に構成してもよく、実施の形態１と同様に、オブジェクト情報と対象となる背景画像データの情報を一時的に保存および判断できる構成であればよい。また、同様の構成をカメラアダプタ１２０で実現してもよく、カメラ台数に応じてコストが変動するものの、カメラ画像データの流れにおいて上流で処理できるので、より高速にカメラ選択の判断が可能となる。 The camera selection unit described above is configured in the back-end server 270, but is not limited to the configuration. For example, it may be configured in the front-end server 230, and may be configured as long as it can temporarily store and determine the object information and the information of the target background image data as in the first embodiment. Further, the same configuration may be realized by the camera adapter 120, and although the cost varies depending on the number of cameras, it can be processed upstream in the flow of camera image data, so that the camera selection can be determined at a higher speed.

さらには、カメラ選択の判断要素として、実施の形態１では、注目オブジェクトと背景画像データに基づいて実行したが、当該構成に限定されるものではない。例えば、仮想視点と複数のオブジェクトとの位置関係で、カメラを選択してもよい。この場合、複数のオブジェクトに対して仮想カメラが移動するようなリプレイ画像用の仮想カメラパスを生成して、その仮想カメラパスに基づいて、ハイフレームレートにするカメラを選択することになる。また、さらには、複数のオブジェクトの位置関係に基づいて、カメラを選択し、当該情報に基づいて、仮想カメラパラメータを生成して、仮想画像を生成してもよい。 Furthermore, in the first embodiment the camera selection is determined based on the object of interest and the background image data, but the present invention is not limited to this configuration. For example, the cameras may be selected based on the positional relationship between the virtual viewpoint and a plurality of objects. In this case, a virtual camera path for a replay image, along which the virtual camera moves with respect to the plurality of objects, is generated, and the cameras to be set to the high frame rate are selected based on that virtual camera path. Alternatively, the cameras may be selected based on the positional relationship among a plurality of objects, and virtual camera parameters may be generated based on that information to generate a virtual image.

また、上述の実施の形態では、ハイフレームレートの時刻において、存在しない画像データの補間画像生成処理は、バックエンドサーバ２７０の仮想視点前景画像生成部３００５において実行していた。しかしながら、当該構成に限定されるものではなく、フロントエンドサーバ２３０の画像処理部２１５０で実行し、データベース２５０に出力してもよい。 Further, in the above-described embodiment, the interpolated image generation process for image data that does not exist at a high frame rate time is executed by the virtual viewpoint foreground image generation unit 3005 of the back-end server 270. However, the present invention is not limited to this configuration; the process may instead be executed by the image processing unit 2150 of the front-end server 230 and its result output to the database 250.

上記のように構成することによって、先述の実施の形態と同様に、注目オブジェクトを撮影するカメラを選択的にハイフレームレートにするので、ノーマルフレームレートの伝送帯域を変更せず、画質の高い仮想視点コンテンツを生成することが可能となる。 With the above configuration, as in the above-described embodiments, the cameras that shoot the object of interest are selectively set to the high frame rate, so that virtual viewpoint content with high image quality can be generated without changing the transmission band of the normal frame rate.

また、上述の実施の形態において、カメラ選択部３０１５は、ハイフレームレートにするカメラを選択していた。しかしながら、カメラ選択部３０１５は、注目するオブジェクトの位置に応じて、フレームレートを下げるカメラを選択してもよい。この場合、例えば、カメラ選択部３０１５は、球技のライン越え判定が可能な視野を有さないカメラを、フレームレートを下げるカメラとして選択する。また、例えば、図４において、カメラ選択部３０１５は、フィールドライン９１０２の延直線上に一番遠いカメラ９１００ａやその周辺のカメラを、フレームレートを下げるカメラとして選択してもよい。また、この場合、カメラ選択部３０１５は、図８のステップＳ１０８において、フレームレートを下げるカメラに、フレームレートを下げることを指示するフレームレート変更メッセージを送信してもよい。 Further, in the above-described embodiment, the camera selection unit 3015 selects the cameras to be set to the high frame rate. However, the camera selection unit 3015 may instead select cameras whose frame rate is to be lowered according to the position of the object of interest. In this case, for example, the camera selection unit 3015 selects, as a camera whose frame rate is to be lowered, a camera whose field of view does not allow a line-crossing decision in a ball game to be made. Further, for example, in FIG. 4, the camera selection unit 3015 may select the camera 9100a, which is farthest from the extension of the field line 9102, and the cameras around it as cameras whose frame rate is to be lowered. In this case, the camera selection unit 3015 may, in step S108 of FIG. 8, send a frame rate change message instructing those cameras to lower their frame rate.

また、上述の実施の形態において、注目オブジェクトは、ボールとして説明を行ったがこれに限らない。例えば、注目オブジェクトは、選手や審判などの人物であってもよい。また、注目オブジェクトを相手チームのペナルティエリアに存在するオフェンス側の選手などの特定の領域に存在する人物としてもよい。この場合、ゴールやシュートなど注目すべきイベントが生じるシーンをハイフレームレートで撮影することができる。また、注目オブジェクトを、複数のオブジェクトとしてもよい。また、注目オブジェクトは、特定条件を満たすオブジェクトであってもよい。例えば、ボールをキープしている選手を注目オブジェクトとしてもよい。 Further, in the above-described embodiment, the object of interest has been described as a ball, but the present invention is not limited to this. For example, the object of interest may be a person such as a player or a referee. Further, the object of interest may be a person present in a specific area, such as an offensive player in the opposing team's penalty area. In this case, scenes in which a notable event such as a goal or a shot occurs can be shot at a high frame rate. Further, the object of interest may be a plurality of objects. Further, the object of interest may be an object that satisfies a specific condition. For example, the player who is keeping the ball may be the object of interest.
（その他の実施例） (Other examples)
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by a process in which a program that realizes one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that realizes one or more functions.

１１０センサシステム、１１１マイク、１１２カメラ、１１３雲台、１２０カメラアダプタ、１８０スイッチングハブ、１９０エンドユーザ端末、２３０フロントエンドサーバ、２５０データベース、２７０バックエンドサーバ、２９０タイムサーバ、３１０制御ステーション、３３０仮想カメラ操作ＵＩ 110 sensor system, 111 microphone, 112 camera, 113 pan head, 120 camera adapter, 180 switching hub, 190 end user terminal, 230 front-end server, 250 database, 270 back-end server, 290 time server, 310 control station, 330 virtual camera operation UI

Claims

An image processing device comprising:
specifying means for specifying the position of an object of interest photographed by a plurality of photographing devices; and
control means for controlling the plurality of photographing devices such that, when the specifying means specifies that the position of the object of interest is included in a specific area, the frame rate of the image data transmitted by one or more photographing devices, among the plurality of photographing devices, that have a predetermined positional relationship with the specific area is higher than the frame rate of the image data transmitted by another photographing device, among the plurality of photographing devices, that is different from the one or more photographing devices.

The image processing apparatus according to claim 1, wherein the specific area is an area including a specific object different from the object of interest.

The image processing apparatus according to claim 1 or 2, wherein, when the specifying means specifies that the object of interest is included in the specific area, the control means changes the frame rate of the image data transmitted by the one or more photographing devices so as to be higher than the frame rate of the image data transmitted by the other photographing device.

The image processing apparatus according to any one of claims 1 to 3, wherein the control means transmits information designating the frame rate of the image data to be transmitted by the one or more photographing devices and the other photographing device, and
the one or more photographing devices and the other photographing device shoot at a predetermined frame rate and transmit the image data acquired by the shooting at the frame rate indicated by the information transmitted by the control means.

The image processing apparatus according to any one of claims 1 to 4, wherein the control means transmits information designating the frame rate of the image data to be transmitted by the one or more photographing devices and the other photographing device, and
the one or more photographing devices and the other photographing device shoot at the frame rate indicated by the information transmitted by the control means.

The image processing apparatus according to any one of claims 1 to 5, wherein the specific area is a region including a field line in a field photographed by the plurality of photographing devices.

The image processing apparatus according to claim 6, wherein the one or more photographing devices are photographing devices installed on an extension of the field line.

The image processing apparatus according to any one of claims 1 to 7, wherein the specific area includes at least one of a center area, a penalty area, and a goal area in a field photographed by the plurality of photographing devices.

The image processing device according to any one of claims 1 to 8, wherein the object of interest is a moving object photographed by the plurality of photographing devices.

The image processing device according to claim 9, wherein the moving object is a ball or a person photographed by the plurality of photographing devices.

The image processing apparatus according to any one of claims 1 to 10, further comprising a generation means for generating a virtual viewpoint image using a plurality of image data transmitted from the plurality of photographing devices.

The image processing apparatus according to claim 11, wherein the generation means generates a virtual viewpoint image having a frame rate corresponding to the frame rate of the image data transmitted by the one or more photographing devices.

The image processing apparatus according to claim 12, wherein, for a time that corresponds to the frame rate at which the one or more photographing devices transmit image data but does not correspond to the frame rate at which the other photographing device transmits image data, the generation means interpolates image data corresponding to the other photographing device based on the image data transmitted from the one or more photographing devices, and generates a virtual viewpoint image based on a plurality of image data including the interpolated image data.

A control method performed by an image processing device, the method comprising:
a specifying step of specifying the position of an object of interest photographed by a plurality of photographing devices; and
a control step of controlling the plurality of photographing devices such that, when it is specified in the specifying step that the position of the object of interest is included in a specific area, the frame rate of the image data transmitted by one or more photographing devices, among the plurality of photographing devices, that have a predetermined positional relationship with the specific area is higher than the frame rate of the image data transmitted by another photographing device, among the plurality of photographing devices, that is different from the one or more photographing devices.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 13.