JP2018191236A

JP2018191236A - Information processing system, information processing method, apparatus, and program

Info

Publication number: JP2018191236A
Application number: JP2017094561A
Authority: JP
Inventors: 友里吉村; Yuri Yoshimura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-05-11
Filing date: 2017-05-11
Publication date: 2018-11-29

Abstract

PROBLEM TO BE SOLVED: To control transmission of an image of a subject according to the position of the subject.
SOLUTION: Camera adapters 120a to 120z control, on the basis of the position of a subject in real space, transmission of data including an image of the region of the subject included in captured images captured by cameras 112a to 112z.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing system, an information processing method, an apparatus, and a program.

There is a technique in which a plurality of cameras installed at mutually different positions capture images synchronously, and virtual viewpoint content is generated using the images captured from the resulting plurality of viewpoints. With this technique, for example, a highlight scene of a soccer or basketball game can be viewed from various angles, which gives the user a greater sense of presence than an ordinary image does.

In such a technique, the region of a subject is sometimes cut out from a captured image and transmitted. Patent Document 1 describes a technique for cutting out and transmitting a necessary region from an image captured by a camera. Specifically, Patent Document 1 discloses the following. A plurality of surveillance cameras are installed so that a region including the vicinity of the passenger doors of a train stopped at a station platform can be captured from outside the train. From the images captured by these surveillance cameras, only the images of the vicinity of the passenger doors are cut out, transmitted, and aggregated.

Patent Document 1: Japanese Patent Laid-Open No. 2014-192844 (JP 2014-192844 A)

However, the technique described in Patent Document 1 transmits images of the vicinity of passenger doors, whose position is comparatively fixed. The technique therefore does not consider how to transmit an image of a subject whose position can change.
The present invention has been made in view of this problem, and an object thereof is to control transmission of an image of a subject according to the position of the subject.

An information processing system according to the present invention includes acquisition means for acquiring an image of a region of a subject included in a captured image, and control means for controlling, based on the position of the subject, transmission of data including the image of the region of the subject included in the captured image.

According to the present invention, transmission of an image of a subject can be controlled according to the position of the subject.

FIG. 1 is a diagram showing the configuration of an image processing system.
FIG. 2 is a diagram showing the configuration of a camera adapter.
FIG. 3 is a diagram showing the configuration of an image processing unit.
FIG. 4 is a plan view showing the entire common subject range.
FIG. 5 is a perspective view showing the entire common subject range.
FIG. 6 is a diagram showing the portion of FIG. 4 corresponding to the three cameras on the upstream side.
FIG. 7 is a diagram showing an example of a foreground image.
FIG. 8 is a flowchart showing a first example of transmission adjustment processing.
FIG. 9 is a diagram showing a first example of objects transmitted to the downstream camera.
FIG. 10 is a flowchart showing a first example of transmission adjustment processing.
FIG. 11 is a diagram showing a second example of objects transmitted to the downstream camera.

Hereinafter, embodiments will be described in detail with reference to the drawings.
When virtual viewpoint content based on images captured from a plurality of viewpoints is generated and viewed, the images captured by the plurality of cameras are first aggregated in an image processing unit such as a server. The image processing unit then uses those images to generate a three-dimensional model and to perform processing such as rendering, thereby generating the virtual viewpoint content, which it transmits to a user terminal. Stereo matching is a typical method for successively generating, from captured images, a three-dimensional model of a moving subject such as a player.

To perform stereo matching, each surface element of the subject must be captured by at least two cameras. Therefore, to generate a three-dimensional model with no missing surfaces, the subject must lie within the range common to the subject ranges (fields) of all the cameras, where the subject range of a camera is the range that the camera can capture. This common range is hereinafter referred to, as necessary, as the entire common subject range. Conversely, an image of a subject located outside the entire common subject range is data that does not contribute to the generation of a complete three-dimensional model.

Depending on the virtual viewpoint content, a complete three-dimensional model may not be necessary; for example, a three-dimensional model of only one side of the subject may suffice. Even in that case, however, an image of a subject captured by only one camera cannot be used to generate a three-dimensional model. Transmitting image data that does not contribute to the generation of a three-dimensional model to the image processing unit needlessly burdens the transmission path between the cameras and the image processing unit.
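The two conditions above (at least two cameras for stereo matching, all cameras for a complete model) can be illustrated with a minimal sketch. It assumes a simplified plan-view camera model; the `Camera2D` class, its fields, and the helper names are hypothetical and are not taken from this document.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Camera2D:
    """Hypothetical plan-view camera: position, viewing direction, field of view, range."""
    x: float
    y: float
    heading_deg: float   # direction of the optical axis, in degrees
    half_fov_deg: float  # half of the horizontal field of view
    max_range: float     # farthest distance treated as capturable


def sees(cam: Camera2D, px: float, py: float) -> bool:
    """Return True if point (px, py) lies inside the camera's subject range."""
    dx, dy = px - cam.x, py - cam.y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True
    if dist > cam.max_range:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest signed angle between the bearing to the point and the heading.
    diff = (bearing - cam.heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= cam.half_fov_deg


def count_viewing_cameras(cams: Sequence[Camera2D], px: float, py: float) -> int:
    """Number of cameras whose subject range contains the point."""
    return sum(sees(c, px, py) for c in cams)


def usable_for_stereo(cams: Sequence[Camera2D], px: float, py: float) -> bool:
    """Stereo matching needs the point to be captured by at least two cameras."""
    return count_viewing_cameras(cams, px, py) >= 2


def in_entire_common_range(cams: Sequence[Camera2D], px: float, py: float) -> bool:
    """A complete model needs the point inside every camera's subject range."""
    return count_viewing_cameras(cams, px, py) == len(cams)
```

A real system would use calibrated three-dimensional camera parameters rather than this flat wedge model; the sketch only shows how the position of a subject determines which of the two conditions it satisfies.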

If the transmission load were reduced while images that do not contribute to the generation of the three-dimensional model were still included, comparatively strong compression would have to be applied to all images, including those needed to generate the three-dimensional model, and the overall image quality would deteriorate.
In the present embodiment, therefore, each subject region included in the captured images captured from a plurality of viewpoints is judged as to whether it is needed to generate the three-dimensional model, and transmission and image quality are controlled according to the result of that judgment. This reduces the transmission load while keeping the quality of the images needed to generate the three-dimensional model high.
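As a rough illustration of controlling transmission and image quality according to such a judgment, the following sketch maps the number of cameras that capture a subject to an action. The three-level policy, the `Action` names, and the thresholds are assumptions made for illustration only; the judgment actually used in the embodiment is described with reference to the later figures.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical transmission decisions for one subject region."""
    SEND_HIGH_QUALITY = "send_high_quality"
    SEND_LOW_QUALITY = "send_low_quality"
    SKIP = "skip"


def decide_action(viewing_cameras: int, total_cameras: int) -> Action:
    """Pick an action from how many cameras capture the subject (illustrative policy)."""
    if total_cameras < 2:
        raise ValueError("at least two cameras are needed for stereo matching")
    if not 0 <= viewing_cameras <= total_cameras:
        raise ValueError("viewing_cameras must be between 0 and total_cameras")
    if viewing_cameras == total_cameras:
        # Inside the entire common subject range: needed for a complete model.
        return Action.SEND_HIGH_QUALITY
    if viewing_cameras >= 2:
        # Usable for a partial model only; assumed here to tolerate lower quality.
        return Action.SEND_LOW_QUALITY
    # Captured by one camera or none: cannot contribute to a three-dimensional model.
    return Action.SKIP
```

Keeping the decision separate from the geometry makes it easy to swap in a different policy, for example one that never skips a region but only lowers its quality.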

<First Embodiment>
First, the first embodiment will be described.
(Image processing system)
FIG. 1 is a diagram illustrating an example of the configuration of an image processing system 100. The present embodiment is described using, as an example, a case where a plurality of cameras and a plurality of microphones are installed in a facility such as a stadium or a concert hall. The image processing system 100 captures images and collects sound using the plurality of cameras and the plurality of microphones. The image processing system 100 includes sensor systems 110a to 110z, an image computing server 200, a controller 300, a switching hub 180, and an end user terminal 190. In the present embodiment, the image processing system 100 is an example of an information processing system.

The controller 300 includes a control station 310 and a virtual camera operation UI 330. The control station 310 manages the operating state of, and sets parameters for, each block constituting the image processing system 100 through networks 310a to 310c, 180a to 180b, and 170a to 170y. Here, the networks may be GbE (Gigabit Ethernet) or 10 GbE, which are Ethernet (registered trademark) conforming to the IEEE standard, or may be configured by combining an interconnect such as InfiniBand, industrial Ethernet, and the like. The networks are not limited to these and may be networks of other types.

First, an example of an operation in which the images and sound obtained by the 26 sets of sensor systems 110a to 110z are transmitted from the sensor system 110z to the image computing server 200 will be described. The present embodiment is described using, as an example, a case where the sensor systems 110a to 110z are connected to one another in a daisy chain.
In the present embodiment, unless otherwise specified, the 26 sets of sensor systems 110a to 110z are not distinguished from one another and are referred to as the sensor system 110. Likewise, unless otherwise specified, the devices and networks in each sensor system 110 are not distinguished and are referred to as the microphone 111, the camera 112, the pan head 113, the external sensor 114, the camera adapter 120, and the network 170.

In the present embodiment the number of sensor systems 110 is 26, but this is only an example, and the number of sensor systems 110 is not limited to 26. Unless otherwise noted, the term "image" in the present embodiment covers both moving images and still images. That is, the image processing system 100 of the present embodiment can process both still images and moving images. The present embodiment is described mainly using an example in which the virtual viewpoint content provided by the image processing system 100 includes a virtual viewpoint image and virtual viewpoint sound. However, the virtual viewpoint content is not limited to this. For example, the virtual viewpoint content need not include sound. As another example, the sound included in the virtual viewpoint content may be sound collected by the microphone located closest to the virtual viewpoint. To simplify the description, detailed description of sound is partly omitted in the present embodiment, but sound is basically processed together with images.

The sensor systems 110a to 110z each include one of the cameras 112a to 112z. That is, the image processing system 100 includes a plurality of cameras for capturing a subject from a plurality of directions. The plurality of sensor systems 110 are connected to one another in a daisy chain. This connection form reduces the number of cables used to connect the sensor systems 110 and saves labor in wiring work. It is particularly effective when the volume of image data grows because captured images are given higher resolution, such as 4K or 8K, or a higher frame rate.

The connection form of the plurality of sensor systems 110 is not limited to a daisy chain. For example, a star network may be used. When a star network is used, all of the sensor systems 110a to 110z are connected to the switching hub 180, and data is transmitted and received among the sensor systems 110a to 110z via the switching hub 180.

FIG. 1 shows a configuration in which all of the sensor systems 110a to 110z are cascade-connected to form a daisy chain. However, not all of the sensor systems 110a to 110z need to be cascade-connected. For example, the plurality of sensor systems 110 may be divided into a plurality of groups, and the sensor systems 110 may be daisy-chained within each group. The camera adapter 120 at the end of each group may then be connected to the switching hub 180, and the switching hub 180 may output the images captured by the cameras 112 to the image computing server 200. Such a configuration is particularly effective when the image processing system 100 is applied to a stadium. For example, a stadium may have a plurality of floors, with sensor systems 110 deployed on each floor. With the configuration described above, the images captured by the cameras 112 can be input to the image computing server 200 for each floor, or for each half of the circumference of the stadium. Therefore, even in a place where it is difficult to wire all the sensor systems 110 into a single daisy chain, installation of the sensor systems 110 can be simplified and the system can be made more flexible.

Control of image processing in the image computing server 200 is switched depending on whether one camera adapter 120, or two or more camera adapters 120, are daisy-chained and output images to the image computing server 200. That is, the control is switched depending on whether the sensor systems 110 are divided into a plurality of groups. When only one camera adapter 120 outputs images to the image computing server 200, images are transmitted between the daisy-chained sensor systems 110, and the camera adapter 120 at the end of the chain transmits the images to the image computing server 200. The image computing server 200 then generates an image of the entire circumference of the stadium. In this case, the timing at which the image data for the entire circumference of the stadium becomes complete in the image computing server 200 is synchronized. That is, if the sensor systems 110 are not divided into groups, the timing at which the necessary image data becomes complete in the image computing server 200 can be synchronized.

On the other hand, when a plurality of camera adapters 120 output images to the image computing server 200 (that is, when the sensor systems 110 are divided into groups), the delay of the image data during transmission may differ from one daisy-chain lane (path) to another. The image computing server 200 therefore needs to wait until the image data for the entire circumference of the stadium is complete, synchronize the image data, and perform the subsequent image processing while checking that the image data has been collected.
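The waiting behavior described above can be sketched as a small buffer that releases a frame only when data from every camera has arrived. This is a minimal sketch under assumed names (`FrameAssembler`, `add`, and the payload format are hypothetical), not the server's actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


class FrameAssembler:
    """Buffers per-camera data until a frame is complete for all expected cameras."""

    def __init__(self, camera_ids: Iterable[str]) -> None:
        self.expected = frozenset(camera_ids)
        # frame_number -> {camera_id: payload}
        self.pending: Dict[int, Dict[str, bytes]] = defaultdict(dict)

    def add(self, frame_number: int, camera_id: str, payload: bytes) -> Optional[Dict[str, bytes]]:
        """Store one camera's data; return the full frame once every camera has reported."""
        if camera_id not in self.expected:
            raise KeyError(f"unknown camera: {camera_id}")
        frame = self.pending[frame_number]
        frame[camera_id] = payload
        if self.expected.issubset(frame):
            # All lanes have delivered this frame: hand it to later processing.
            return self.pending.pop(frame_number)
        return None
```

Because frames are keyed by frame number, a lane with a longer delay simply completes its frames later without blocking data for other frame numbers from being buffered.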

In the present embodiment, the sensor system 110a includes a microphone 111a, a camera 112a, a pan head 113a, an external sensor 114a, and a camera adapter 120a. The configuration of the sensor system 110a is not limited to this. The sensor system 110a only needs to include at least one camera adapter 120a and at least one camera 112a. For example, the sensor system 110a may be configured with one camera adapter 120a and a plurality of cameras 112a, or with one camera 112a and a plurality of camera adapters 120a. That is, the at least one camera 112a and the at least one camera adapter 120a in the image processing system 100a correspond N to M (N and M are both integers of 1 or more). The sensor system 110a may also include devices other than the microphone 111a, the camera 112a, the pan head 113a, and the camera adapter 120a. The camera 112a and the camera adapter 120a may be integrated in the same housing. Furthermore, the front-end server 230 may have at least some of the functions of the camera adapter 120a. In the present embodiment, the sensor systems 110b to 110z have the same configuration as the sensor system 110a, so detailed description of the sensor systems 110b to 110z is omitted. The sensor systems 110b to 110z are not limited to the same configuration as the sensor system 110a, and the individual sensor systems 110 may have different configurations.

An image captured by the camera 112a undergoes image processing, described later, in the camera adapter 120a and is then transmitted to the camera adapter 120b of the sensor system 110b via the network 170a. The sound collected by the microphone 111a is likewise transmitted to the camera adapter 120b of the sensor system 110b via the network 170a. The sensor system 110b transmits the sound collected by the microphone 111b and the image captured by the camera 112b to the sensor system 110c, together with the image and sound acquired from the sensor system 110a.
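The relay just described, in which each sensor system forwards what it received together with its own data, can be sketched as follows. The function names and the string payloads are hypothetical placeholders; the sketch also records how many items each link carries, which is an illustration added here and not a figure from this document.

```python
from typing import Callable, List, Sequence, Tuple

Item = Tuple[str, str]  # (sensor system id, payload); both are placeholders


def relay(upstream: List[Item], own: Item) -> List[Item]:
    """Forward everything received from upstream plus this sensor system's own data."""
    return upstream + [own]


def run_chain(sensor_ids: Sequence[str], capture: Callable[[str], str]) -> Tuple[List[Item], List[int]]:
    """Pass data down the chain; return what the last system outputs and per-link item counts."""
    carried: List[Item] = []
    link_loads: List[int] = []
    for sensor_id in sensor_ids:
        carried = relay(carried, (sensor_id, capture(sensor_id)))
        # Number of items on the link leaving this sensor system.
        link_loads.append(len(carried))
    return carried, link_loads
```

In this simplified model the number of items on a link grows with each hop, which suggests why data that cannot contribute to the three-dimensional model is worth filtering before it is forwarded.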

The sensor systems 110c to 110z continue the operation described above. As a result, the images and sound acquired by the sensor systems 110a to 110z are transmitted from the sensor system 110z to the switching hub 180 via the network 180b, and from the switching hub 180 to the image computing server 200.
The present embodiment shows, as an example, a configuration in which the cameras 112a to 112z are separate from the camera adapters 120a to 120z. However, as described above, they may be integrated in the same housing. In that case, the microphones 111a to 111z may be built into the cameras 112a to 112z integrated with the camera adapters 120a to 120z, or may be connected externally to the cameras 112a to 112z.

Next, an example of the configuration and operation of the image computing server 200 will be described. The image computing server 200 of the present embodiment processes the data acquired from the sensor system 110z. The image computing server 200 includes a front-end server 230, a database 250, a back-end server 270, and a time server 290.

The time server 290 has a function of distributing a time and a synchronization signal. The time server 290 distributes the time and the synchronization signal to the sensor systems 110a to 110z via the switching hub 180. On receiving the time and the synchronization signal, the camera adapters 120a to 120z genlock the image data captured by the cameras 112a to 112z based on the time and the synchronization signal, thereby achieving frame synchronization of the image data. That is, the time server 290 synchronizes the capture timing of the plurality of cameras 112a to 112z. The image processing system 100 can therefore generate a virtual viewpoint image based on a plurality of images captured at the same timing, which suppresses degradation of the virtual viewpoint image caused by differences in capture timing. In the present embodiment, the time server 290 manages time synchronization of the plurality of cameras 112. However, the time server 290 does not necessarily have to do so. For example, each camera 112 or each camera adapter 120 may perform the processing for time synchronization independently.

The front-end server 230 reconstructs the segmented transmission packets of the image data and audio data acquired from the sensor system 110z and converts the data format. The front-end server 230 writes the converted image data and audio data to the database 250 in association with the camera identifier, the data type, the frame number, and the like.
The back-end server 270 accepts designation of a virtual viewpoint from the virtual camera operation UI 330. Based on the viewpoint accepted from the virtual camera operation UI 330, the back-end server 270 reads the corresponding image data and audio data from the database 250. The back-end server 270 performs rendering processing on the image data and audio data read from the database 250 to generate a virtual viewpoint image.
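The front-end server's handling described above (reassembling segmented packets, then registering the data under a camera identifier, data type, and frame number) can be sketched as follows. The packet layout, the function names, and the in-memory dictionary standing in for the database 250 are all assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, int]  # (camera identifier, data type, frame number)


def reassemble(packets: Iterable[Tuple[int, bytes]], expected_count: int) -> bytes:
    """Rebuild one payload from (sequence_number, chunk) pairs that may arrive out of order."""
    by_seq = dict(packets)
    if set(by_seq) != set(range(expected_count)):
        raise ValueError("missing or unexpected packet sequence numbers")
    return b"".join(by_seq[i] for i in range(expected_count))


def register(store: Dict[Key, bytes], camera_id: str, data_type: str,
             frame_number: int, payload: bytes) -> None:
    """Write a payload under the key the text associates it with."""
    store[(camera_id, data_type, frame_number)] = payload


def lookup(store: Dict[Key, bytes], camera_id: str, data_type: str, frame_number: int) -> bytes:
    """Read back a payload, as a reader such as the back-end server might."""
    return store[(camera_id, data_type, frame_number)]
```

Keying every record by the same three fields is what lets a reader request exactly the frames and cameras relevant to a designated viewpoint.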

The configuration of the image computing server 200 is not limited to this. For example, at least two of the front-end server 230, the database 250, and the back-end server 270 may be integrated. The image computing server 200 may include more than one of at least one of the front-end server 230, the database 250, and the back-end server 270. Devices other than those described above may be included at any position in the image computing server 200. Furthermore, the end user terminal 190 or the virtual camera operation UI 330 may have at least some of the functions of the image computing server 200.

The rendered virtual viewpoint image is transmitted from the back-end server 270 to the end user terminal 190. A user operating the end user terminal 190 can view images and listen to sound according to the designated viewpoint. That is, the back-end server 270 generates virtual viewpoint content based on images captured by the plurality of cameras 112 (a plurality of virtual viewpoint images) and on virtual viewpoint information. Specifically, the back-end server 270 generates the virtual viewpoint content based on, for example, image data of predetermined regions extracted by the plurality of camera adapters 120 from the image data captured by the plurality of cameras 112, and on a viewpoint designated by a user operation. The back-end server 270 then provides the generated virtual viewpoint content to the end user terminal 190. Details of the extraction of the predetermined regions by the camera adapters 120 will be described later.

The virtual viewpoint content in the present embodiment is content that includes a virtual viewpoint image, that is, an image that would be obtained if the subject were captured from a virtual viewpoint. In other words, the virtual viewpoint image is an image representing the view from the viewpoint designated through the virtual camera operation UI 330. The viewpoint may be designated by the user or may be designated automatically based on, for example, the result of image analysis. That is, the virtual viewpoint image includes an image corresponding to a viewpoint arbitrarily designated by the user (a free viewpoint image). It also includes an image corresponding to a viewpoint that the user designates from among a plurality of candidates, and an image corresponding to a viewpoint that the apparatus designates automatically. The present embodiment is described mainly using an example in which the virtual viewpoint content includes audio data, but the virtual viewpoint content need not include audio data.

The back-end server 270 may compress and encode the virtual viewpoint image using a standard technique such as H.264 or HEVC and then transmit it to the end user terminal 190 using the MPEG-DASH protocol. Alternatively, the virtual viewpoint image may be transmitted to the end user terminal 190 uncompressed. The former approach, which performs compression encoding, assumes a smartphone or tablet as the end user terminal 190. The latter approach assumes a display capable of showing uncompressed images as the end user terminal 190. That is, the image format can be switched according to the type of the end user terminal 190. The image transmission protocol is not limited to MPEG-DASH; for example, HLS (HTTP Live Streaming) or another transmission method may be used.
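The switching of image format by terminal type can be sketched as a simple lookup. The terminal type strings, the function name, and the returned dictionary are hypothetical; the text does not state which transmission method is used for uncompressed delivery, so that field is left as `None` here.

```python
from typing import Dict, Optional


def choose_delivery(terminal_type: str) -> Dict[str, Optional[str]]:
    """Select compression and protocol for an end user terminal (illustrative mapping)."""
    if terminal_type in {"smartphone", "tablet"}:
        # Compressed delivery; H.264 is one of the codecs the text names (HEVC is another).
        return {"compression": "H.264", "protocol": "MPEG-DASH"}
    if terminal_type == "uncompressed_display":
        # Uncompressed delivery; the transmission method is not specified in the text.
        return {"compression": None, "protocol": None}
    raise ValueError(f"unsupported terminal type: {terminal_type}")
```

A table-driven variant would make it straightforward to add HLS or another transmission method, which the text allows as alternatives.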

As described above, the image processing system 100 has three functional domains: a video collection domain, a data storage domain, and a video generation domain. The video collection domain includes the sensor systems 110a to 110z. The data storage domain includes the database 250, the front-end server 230, and the back-end server 270. The video generation domain includes the virtual camera operation UI 330 and the end user terminal 190. The configuration of the image processing system 100 is not limited to this. For example, the virtual camera operation UI 330 could acquire image data directly from the sensor systems 110a to 110z. In the present embodiment, however, rather than acquiring image data directly from the sensor systems 110a to 110z, the data storage domain is placed between the video collection domain and the video generation domain. Specifically, the front-end server 230 converts the image data and audio data generated by the sensor systems 110a to 110z, together with the meta information of those data, into the common schema and data types of the database 250. As a result, even if the cameras 112a to 112z of the sensor systems 110a to 110z are replaced with cameras of another model, the front-end server 230 absorbs the differences between the cameras, and the image data captured by the other model can be registered in the database 250. This reduces the risk that the virtual camera operation UI 330 will fail to operate properly when a camera 112 is replaced with a camera of another model.
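One way to picture how the front-end server 230 absorbs camera differences is a per-model converter that maps each raw record onto one common record type. The sketch below is an assumption made for illustration: the model names, the raw field layouts, and the `CommonRecord` fields are all hypothetical, since the embodiment does not specify the schema of the database 250.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    camera_id: str
    timecode: str
    data_type: str
    payload: bytes


def from_model_a(raw: dict) -> CommonRecord:
    # Hypothetical flat layout used by one camera model.
    return CommonRecord(raw["cam"], raw["tc"], raw["kind"], raw["img"])


def from_model_b(raw: dict) -> CommonRecord:
    # Hypothetical nested layout used by another camera model.
    return CommonRecord(raw["camera"]["id"], raw["time"], raw["type"], raw["data"])


CONVERTERS = {"model_a": from_model_a, "model_b": from_model_b}


def normalize(model: str, raw: dict) -> CommonRecord:
    """Convert a model-specific record into the common record type."""
    return CONVERTERS[model](raw)
```

With this arrangement, supporting a new camera model means adding one converter; code that reads `CommonRecord` values is unaffected.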

In the present embodiment, the virtual camera operation UI 330 does not access the database 250 directly; it accesses it via the back-end server 270. Common processing related to image generation is performed by the back-end server 270, and the application-specific processing related to the operation UI is performed by the virtual camera operation UI 330. Consequently, development of the virtual camera operation UI 330 can concentrate on the operation device that serves as the UI (user interface) and on the UI functions for manipulating the virtual viewpoint image to be generated. The back-end server 270 can also add or remove common image generation processing in response to requests from the virtual camera operation UI 330. This allows it to respond flexibly to requests from the virtual camera operation UI 330.

As described above, in the image processing system 100, the back-end server 270 generates a virtual viewpoint image based on image data captured by the plurality of cameras 112, which photograph the subject from a plurality of directions. The image processing system 100 of the present embodiment is not limited to the physical configuration described above and may instead be configured logically.

In the present embodiment, for example, the camera adapters 120a to 120z realize an example of the plurality of information processing apparatuses. The virtual camera operation UI 330 sets a viewpoint for the subject. The back-end server 270 uses the images of the subject region contained in the plurality of captured images taken from the plurality of directions to generate an image of the subject as seen from the viewpoint set by the virtual camera operation UI 330.

(Camera adapter)
Next, an example of the functional blocks of the camera adapter 120 in the present embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram illustrating an example of the functional configuration of the camera adapter 120.
The camera adapter 120 includes a network adapter 6110, a transmission unit 6120, an image processing unit 6130, and an external device control unit 6140.
The network adapter 6110 includes a data transmission/reception unit 6111 and a time control unit 6112.

The data transmission/reception unit 6111 performs data communication with other camera adapters 120, the front-end server 230, the time server 290, and the control station 310 via the networks 170, 291, and 310a. For example, the data transmission/reception unit 6111 outputs, to another camera adapter 120, the foreground image and the background image that the foreground/background separation unit 6131 has separated from the image captured by the camera 112. The destination camera adapter 120 is the next camera adapter 120 in an order, among the camera adapters 120 in the image processing system 100, that is predetermined according to the processing of the data routing processing unit 6122. Because each camera adapter 120 outputs a foreground image and a background image, a virtual viewpoint image is generated based on foreground and background images captured from a plurality of viewpoints. There may also be camera adapters 120 that output the foreground image separated from the captured image but do not output the background image.

The time control unit 6112 conforms to, for example, the Ordinary Clock of the IEEE 1588 standard, and has a function of storing the time stamps of data exchanged with the time server 290 and a function of synchronizing time with the time server 290. The time synchronization protocol is not limited to IEEE 1588; for example, another standard such as EtherAVB or a proprietary protocol may be used. In the present embodiment, a NIC (Network Interface Card) is used as the network adapter 6110, but the network adapter 6110 is not limited to a NIC and another interface may be used. IEEE 1588 has been revised as a standard, as IEEE 1588-2002 and IEEE 1588-2008. The latter is also called PTPv2 (Precision Time Protocol Version 2).
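The embodiment does not describe how the stored time stamps are used. As general background, the IEEE 1588 delay request-response mechanism derives the clock offset and the path delay from four time stamps, assuming the path delay is the same in both directions. A minimal sketch follows; the function name and the tuple return value are illustrative choices.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and one-way delay from four PTP time stamps.

    t1: master sends Sync (master clock)
    t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)
    t4: master receives Delay_Req (master clock)
    Assumes the path delay is symmetric.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2   # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way path delay
    return offset, delay
```

For example, with a slave clock running 5 units ahead over a path with a delay of 2 units, the time stamps 100, 107, 110, and 107 yield an offset of 5 and a delay of 2.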

The transmission unit 6120 has a function of controlling the transmission of data to the switching hub 180 and other destinations via the network adapter 6110, and includes the following functional units.
The data compression/decompression unit 6121 has a function of compressing data transmitted and received via the data transmission/reception unit 6111 by applying a predetermined compression method, compression rate, and frame rate, and a function of decompressing compressed data.

The data routing processing unit 6122 has a function of determining the routing destination of data received by the data transmission/reception unit 6111 and of data processed by the image processing unit 6130, using the data held by the data routing information holding unit 6125 described later. It also has a function of transmitting the data to the determined routing destination. The data routing processing unit 6122 preferably selects, as the routing destination, a camera adapter 120 whose camera 112 is focused on the same gaze point as the camera 112 of the camera adapter 120 to which the unit itself belongs. Images captured by such cameras 112 have high frame correlation, so determining the routing destination in this way is well suited to image processing. The routing destinations determined by the data routing processing units 6122 of the individual camera adapters 120 fix the order in which the camera adapters 120 output foreground and background images in relay fashion.
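The preference for a destination that shares the same gaze point could be sketched as a lookup over a configured adapter order. Everything here is a hypothetical illustration: the `Adapter` fields, the addresses, and the rule of skipping adapters with a different gaze point are assumptions, not details given in the embodiment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Adapter:
    name: str
    gaze_point: str   # identifier of the gaze point its camera is aimed at
    address: str      # address information, as held for routing


def next_hop(chain: List[Adapter], current: str) -> Optional[Adapter]:
    """Return the next adapter downstream that shares the current gaze point."""
    names = [a.name for a in chain]
    idx = names.index(current)          # raises ValueError if not in the chain
    me = chain[idx]
    for candidate in chain[idx + 1:]:
        if candidate.gaze_point == me.gaze_point:
            return candidate
    return None  # no downstream adapter with the same gaze point
```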

The time synchronization control unit 6123 conforms to PTP (Precision Time Protocol) of the IEEE 1588 standard and has a function of performing processing related to time synchronization with the time server 290. The protocol used for time synchronization is not limited to PTP; another similar protocol may be used.

The image/audio transmission processing unit 6124 has a function of creating messages for transferring image data and audio data to another camera adapter 120 or to the front-end server 230 via the data transmission/reception unit 6111. A message contains the image data and audio data together with the meta information of each. In the present embodiment, the meta information includes the time code or sequence number at the time the image was captured or the audio was sampled, the data type, and identifiers indicating the individual camera 112 and microphone 111. The image data and audio data transmitted by the image/audio transmission processing unit 6124 may have been compressed by the data compression/decompression unit 6121.

The image/audio transmission processing unit 6124 also receives messages from other camera adapters 120 via the data transmission/reception unit 6111. According to the data type contained in a received message, it restores image data or audio data from data that has been fragmented to the packet size specified by the transmission protocol. If the restored data is compressed, the data compression/decompression unit 6121 decompresses it.
The data routing information holding unit 6125 has a function of holding address information for determining the destination of data transmitted and received by the data transmission/reception unit 6111.
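The message handling of the image/audio transmission processing unit 6124 (meta information attached to the data, fragmentation to a packet size, and restoration on the receiving side) might look like the following sketch. The `Fragment` layout, including the `index` and `total` fields, is an assumption introduced for illustration; the embodiment does not define the wire format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    camera_id: str   # identifier of the individual camera
    timecode: str    # time code at capture
    data_type: str   # e.g. "foreground", "background", "audio"
    index: int       # position of this chunk (hypothetical field)
    total: int       # number of chunks in the message (hypothetical field)
    chunk: bytes


def fragment(payload: bytes, camera_id: str, timecode: str,
             data_type: str, max_size: int) -> List[Fragment]:
    """Split a payload into chunks of at most max_size bytes."""
    chunks = [payload[i:i + max_size]
              for i in range(0, len(payload), max_size)] or [b""]
    return [Fragment(camera_id, timecode, data_type, i, len(chunks), c)
            for i, c in enumerate(chunks)]


def reassemble(fragments: List[Fragment]) -> bytes:
    """Restore the payload, tolerating out-of-order arrival."""
    ordered = sorted(fragments, key=lambda f: f.index)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("incomplete message")
    return b"".join(f.chunk for f in ordered)
```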

The image processing unit 6130 has a function of processing image data captured by the camera 112 under the control of the camera control unit 6141 and image data received from other camera adapters 120, and includes the following functional units.
The foreground/background separation unit 6131 has a function of separating image data captured by the camera 112 into a foreground image and a background image. That is, it extracts a predetermined region from the image data captured by the camera 112 corresponding to the camera adapter 120 to which it belongs. The predetermined region is, for example, the region of a foreground image obtained as the result of object detection on the captured image. Based on the result of extracting this region, the foreground/background separation unit 6131 separates the captured image into a foreground image and a background image. An object is, for example, a subject such as a person. It may, however, be a specific person (a player, a manager, and/or a referee, for example) or an object with a predetermined image pattern, such as a ball or a goal. A moving body may also be detected as an object. Processing the foreground image, which contains important objects such as people, separately from the background region, which contains no such objects, improves the image quality of the portions of the virtual viewpoint image generated by the image processing system 100 that correspond to objects. In addition, because each of the camera adapters 120 separates the foreground region from the background region, the load on the image processing system 100, which has a plurality of cameras 112, is distributed. The predetermined region is not limited to the foreground image and may be, for example, the background image.

The three-dimensional information processing unit 6132 has a function of performing three-dimensional image information processing using the foreground image separated by the foreground/background separation unit 6131, foreground images received from other camera adapters 120, and the shooting space information held by the control station 310. Stereo matching, for example, is used for the three-dimensional image information processing.
The image quality adjustment unit 6133 has a function of adjusting the image quality of at least one of the foreground image and the background image separated by the foreground/background separation unit 6131. Image quality here means, for example, at least one of resolution, color gradation, contrast, saturation, luminance, and brightness.

The calibration control unit 6134 has a function of acquiring the image data needed for calibration from the camera 112 via the camera control unit 6141 and transmitting it to the front-end server 230, which performs the arithmetic processing for calibration. The present embodiment is described taking as an example the case where the front-end server 230 performs this arithmetic processing. However, the node that performs it is not limited to the front-end server 230. For example, it may be performed by another node such as the control station 310 or a camera adapter 120 (including another camera adapter 120). The calibration control unit 6134 also has a function of performing calibration during shooting (dynamic calibration) on the image data acquired from the camera 112 via the camera control unit 6141, according to preset parameters.

The external device control unit 6140 has a function of controlling the devices connected to the camera adapter 120, and includes the following functional units.
The camera control unit 6141 is connected to the camera 112 and has functions such as controlling the camera 112, acquiring the image data captured by the camera 112, providing a synchronization signal, and setting the time.

Control of the camera 112 includes, for example, setting and reading shooting parameters (such as the number of pixels, color depth, frame rate, and white balance). It also includes acquiring the state of the camera 112 (shooting, stopped, synchronizing, error, and so on), starting and stopping shooting, and adjusting focus. If a detachable lens is mounted on the camera 112, the camera adapter 120 may connect to the lens and adjust it directly. The camera adapter 120 may also perform lens adjustments such as zoom via the camera 112.

The synchronization signal is provided by supplying the camera 112 with the shooting timing (control clock), using the time synchronized with the time server 290 that is obtained by the time synchronization control unit 6123.
The time setting is performed by providing the time synchronized with the time server 290, obtained by the time synchronization control unit 6123, as a time code conforming to, for example, the SMPTE 12M format. The time code provided by the camera control unit 6141 is thereby attached to the image data received from the camera 112. The time code format is not limited to SMPTE 12M and may be another format. Alternatively, the camera control unit 6141 may attach the time code itself to the image data received from the camera 112, without providing a time code to the camera 112.
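As an illustration of turning a synchronized time into a time code, the sketch below formats a time of day as HH:MM:SS:FF, the field layout used by SMPTE 12M time codes. It is a simplification under stated assumptions: an integer frame rate, non-drop-frame counting, and a time supplied as nanoseconds since midnight. The function name is hypothetical.

```python
def to_timecode(ns_since_midnight: int, fps: int = 30) -> str:
    """Format a time of day as a non-drop-frame HH:MM:SS:FF time code."""
    # Integer arithmetic avoids floating-point rounding at frame boundaries.
    total_frames = ns_since_midnight * fps // 1_000_000_000
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours = (total_seconds // 3600) % 24
    minutes = (total_seconds // 60) % 60
    seconds = total_seconds % 60
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"
```

For instance, 1 hour, 1 minute, and 1.5 seconds after midnight at 30 frames per second corresponds to frame 15 of that second.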

The microphone control unit 6142 is connected to the microphone 111 and has functions such as controlling the microphone 111, starting and stopping sound collection by the microphone 111, and acquiring the audio data collected by the microphone 111.
Control of the microphone 111 includes, for example, gain adjustment and acquisition of the state of the microphone 111. The microphone control unit 6142 also provides the microphone 111 with the audio sampling timing and a time code. For example, the time information from the time server 290 is converted into a word clock of, for example, 48 kHz and supplied to the microphone 111, which gives the microphone 111 the clock information that determines the audio sampling timing.

The pan head control unit 6143 is connected to the pan head 113 and has a function of controlling it. Control of the pan head 113 includes, for example, pan/tilt control and acquisition of the state of the pan head 113.
The sensor control unit 6144 is connected to the external sensor 114 and has a function of acquiring the sensor information sensed by the external sensor 114. For example, when a gyro sensor is used as the external sensor 114, the sensor control unit 6144 acquires information representing vibration. Before the processing by the foreground/background separation unit 6131, the image processing unit 6130 uses the vibration information acquired by the sensor control unit 6144 to generate image data in which vibration is suppressed. Suppose, for example, that the camera 112 is an 8K camera. In this case, the image processing unit 6130 takes the vibration information into account, cuts out from the image data captured by the camera 112 a region smaller than the original 8K size, and aligns it with the image of the camera 112 installed at the adjacent position. Thus, even if vibration of the building structure propagates to the individual cameras 112 at different frequencies, this function provided in the camera adapter 120 allows the image data captured by each camera 112 to be aligned. As a result, the image processing unit 6130 can generate electronically stabilized image data, which relieves the image computing server 200 of the alignment processing load for the number of cameras 112. The sensor of the sensor system 110 is not limited to the external sensor 114; the same effect is obtained with a sensor built into the camera adapter 120.
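The cut-out smaller than the full frame can be pictured as a crop window that is shifted to follow the scene and clamped to the margin left around it. In the sketch below, the conversion from gyro readings to a pixel shift is assumed to have been done elsewhere, and the function name and crop size are hypothetical.

```python
def stabilized_crop(frame_w: int, frame_h: int, crop_w: int, crop_h: int,
                    shift_x: int, shift_y: int) -> tuple:
    """Return (left, top) of a crop window that follows a scene shift in pixels.

    The window starts centered and moves by the measured shift, clamped so
    that it never leaves the captured frame.
    """
    left = (frame_w - crop_w) // 2 + shift_x
    top = (frame_h - crop_h) // 2 + shift_y
    left = max(0, min(left, frame_w - crop_w))
    top = max(0, min(top, frame_h - crop_h))
    return left, top
```

With a 7680 x 4320 frame and a 7424 x 4176 crop, the window can absorb shifts of up to 128 pixels horizontally and 72 pixels vertically in either direction.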

(Image processing unit 6130)
FIG. 3 is a diagram illustrating an example of the functional configuration of the image processing unit 6130 inside the camera adapter 120.
The calibration control unit 6134 performs color correction processing, blur correction processing (electronic image stabilization), and the like on the image data input from the camera control unit 6141. Color correction suppresses color variation between the cameras 112. Blur correction stabilizes the position of the image against blur caused by vibration of the camera 112.

An example of the functional configuration of the foreground/background separation unit 6131 will now be described.
The foreground separation unit 5001 separates the foreground image from the aligned image data captured by the camera 112, based on the result of comparing that image data with the background image 5002. The background update unit 5003 generates a new background image using the background image 5002 and the aligned image data captured by the camera 112, and updates the background image 5002 to the new background image. The background cutout unit 5004 performs control to cut out part of the background image 5002. If the cut-out background image 5002 requires image quality adjustment, the background cutout unit 5004 outputs it to the image quality adjustment unit 6133; otherwise it outputs it to the transmission unit 6120.
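A common way to realize a comparison against a background image is per-pixel differencing with a threshold, combined with a slow blend to refresh the background. The sketch below uses that approach on grayscale images stored as nested lists. The threshold, the blend factor `alpha`, and the rule of freezing the background under foreground pixels are assumptions; the embodiment does not state how the comparison or the update is carried out.

```python
def separate_foreground(frame, background, threshold=30):
    """Mark pixels whose difference from the background exceeds the threshold."""
    return [[abs(p - b) > threshold for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


def update_background(frame, background, mask, alpha=0.05):
    """Blend non-foreground pixels into the background.

    Pixels marked as foreground keep their previous background value.
    """
    return [[b if m else (1 - alpha) * b + alpha * p
             for p, b, m in zip(f_row, b_row, m_row)]
            for f_row, b_row, m_row in zip(frame, background, mask)]
```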

Next, an example of the functional configuration of the three-dimensional information processing unit 6132 will be described.
The three-dimensional model information reception unit 5005 receives, from the control station 310, information on the three-dimensional model that is generated in the front-end server 230 in order to obtain a virtual viewpoint image. One item of the received information is three-dimensional data representing the total common coverage area of all the cameras 112 in the image processing system 100. That is, this three-dimensional data represents the space obtained by taking the intersection (logical AND) of the subject spaces (fields) that the individual cameras 112 can capture. The total common coverage area depends on the internal parameters specific to each camera 112 (focal length, image center, lens distortion parameters, and so on) and on the external parameters representing the position and orientation of each camera 112 (rotation matrix, position vector, and so on). The total common coverage area is calculated by the control station 310. If the internal or external parameters of a camera 112 change during shooting, the control station 310 recalculates the total common coverage area as needed.
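The logical AND of the individual camera coverages can be illustrated in a simplified two-dimensional plan view, where each camera sees a wedge defined by its position, optical axis direction, and horizontal angle of view. This sketch ignores lens distortion, vertical angle of view, and range limits, and the `Camera2D` type is hypothetical; the actual area is a three-dimensional region computed from the full internal and external parameters.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Camera2D:
    x: float
    y: float
    axis_deg: float   # direction of the optical axis, in degrees
    fov_deg: float    # horizontal angle of view, in degrees


def sees(cam: Camera2D, px: float, py: float) -> bool:
    """True if the point lies inside the camera's horizontal view wedge."""
    bearing = math.degrees(math.atan2(py - cam.y, px - cam.x))
    diff = (bearing - cam.axis_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return abs(diff) <= cam.fov_deg / 2


def in_common_coverage(cams: List[Camera2D], px: float, py: float) -> bool:
    """Logical AND over all cameras: the point must be visible to every one."""
    return all(sees(c, px, py) for c in cams)
```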

The other-camera foreground reception unit 5006 receives foreground images that other camera adapters 120 have separated from their image data.
The foreground position determination unit 5007 uses the foreground image separated by the foreground separation unit 5001 and the foreground images separated from image data by other camera adapters 120 to determine, by the principle of stereo matching or the like, in which area of the subject space each foreground image lies. Based on the result of this determination, the foreground position determination unit 5007 performs one of the following: transmitting the foreground image as is to the transmission unit 6120, transmitting it to the image quality adjustment unit 6133, or transmitting it to neither (that is, deleting the foreground image data). An example of the processing performed by the foreground position determination unit 5007 is described in detail later.

In the present embodiment, for example, the position in real space of a subject is identified using the images of the subject contained in the plurality of captured images, by determining, with the principle of stereo matching or the like, in which area of the subject space the foreground image lies. Likewise, the three-dimensional coordinates of the image of the subject region are derived using the images of the subject contained in the plurality of captured images, by calculating the three-dimensional coordinates of the range over which the foreground image obtained by the upper camera matches the foreground image obtained by the own camera.

The image quality adjustment unit 6133 receives the foreground images and background images for which the foreground position determination unit 5007 and the background cutout unit 5004, respectively, have determined that image quality adjustment is to be performed before transmission. It adjusts image quality attributes such as resolution, color gradation, contrast, saturation, luminance, and brightness, and sends the adjusted image data to the transmission unit 6120.

(Transmission adjustment processing)
An example of the transmission adjustment processing of the present embodiment is described below using a specific example.
FIG. 4 is a plan view showing an example of the total common coverage area. FIG. 4 schematically shows a field 400 and eight cameras 112a to 112h installed around it at a certain height, as seen from above. The optical axes of the cameras 112a to 112h all point at a common gaze point 6301. The cameras 112a to 112h are set to focal lengths that give horizontal angles of view θa to θh, respectively. Objects P to U are present on the field 400. To simplify the explanation, the objects P to U are assumed here to be cubes. In FIG. 4, the total common coverage area of all the cameras 112a to 112h is the total common coverage area A. FIG. 5 is a perspective view showing an example of the total common coverage area. FIG. 5 schematically shows the field 400 of FIG. 4 in a bird's-eye view. As FIG. 5 shows, the total common coverage area A is in fact a three-dimensional polyhedral region.

Every face of the objects R and S, which lie inside the total common coverage area A, is captured by two or more of the cameras 112a to 112h. The foreground images corresponding to the objects R and S obtained from the cameras 112a to 112h are therefore gathered at the front-end server 230. Using the principle of stereo matching, the front-end server 230 can thus generate three-dimensional models of the objects R and S that have no missing parts from any direction over 360°. In contrast, the objects P, Q, T, and U, which lie outside the total common coverage area A, have faces that are captured by only one camera. In other words, each of the objects P, Q, T, and U lies outside the capturable range of at least one of the cameras 112a to 112h. For the objects P, Q, T, and U, therefore, only three-dimensional models that are incomplete when seen from some viewpoint can be generated. In the present embodiment, each camera adapter 120 determines, as needed, which foreground images correspond to objects in the range where the three-dimensional model is certain to be incomplete, that is, objects outside the total common coverage area A, and does not transmit them to the other camera adapters 120. This reduces the data transmission load when a daisy chain connection is used.
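The decision not to forward foreground images of objects outside the total common coverage area A might be sketched as a point-in-polyhedron test. The representation below is an assumption chosen for brevity: the area is treated as a convex polyhedron given by half-spaces (a normal vector n and a bound d, with n·p ≤ d inside), and each object is reduced to a single representative three-dimensional position. The function names are hypothetical.

```python
def inside(region, p):
    """True if point p satisfies n.p <= d for every half-space (n, d)."""
    return all(nx * p[0] + ny * p[1] + nz * p[2] <= d
               for (nx, ny, nz), d in region)


def select_for_transmission(foregrounds, region):
    """Keep only the foregrounds whose position lies inside the region.

    foregrounds: list of (label, (x, y, z)) pairs.
    """
    return [label for label, pos in foregrounds if inside(region, pos)]
```

In this sketch, a foreground whose position falls outside the region is simply dropped from the list handed to the next camera adapter.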

FIG. 6 shows only the three cameras 112a to 112c on the upstream side of the daisy chain, out of the eight cameras 112a to 112h shown in FIG. 4. FIGS. 7A to 7C show examples of the foreground images, after separation of the background image, obtained from the image data captured by the cameras 112a, 112b, and 112c, respectively. In FIGS. 7A to 7C, a circle is drawn on the face of each of the objects P to U that is on the left side in FIG. 6, so that the common faces of the objects can be identified across the images. FIG. 8 is a flowchart illustrating an example of the transmission adjustment processing executed by the three-dimensional information processing unit 6132. In the following, three consecutive cameras in the daisy chain are referred to, from the upstream side, as the upper camera, the own camera, and the lower camera.

First, the most upstream camera adapter 120a in the daisy chain does not execute the processing of the flowchart in FIG. 8; its data transmitting/receiving unit 6111 transmits all foreground images to the camera 112c, which is the lower camera (see FIG. 7A). The foreground position determination unit 5007 for the second camera 112b compares the foreground image obtained from the image data captured by the camera 112a (the upper camera) with the foreground image obtained from the image data captured by the camera 112b (the own camera). Based on the result of this comparison, the foreground position determination unit 5007 matches corresponding points between these foreground images. The foreground image obtained from the image data captured by the camera 112a, the upper camera, is received by the other-camera foreground receiving unit 5006. In the following description, the foreground image obtained from the image data captured by the upper camera is referred to, as necessary, as the foreground image obtained by the upper camera, and the foreground image obtained from the image data captured by the own camera is referred to, as necessary, as the foreground image obtained by the own camera.

In the present embodiment, the foreground position determination unit 5007 acquires the images of the subject regions included in the captured images by acquiring the foreground image obtained by the upper camera and the foreground image obtained from the image data captured by the own camera. For example, a foreground image is an example of an image of a subject region. Also, for example, the foreground image obtained by the upper camera and the foreground image obtained by the own camera are examples of images of subject regions included in a plurality of captured images captured from a plurality of directions. Also, for example, the camera adapter 120a corresponding to the camera 112a, the upper camera, is an example of the information processing apparatus that immediately precedes the information processing apparatus in question in a predetermined order. The camera adapter 120c corresponding to the camera 112c, the lower camera, is an example of the information processing apparatus that immediately follows the information processing apparatus in question in the predetermined order.

The foreground position determination unit 5007 determines the range in which the foreground image obtained by the camera 112a (the upper camera) and the foreground image obtained by the camera 112b (the own camera) match (S1001). Within these foreground images, the processing of S1002 is executed for the matching range, and the processing of S1006 is executed for the non-matching range. That is, when the foreground image obtained by the own camera and the foreground image obtained by the upper camera contain both a matching range and a non-matching range, both the processing from S1002 onward and the processing from S1006 onward are performed on these foreground images.

In S1002, the foreground position determination unit 5007 uses the principle of stereo matching to calculate the three-dimensional coordinates of the range in which the foreground image obtained by the upper camera and the foreground image obtained by the own camera match.
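As an illustration of how three-dimensional coordinates can be recovered from a pair of matched points, the following is a minimal sketch of midpoint triangulation between two viewing rays. This is one common technique and is an assumption made for illustration; the specification says only that the principle of stereo matching is used. The function name and the representation of each ray as an origin plus a direction are hypothetical.

```python
def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between rays o1 + t1*d1 and o2 + t2*d2.

    o1, o2: camera centres (3-element sequences).
    d1, d2: ray directions through the matched image points.
    Returns None when the rays are (nearly) parallel.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None
    t1 = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t2 = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [o + t1 * k for o, k in zip(o1, d1)]
    p2 = [o + t2 * k for o, k in zip(o2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]
```

With noisy matches the two rays do not intersect exactly, which is why the midpoint of the shortest connecting segment is taken as the estimate.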

Next, based on the three-dimensional coordinates obtained in S1002, the foreground position determination unit 5007 determines whether the range in which the foreground image obtained by the upper camera and the foreground image obtained by the own camera match is inside the all-common coverage range A (S1003). The all-common coverage range A is received from the control station 310.
If this determination finds that the matching range is inside the all-common coverage range A (Yes in S1003), the foreground position determination unit 5007 decides to transmit the matching range of the foreground image obtained by the upper camera and the foreground image obtained by the own camera to the camera adapter 120c corresponding to the camera 112c, the lower camera (S1004).

In the present embodiment, in S1004, when the subject region is in the region common to the coverage ranges of at least two image capturing apparatuses, the foreground position determination unit 5007 transmits the image of that subject region included in the captured image.

On the other hand, if the range in which the foreground image obtained by the upper camera and the foreground image obtained by the own camera match is not inside the all-common coverage range A (No in S1003), the foreground position determination unit 5007 performs the following processing. For the foreground image obtained by the upper camera, it decides not to transmit the image of the range that matches the foreground image obtained by the own camera but is not inside the all-common coverage range A (S1005). For the foreground image obtained by the own camera, it decides to transmit the image of the range that matches the foreground image obtained by the upper camera but is not inside the all-common coverage range A (S1005). However, the foreground position determination unit 5007 limits transmission of that image to no farther than the camera adapter 120c corresponding to the camera 112c, the lower camera (S1005).

At this point, the image of the range of the own camera's foreground image that matches the upper camera's foreground image and is not inside the all-common coverage range A could also be left untransmitted. In the present embodiment, however, that image is transmitted as far as the camera adapter 120c corresponding to the camera 112c, the lower camera. This is because doing so maintains the accuracy of the search for corresponding points between foreground images at the lower camera. The image is not, however, transmitted to the camera adapters 120d to 120h corresponding to the cameras 112d to 112h, which are downstream of the camera 112c. Specifically, the foreground position determination unit 5007 of the camera adapter 120c decides not to transmit the image to the camera adapter 120d. In S1005, the camera adapter 120 may instead compress or resize foreground images that are not inside the all-common coverage range A and then transmit them. Limiting transmission in this way, by reducing the data amount of foreground images that are not inside the all-common coverage range A before transmitting them, makes it possible to generate virtual viewpoint images in which subjects are present over a wider range while keeping the transmission load of the system down.

In the present embodiment, in S1003 to S1005, the foreground position determination unit 5007 controls transmission of data including the image of the subject region included in the captured image, based on the position of the subject in real space. In S1005, the foreground position determination unit 5007 selects, according to the image capturing apparatus that captured the image, either not transmitting the image of the subject region included in the captured image or limiting the transmission destination of that image.

Meanwhile, the processing of S1006 is performed for the range in which the foreground image obtained by the camera 112a (the upper camera) and the foreground image obtained by the camera 112b (the own camera) do not match. That is, the foreground position determination unit 5007 adds 1 to the non-matching count n for these foreground images (S1006). The initial value of the non-matching count n is 0.

In the present embodiment, for example, the non-matching count is an example of the count value. In S1006, when the image of a subject region included in a captured image does not match the image of a subject region included in a different captured image, the foreground position determination unit 5007 updates the count value for the non-matching image.

Next, the foreground position determination unit 5007 checks the non-matching count n for the range in which the foreground image obtained by the camera 112a (the upper camera) and the foreground image obtained by the camera 112b (the own camera) do not match (S1007). Within that non-matching range, the processing of S1008 is performed for any range whose non-matching count n is 2. The processing of S1004 is performed for any range whose non-matching count n is not 2 (that is, is 1).

In S1008, the foreground position determination unit 5007 decides not to transmit, out of the range in which the foreground image obtained by the camera 112a (the upper camera) and the foreground image obtained by the camera 112b (the own camera) do not match, the range whose non-matching count n is 2.
In the present embodiment, for example, S1008 is an example of deciding not to transmit an image when the count value for the non-matching image is larger than a predetermined value. The camera adapter 120 may also be configured to skip the processing of S1006 and S1007 and to limit transmission of all foreground images that do not match between the foreground image obtained by the upper camera and the foreground image obtained by the own camera.

When the processing proceeds to S1004, on the other hand, the foreground position determination unit 5007 decides to transmit, out of the range in which the foreground image obtained by the camera 112a (the upper camera) and the foreground image obtained by the camera 112b (the own camera) do not match, the range whose non-matching count n is not 2 (S1004). Here, processing has been performed only for the two cameras 112a and 112b, so the non-matching count n never reaches 2. The entire non-matching range between the foreground image obtained by the camera 112a and the foreground image obtained by the camera 112b is therefore transmitted.

In the present embodiment, for example, S1004 is an example of deciding to transmit an image when the count value for the non-matching image is smaller than a predetermined value. Also, for example, S1004, S1007, and S1008 realize an example of controlling transmission of data including an image based on the count value for the non-matching image. Also, for example, S1001 to S1008 realize an example of controlling transmission of data including the image of a subject region included in a captured image, based on whether the images of the subject regions included in the plurality of captured images match.
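The branching of S1001 to S1008 for a single range of a foreground image can be summarized in the following sketch. It is an illustrative reading of the flowchart description, not the actual implementation: the `Decision` enum, the `decide` function, and its boolean inputs are hypothetical names, and the threshold is written as `n >= 2` on the assumption that a count of 2 is the point at which a range is dropped.

```python
from enum import Enum

class Decision(Enum):
    TRANSMIT = "transmit"                    # S1004
    TRANSMIT_TO_LOWER_ONLY = "lower_only"    # S1005, own-camera image
    DROP = "drop"                            # S1005 (upper-camera image) or S1008

def decide(matched, inside_a, from_own_camera, non_match_count):
    """Return (decision, updated non-matching count n) for one foreground range.

    matched:         result of S1001 for this range
    inside_a:        result of S1003 (only meaningful when matched)
    from_own_camera: True for the own camera's image, False for the upper camera's
    """
    if matched:
        if inside_a:                                  # Yes in S1003
            return Decision.TRANSMIT, non_match_count
        if from_own_camera:                           # No in S1003, own camera
            return Decision.TRANSMIT_TO_LOWER_ONLY, non_match_count
        return Decision.DROP, non_match_count         # No in S1003, upper camera
    n = non_match_count + 1                           # S1006
    if n >= 2:                                        # S1007 (assumed threshold)
        return Decision.DROP, n                       # S1008
    return Decision.TRANSMIT, n                       # S1004
```

A range that fails to match at two consecutive adapters thus stops propagating, while a range that fails only once is still passed on so that the next adapter can attempt a match.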

FIG. 9 illustrates an example of the objects transmitted to the lower camera.
The upper part of FIG. 9 shows the result of executing the processing of S1001 to S1007 on the foreground images shown in FIGS. 7A and 7B. The upper left of FIG. 9 corresponds to FIG. 7A, and the upper right of FIG. 9 corresponds to FIG. 7B. In the upper part of FIG. 9, the range of the foreground image that is determined to be inside the all-common coverage range A and is to be transmitted to the camera 112c, the lower camera, is filled in black. The range of the foreground image that is determined to be outside the all-common coverage range A and is transmitted only as far as the camera 112c, the lower camera, is marked with vertical stripes. Foreground images that are not transmitted are enclosed by broken lines. The range of the foreground image that is transmitted because the non-matching count n is 1 is marked with diagonal hatching.

The lower part of FIG. 9 shows the result of executing the processing of the flowchart in FIG. 8 at the third camera 112c, with the camera 112b as the upper camera and the camera 112c as the own camera, in the same way as when the camera 112b was the own camera. The lower left of FIG. 9 corresponds to FIG. 7B, and the lower right of FIG. 9 corresponds to FIG. 7C. The shading of the faces in the lower part of FIG. 9 is the same as in the upper part of FIG. 9. In the lower left of FIG. 9, the face of the object Q marked with broken diagonal hatching appears only in the camera 112b, so its non-matching count n becomes 2 and it is determined not to be transmitted.

When the second camera 112b is the own camera, the object U is outside the all-common coverage range A but is an object to be transmitted only as far as the camera 112c, the lower camera. When the third camera 112c is the own camera, part of the foreground image of this object U (the vertically striped range in the upper right of FIG. 9) does not match the foreground image obtained by the camera 112c, the own camera, and is determined not to be transmitted. That part of the foreground image of the object U is therefore not transmitted to the fourth and subsequent cameras. Suppose also that the object U does not appear in the foreground image obtained by the camera 112c, the own camera, for example because another object or an obstacle is in front of it, so that corresponding points cannot be matched between that foreground image and the foreground image obtained by the upper camera. In that case, too, the foreground image of the object U is not transmitted to the subsequent cameras 112d to 112h.

Now look at the lower left of FIG. 9. Only the foreground images of the objects R and S, which lie inside the all-common coverage range A, remain as transmission targets. The foreground images of the objects Q, T, and U, which lie outside the all-common coverage range A, are not transmitted. In this way, as foreground images are transmitted from upstream to downstream among three consecutive cameras 112 in the daisy chain, the foreground images obtained by adjacent cameras are compared, whether each range is within the all-common coverage range A is determined, and the non-matching count n is counted. This makes it possible to select and transmit only the foreground images of objects that lie inside the all-common coverage range A. For the foreground image obtained by the first camera 112a in the daisy chain, the position can be determined at the front-end server 230 by treating the camera 112a as the own camera and the camera 112h, located at the end of the daisy chain, as the upper camera.

As described above, in the present embodiment, of the ranges of the foreground images obtained by two consecutive cameras in the daisy-chain connection (the own camera and the upper camera), the ranges that match each other and are inside the all-common coverage range A are transmitted to the downstream camera (the lower camera). For the mutually matching ranges that are outside the all-common coverage range A, whether to transmit is decided as follows: the range of the foreground image obtained by the own camera is transmitted, and the range of the foreground image obtained by the upper camera is not. A range of the foreground image whose non-matching count n becomes 1 is transmitted to the downstream camera (the lower camera), whereas a range whose non-matching count n becomes 2 is not. Consequently, when an image (three-dimensional model) of a subject is generated using images captured from a plurality of viewpoints, the foreground images can be extracted accurately and the transmission load can be reduced while the image quality is optimized according to the position of the foreground image.

<Second Embodiment>
Next, a second embodiment will be described. In the first embodiment, of the mutually matching ranges of the foreground images obtained by two consecutive cameras (the own camera and the upper camera), the range that is outside the all-common coverage range A and belongs to the foreground image obtained by the upper camera is not transmitted. In the present embodiment, by contrast, such a range of the foreground image is transmitted after its data amount is reduced. The present embodiment thus differs from the first embodiment mainly in the processing of the mutually matching ranges that are outside the all-common coverage range A. In the description of the present embodiment, parts identical to those of the first embodiment are given the same reference numerals as in FIGS. 1 to 9, and their detailed description is omitted.

FIG. 10 is a flowchart illustrating an example of the transmission adjustment processing executed by the three-dimensional information processing unit 6132. FIG. 11 illustrates an example of the objects transmitted to the lower camera. In FIG. 10, steps that perform the same processing as in the flowchart of FIG. 8 are given the same reference numerals as in FIG. 8. The shading of the faces in FIG. 11 is the same as in FIG. 9. Detailed description of these is therefore omitted.

In the present embodiment, the foreground position determination unit 5007 performs the following processing. Within the range of the own camera's foreground image that matches the upper camera's foreground image, it decides to transmit the image of the range outside the all-common coverage range A after lowering its image quality (S2005). The image quality is lowered by the image quality adjustment unit 6133. Lowering the image quality is achieved by, for example, lowering the resolution, reducing the number of color gradations, lowering the contrast, or converting a color image to monochrome. In S2005, as in S1005 of FIG. 8, the foreground position determination unit 5007 decides to transmit the image of the range of the own camera's foreground image that matches the upper camera's foreground image when that range is not inside the all-common coverage range A. This is to maintain the accuracy of the search for corresponding points between foreground images at the lower camera. However, the image of that range is not transmitted to the camera adapter 120 corresponding to the camera 112d, which is downstream of the lower camera.
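Three of the quality-lowering operations mentioned above (lower resolution, fewer color gradations, monochrome conversion) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the processing of the image quality adjustment unit 6133: the image is assumed to be a nested list of 8-bit `(r, g, b)` tuples, the function names are hypothetical, and the monochrome conversion uses the widely used Rec. 601 luma weights as an assumed choice.

```python
def downsample(img, factor=2):
    """Lower the resolution by keeping every `factor`-th pixel in each direction."""
    return [row[::factor] for row in img[::factor]]

def quantize(img, bits=4):
    """Reduce the number of gradations per 8-bit channel to 2**bits levels."""
    shift = 8 - bits
    return [[tuple((c >> shift) << shift for c in px) for px in row] for row in img]

def to_monochrome(img):
    """Convert RGB to one luma value per pixel (integer Rec. 601 weights)."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row] for row in img]
```

Downsampling by a factor of 2 cuts the pixel count to a quarter, and monochrome conversion cuts the channel count to a third, so the operations can be combined when a larger reduction in data amount is needed.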

In the present embodiment, in S2005, when the subject region is not in the region common to the coverage ranges of at least two image capturing apparatuses, the foreground position determination unit 5007 decides to transmit the image of that subject region included in the captured image with its data amount reduced. Also in S2005, when the subject region is not in the region common to the coverage ranges of at least two image capturing apparatuses, it is decided to limit the transmission destination of the image of that subject region included in the captured image.

FIG. 11 shows the foreground images obtained by transmitting the foreground images shown in FIG. 7 as far as the front-end server 230 while applying the processing of the flowchart in FIG. 10. Here, FIG. 11 shows the result of the front-end server 230 applying the processing of the flowchart in FIG. 10 to the image obtained by the first camera 112a, using the foreground image obtained by the camera 112h located at the end of the chain (see the upper left of FIG. 11). In FIG. 11, the range marked with a dot pattern is the range transmitted with lowered image quality. As shown in FIG. 11, the foreground images of the objects T and U, which are outside the all-common coverage range A but for which corresponding points can be matched and three-dimensional coordinates can be calculated, are transmitted with lowered image quality.

On the other hand, the faces of the object T that are on the upper and right sides in FIG. 4 are captured only by the camera 112g. The face of the object U that is on the right side in FIG. 4 is captured only by the camera 112f. The front-end server 230 therefore cannot generate three-dimensional models of these faces. That is, the front-end server 230 holds low-quality models of only about half of each of the objects T and U. Such image data cannot be called complete virtual viewpoint data supporting free camera work over 360°. In generating virtual viewpoint content, however, it can be used to render, for example, foreground blur and background blur (heavily blurred imagery in front of or behind the main object that is the focus of the content in the image). As long as the image quality of the main object is maintained, blurred images of the surrounding objects are unlikely to strike viewers as unnatural or as degraded in quality, even if part of the model is missing.

Meanwhile, in the upper part of FIG. 11, the faces of the objects P and Q shown in FIG. 11 are not transmitted. Each of these faces appears in only one camera, the camera 112a or the camera 112b respectively, so the non-matching count n becomes 2 and the face is determined not to be transmitted. Even if these faces were transmitted to the front-end server 230, they could not be turned into three-dimensional models by stereo matching. To improve the reduction in transmission load, it is therefore preferable not to transmit them, as in the first embodiment. As a result, the objects R, S, T, and U are transmitted to the front-end server 230 as shown in the lower part of FIG. 11. As noted above, however, the objects T and U are transmitted with lowered image quality.

以上のように本実施形態では、全共通被写範囲Ａの外側にあっても、マッチングがとれ、且つ、三次元座標の算出が可能なオブジェクトＴ、Ｕの前景画像については画質を下げて伝送する。従って、第１の実施形態で説明した効果に加え、伝送負荷を低減しながらも仮想視点コンテンツの表現の幅を広げることが可能になるという効果が得られる。 As described above, in the present embodiment, the foreground images of the objects T and U, which lie outside the all common subject range A but can still be matched and have their three-dimensional coordinates calculated, are transmitted with reduced image quality. Therefore, in addition to the effects described in the first embodiment, the range of expression of the virtual viewpoint content can be widened while the transmission load is reduced.
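
The three outcomes described for this embodiment (full quality, reduced quality, non-transmission) can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the processing actually performed in the camera adapters 120: the names `Decision`, `decide_transmission`, and `max_non_match` are hypothetical, and the default threshold of 1 is only an assumption chosen so that a non-matching count of n = 2 leads to non-transmission, as in the example of FIG. 11.

```python
from enum import Enum


class Decision(Enum):
    SEND_FULL = "send at full quality"
    SEND_REDUCED = "send with reduced data amount"
    DROP = "do not transmit"


def decide_transmission(inside_common_range: bool,
                        non_match_count: int,
                        max_non_match: int = 1) -> Decision:
    """Pick a transmission mode for one foreground image.

    inside_common_range: True if the image lies inside the all common
        subject range A.
    non_match_count: the non-matching count n for the image.
    max_non_match: hypothetical threshold (assumption); a count above it
        is treated as "no other camera's image matched".
    """
    if non_match_count > max_non_match:
        # Unmatched images cannot be modeled by stereo matching.
        return Decision.DROP
    if inside_common_range:
        return Decision.SEND_FULL
    # Matched, but outside range A: keep the image with less data.
    return Decision.SEND_REDUCED
```

With these assumptions, a surface with n = 2 is dropped, while a matched foreground image outside range A is kept in reduced form.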

本実施形態では、画質を調整することにより伝送負荷を下げる場合を例に挙げて説明した（Ｓ２００５を参照）。しかしながら、伝送する画像のデータ量を低減することが出来れば、必ずしも画質を下げる必要はない。例えば、該当する前景画像の範囲のデータ圧縮・伸張部６１２１における圧縮率を上げることにより、伝送負荷を下げてもよい。また、該当する前景画像の範囲のみの伝送フレームレートを下げてもよい。また、これらの制御の少なくとも２つを組み合わせて伝送負荷を下げてもよい。 In the present embodiment, the case where the transmission load is lowered by adjusting the image quality has been described as an example (see S2005). However, as long as the data amount of the transmitted image can be reduced, the image quality does not necessarily have to be lowered. For example, the transmission load may be lowered by raising the compression rate that the data compression/decompression unit 6121 applies to the relevant foreground image range. Alternatively, the transmission frame rate may be lowered for the relevant foreground image range only. At least two of these controls may also be combined to lower the transmission load.
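
The way these controls can be applied singly or in combination can be sketched as follows. This is only an illustrative sketch: `RegionSettings`, `reduce_load`, and the scaling `factor` are hypothetical names and values, and the numeric scales for quality and compression are assumptions rather than parameters of the data compression/decompression unit 6121.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegionSettings:
    quality: float      # 1.0 = original image quality (assumed scale)
    compression: float  # higher = stronger compression (assumed scale)
    frame_rate: float   # frames per second sent for this region


def reduce_load(s: RegionSettings, *, lower_quality: bool = False,
                raise_compression: bool = False,
                lower_frame_rate: bool = False,
                factor: float = 2.0) -> RegionSettings:
    """Apply any combination of the three load-reduction controls."""
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    if lower_quality:
        s = replace(s, quality=s.quality / factor)
    if raise_compression:
        s = replace(s, compression=s.compression * factor)
    if lower_frame_rate:
        s = replace(s, frame_rate=s.frame_rate / factor)
    return s
```

For example, `reduce_load(settings, raise_compression=True, lower_frame_rate=True)` leaves the image quality untouched and lowers the data amount through the other two controls.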

また、第１、第２の実施形態では、全共通被写範囲Ａの内側にある前景画像の範囲を判定する場合を例に挙げて説明した。しかしながら、必ずしも、前景画像の位置の判定をこのようにして行う必要はない。例えば、三次元座標の算出の際の誤差の吸収のため、全共通被写範囲Ａを任意の割合で拡大した三次元空間の内側にある前景画像の範囲を判定しても良い。逆に、より限られた範囲の高画質の保持のため、或いは伝送負荷の削減効果を高めるため、全共通被写範囲Ａの一部の三次元空間の内側にある前景画像を判定しても良い。また、オブジェクトの三次元モデルに対して要求される完成率、即ち、オブジェクトの表面の何割の三次元モデルを生成するか、の設定に応じて、全数ではない一定数以上のカメラ１１２の共通被写範囲を算出しても良い。この場合、当該一定数以上のカメラ１１２の共通被写範囲の内側にある前景画像の範囲を判定することになる。更に、共通被写度（何台のカメラの共通被写範囲か）や、最も注目したい範囲からの距離に応じて、段階的に前景画像の画質調整や圧縮率を調整しても良い。 In the first and second embodiments, the case of determining the range of the foreground image located inside the all common subject range A has been described as an example. However, the position of the foreground image does not necessarily have to be determined in this way. For example, to absorb errors in calculating the three-dimensional coordinates, the range of the foreground image located inside a three-dimensional space obtained by enlarging the all common subject range A at an arbitrary ratio may be determined. Conversely, to maintain high image quality in a more limited range, or to increase the reduction of the transmission load, the foreground image located inside a three-dimensional space that is only a part of the all common subject range A may be determined. In addition, the common subject range of a certain number or more of the cameras 112, rather than all of them, may be calculated according to the completion rate required of the three-dimensional model of an object, that is, the setting of what fraction of the object's surface is to be modeled in three dimensions. In this case, the range of the foreground image located inside the common subject range of that certain number or more of the cameras 112 is determined. Furthermore, the image quality and the compression rate of the foreground image may be adjusted in stages according to the degree of common coverage (the number of cameras whose subject ranges overlap) and the distance from the range of greatest interest.
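
A staged adjustment of this kind could look like the following sketch. Every name and number in it is an assumption made for illustration: the function `quality_step`, the distance thresholds `near` and `far`, and the quality levels 1.0, 0.5, 0.25, and 0.0 do not appear in the embodiments. The only point taken from the description is that the level depends on how many cameras share the range and on the distance from the range of greatest interest.

```python
def quality_step(coverage_degree: int, distance: float,
                 total_cameras: int, near: float = 5.0,
                 far: float = 20.0) -> float:
    """Return a staged quality factor for one foreground image.

    coverage_degree: number of cameras whose subject ranges contain
        the image position.
    distance: distance from the range of greatest interest
        (unit is arbitrary here).
    near, far: hypothetical distance thresholds (assumptions).
    """
    if not 0 <= coverage_degree <= total_cameras:
        raise ValueError("coverage_degree out of range")
    if coverage_degree < 2:
        return 0.0   # seen by at most one camera: treated as not sent
    if coverage_degree == total_cameras and distance <= near:
        return 1.0   # inside the range shared by all cameras, close by
    if distance <= far:
        return 0.5   # shared by some cameras, moderately close
    return 0.25      # shared by some cameras, far away
```

A finer or coarser set of stages can be obtained by adding or removing thresholds; the sketch only shows the shape of such a mapping.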

尚、前述した実施形態は、何れも本発明を実施するにあたっての具体化の例を示したものに過ぎず、これらによって本発明の技術的範囲が限定的に解釈されてはならないものである。すなわち、本発明はその技術思想、又はその主要な特徴から逸脱することなく、様々な形で実施することが出来る。 The above-described embodiments are merely examples of implementation in carrying out the present invention, and the technical scope of the present invention should not be construed in a limited manner. That is, the present invention can be implemented in various forms without departing from the technical idea or the main features thereof.

（その他の実施例） (Other examples)
本発明は、前述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

１１０ａ〜１１０ｚ：センサシステム、１１２ａ〜１１２ｚ：カメラ、１２０ａ〜１２０ｚ：カメラアダプタ、１７０ａ〜１７０ｙ：ネットワーク 110a to 110z: sensor system, 112a to 112z: camera, 120a to 120z: camera adapter, 170a to 170y: network

Claims

An information processing system comprising: acquisition means for acquiring an image of a subject area included in a captured image; and
control means for controlling, based on the position of the subject, transmission of data including the image of the subject area included in the captured image.

The information processing system according to claim 1, further comprising specifying means for specifying the position of the subject in real space,
wherein the acquisition means acquires images of the subject area included in a plurality of captured images captured from a plurality of directions, and
the specifying means specifies the position of the subject in real space using the images of the subject included in the plurality of captured images.

The information processing system according to claim 2, wherein the specifying means derives three-dimensional coordinates of the image of the subject area using the images of the subject included in the plurality of captured images, and specifies the position of the subject in real space based on the derived three-dimensional coordinates.

The information processing system according to claim 1, wherein the control means controls transmission of data including the image of the subject area included in the captured image based on whether or not the subject is located in a common area of the subject ranges of at least two photographing means among a plurality of photographing means that photograph a scene from a plurality of directions.

The control means is included in the photographed image when there is a subject area in the common area of the subject range of at least two photographing means among the plurality of photographing means for photographing the object scene from a plurality of directions. 5. The information processing system according to claim 4, wherein it is determined to transmit an image of the subject area.

The control means is included in the photographed image when there is no subject area in the common area of the subject area of at least two photographing means among the plurality of photographing means for photographing the object scene from a plurality of directions. 6. The information processing system according to claim 4, wherein it is determined not to transmit an image of the subject area.

The control means is included in the photographed image when there is no subject area in the common area of the subject area of at least two photographing means among the plurality of photographing means for photographing the object scene from a plurality of directions. 6. The information processing system according to claim 4, wherein the image of the subject area is determined to be transmitted with a reduced data amount of the image.

The information processing system according to claim 7, wherein reducing the data amount of the image is performed by changing at least one of an image quality, a frame rate, and a compression rate of the image.

The control means is included in the photographed image when there is no subject area in the common area of the subject area of at least two photographing means among the plurality of photographing means for photographing the object scene from a plurality of directions. 6. The information processing system according to claim 4, wherein it is determined to limit a transmission destination of an image of the subject area.

The information processing system according to claim 4 or 5, wherein, when the subject area is not in the common area of the subject ranges of at least two photographing means among the plurality of photographing means that photograph the scene from a plurality of directions, the control means selects, according to the photographing means that captured the captured image, either not transmitting the image of the subject area included in the captured image or restricting the transmission destination of that image.

The information processing system according to any one of claims 1 to 10, wherein the acquisition means acquires images of the subject area included in a plurality of captured images captured from a plurality of directions, and
the control means controls transmission of data including the image of the subject area included in the captured image based on whether or not the images of the subject area included in the plurality of captured images match one another.

The information processing system according to claim 11, further comprising update means for updating a count value for an unmatched image when the image of the subject area included in the captured image does not match the image of the subject area included in a captured image different from that captured image,
wherein the control means controls transmission of data including the image based on the count value for the unmatched image.

The information processing system according to claim 12, wherein the control unit determines that the image is not transmitted when the count value for the unmatched image is larger than a predetermined value.

The information processing system according to claim 12 or 13, wherein the control unit determines to transmit the image when the count value for the unmatched image is smaller than a predetermined value.

The information processing system according to claim 12, further comprising a plurality of information processing apparatuses each having the acquisition unit, the control unit, and the update unit.

The information processing system according to claim 15, wherein the control means of each information processing apparatus controls transmission of data including the image of the subject area included in the captured image to the information processing apparatus that is next after that information processing apparatus in a predetermined order,
the acquisition means of the information processing apparatus acquires the image of the subject area included in a captured image captured by photographing means connected to that information processing apparatus, and the image of the subject area transmitted under the control of the control means of the information processing apparatus that is immediately before that information processing apparatus in the predetermined order, and
the update means of the information processing apparatus updates the count value for the image when the image of the subject area included in the captured image captured by the photographing means connected to that information processing apparatus does not match the image of the subject area transmitted under the control of the control means of the information processing apparatus that is immediately before that information processing apparatus in the predetermined order.

The information processing system according to claim 1, further comprising a plurality of information processing apparatuses each having the acquisition unit and the control unit.

The information processing system according to claim 17, wherein the control means of each information processing apparatus controls transmission of data including the image of the subject area included in the captured image to the information processing apparatus that is next after that information processing apparatus in a predetermined order, and
the acquisition means of the information processing apparatus acquires the image of the subject area included in a captured image captured by photographing means connected to that information processing apparatus, and the image of the subject area transmitted under the control of the control means of the information processing apparatus that is immediately before that information processing apparatus in the predetermined order.

The control means included in the information processing apparatus has a subject area in a common area of a subject range in at least two photographing means among a plurality of photographing means for photographing a scene from a plurality of directions. The image of the subject area included in the photographed image photographed by the photographing means connected to the information processing apparatus is transferred to the information processing apparatus that is in the order after the information processing apparatus in a predetermined order. 19. The information processing system according to claim 16, wherein the information processing system is determined to be transmitted.

The information processing system according to claim 15, wherein the plurality of information processing apparatuses are daisy chain connected.

The information processing system according to any one of claims 1 to 20, further comprising: setting means for setting a viewpoint for the subject; and
generation means for generating an image of the subject as viewed from the viewpoint set by the setting means, using the images of the subject area included in the plurality of captured images captured from a plurality of directions.

An information processing method comprising: an acquisition step of acquiring an image of a subject area included in a captured image; and
a control step of controlling, based on the position of the subject, transmission of data including the image of the subject area included in the captured image.

A program for causing a computer to function as each means of the information processing system according to any one of claims 1 to 21.

An apparatus that, in a system for generating a virtual viewpoint image using captured images captured by a plurality of photographing devices, transmits a captured image captured by a first photographing device among the plurality of photographing devices, the apparatus comprising:
acquisition means for acquiring the captured image captured by the first photographing device; and
control means for restricting transmission of an image including a predetermined subject when the predetermined subject included in the captured image acquired by the acquisition means is not within the imaging range of a second photographing device among the plurality of photographing devices.

The apparatus according to claim 24, wherein the control unit does not transmit an image including the predetermined subject.

The apparatus according to claim 24, wherein the control means transmits the image including the predetermined subject to another apparatus that transmits a captured image captured by a third photographing device among the plurality of photographing devices, and does not transmit it to an apparatus that generates the virtual viewpoint image.

25. The apparatus according to claim 24, wherein the control means transmits the image including the predetermined subject with a reduced image quality.