JP2019140483A

JP2019140483A - Image processing system, image processing system control method, transmission device, transmission method, and program

Info

Publication number: JP2019140483A
Application number: JP2018020968A
Authority: JP
Inventors: 洋大藤; Hiroshi Ofuji
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-02-08
Filing date: 2018-02-08
Publication date: 2019-08-22

Abstract

To enhance foreground reproducibility in a virtual viewpoint image even when the bandwidth of the transmission path is limited. SOLUTION: An image processing system that transmits images captured by a plurality of cameras via a predetermined transmission path and generates a virtual viewpoint image comprises: means for deriving the number of cameras capturing a predetermined object; and means for setting, according to the number of cameras capturing the predetermined object, the image quality of a foreground image including the predetermined object. SELECTED DRAWING: Figure 12

Description

The present invention relates to an image processing system including a plurality of cameras for photographing a subject from a plurality of directions.

In recent years, attention has been paid to a technique in which a plurality of cameras are installed at different positions to perform synchronized shooting from multiple viewpoints, and virtual viewpoint content is generated from the resulting multi-viewpoint images. This technique makes it possible, for example, to view a highlight scene of a soccer or basketball game from various angles, and can therefore give the user a strong sense of presence.

On the other hand, when an image (video) acquired by a camera is transmitted over a network, the code amount of the video must be controlled so that it does not exceed the network bandwidth. Patent Document 1 discloses a method in which, when a plurality of surveillance cameras are connected to the same network and transmit video, the frame rate is controlled so that the total code amount of the surveillance camera videos does not exceed a predetermined bandwidth (capacity). Patent Document 2 discloses a method in which, when a plurality of surveillance cameras are connected to the same network and transmit video, the frame rate is controlled by assigning a higher priority to a video signal that includes an important area such as a target subject (that is, to a camera that captures the important area).

Patent Document 1: JP 2004-32680 A
Patent Document 2: JP 2007-221367 A

Meanwhile, one of the important factors in the image quality of virtual viewpoint content is the reproducibility of the foreground (for example, a person). For foreground reproducibility, what matters is how many cameras at different positions capture the subject. This is because virtual viewpoint content is generated mainly from the video of cameras located near the virtual viewpoint, and the closer a camera is to the virtual viewpoint, the more faithfully the foreground can be reproduced.

From the standpoint of foreground reproducibility, the method described in Patent Document 1 controls the frame rate of all cameras on the same network uniformly, so frames that are determined not to be transmitted cannot be used to generate virtual viewpoint content. The method described in Patent Document 2, as noted above, assigns a higher priority to a camera that captures an important area; however, when the cameras are arranged around a single gazing point, many cameras capture the important area, which makes the priority difficult to set. Consequently, neither the method of Patent Document 1 nor the method of Patent Document 2 may be able to obtain a foreground with high reproducibility.

The present invention has been made in view of the above conventional problems, and its object is to improve foreground reproducibility in a virtual viewpoint image even when the bandwidth of the transmission path is limited.

To achieve the above object, the present invention provides an image processing system that transmits images captured by a plurality of cameras via a predetermined transmission path and generates a virtual viewpoint image, the system comprising: camera number deriving means for deriving the number of cameras capturing a predetermined object; and image quality setting means for setting, according to the number of cameras capturing the predetermined object, the image quality of a foreground image including the predetermined object.
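As a rough illustration of the two means named above, the following Python sketch counts how many cameras capture each object and maps that count to a quantization parameter. The visibility table, the function names, and the numeric policy are all assumptions made for illustration. In particular, this passage does not say whether a larger camera count should lead to finer or coarser quantization, so the policy is passed in as a parameter and the default shown is only one possibility.

```python
from typing import Callable, Dict, Set


def count_cameras_per_object(visibility: Dict[str, Set[str]]) -> Dict[str, int]:
    """Return, for each object ID, the number of cameras whose view contains it.

    `visibility` maps a camera ID to the set of object IDs it captures
    (a hypothetical input format, not taken from the source).
    """
    counts: Dict[str, int] = {}
    for objects in visibility.values():
        for obj in objects:
            counts[obj] = counts.get(obj, 0) + 1
    return counts


def example_policy(camera_count: int) -> int:
    """Illustrative policy only: coarser quantization as more cameras see the object."""
    if camera_count < 1:
        raise ValueError("camera_count must be at least 1")
    return min(40, 20 + 2 * (camera_count - 1))


def set_foreground_quality(
    visibility: Dict[str, Set[str]],
    policy: Callable[[int], int] = example_policy,
) -> Dict[str, int]:
    """Assign a quantization parameter to each object's foreground image."""
    counts = count_cameras_per_object(visibility)
    return {obj: policy(n) for obj, n in counts.items()}
```

Swapping in a different `policy` (for example, one that lowers the quantization parameter as the count grows) changes the direction of the mapping without touching the counting step.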

According to the present invention, foreground reproducibility in a virtual viewpoint image can be improved even when the bandwidth of the transmission path is limited.

A block diagram showing the configuration of the image processing system.
A functional block diagram of the camera adapter.
A functional block diagram of the image processing unit.
A functional block diagram of the front-end server.
A functional block diagram of the data input control unit.
A functional block diagram of the database.
A functional block diagram of the back-end server.
A conceptual diagram for explaining a foreground image and a background image.
A flowchart showing the procedure of code amount control processing.
A diagram for explaining a field-of-view overlap map.
A diagram for explaining the relationship among the quantization parameter, the compression rate, and the degree of field-of-view overlap.
A flowchart showing the procedure for determining the quantization parameter of the initial frame.
A flowchart showing the procedure of code amount control processing.
A diagram for explaining the relationship between the quantization parameter and the degree of field-of-view overlap.

(First embodiment)
FIG. 1 is a block diagram showing the configuration of an image processing system according to the present embodiment. The image processing system 100 performs shooting and sound collection using a plurality of cameras and microphones installed in a facility such as a stadium or a concert hall. The image processing system 100 includes sensor systems 110a to 110z, an image computing server 200, a controller 300, a switching hub 180, and an end user terminal 190.

The controller 300 includes a control station 310 and a virtual camera operation UI (User Interface) 330. The control station 310 manages operating states and controls parameter settings for each block of the image processing system 100 through the networks 310a to 310c, 180a, and 180b and the daisy chains 170a to 170y. These networks may be GbE (Gigabit Ethernet) or 10GbE conforming to the IEEE standard for Ethernet (registered trademark), or may be configured by combining an InfiniBand interconnect, industrial Ethernet, and the like. The networks are not limited to these and may be of other types.

First, the process of transmitting the 26 sets of images and audio of the sensor systems 110a to 110z from the sensor system 110z to the image computing server 200 is described. In this embodiment, unless otherwise noted, the 26 sets of systems from the sensor system 110a to the sensor system 110z are not distinguished and are referred to as the sensor system 110. Similarly, the devices in each sensor system 110 are not distinguished unless otherwise noted and are referred to as the microphone 111, the camera 112, the pan head 113, the external sensor 114, and the camera adapter 120. Although this embodiment describes 26 sets of sensor systems, this is only an example, and the number of sensor systems is not limited to it.

Furthermore, in this embodiment, unless otherwise noted, the term "image" covers both still images and moving images. That is, the image processing system 100 of this embodiment can process both still images and moving images. This embodiment mainly describes an example in which the virtual viewpoint content provided by the image processing system 100 includes a virtual viewpoint image and virtual viewpoint audio, but the virtual viewpoint content is not limited to this. For example, the virtual viewpoint content need not include audio. As another example, the audio included in the virtual viewpoint content may be the audio collected by the microphone closest to the virtual viewpoint. In addition, although the description of audio is partly omitted in this embodiment for simplicity, images and audio are basically processed together.

As shown in FIG. 1, in the image processing system 100 of this embodiment, each sensor system 110 has one camera 112. That is, the image processing system 100 has a plurality of cameras for photographing a subject from a plurality of directions.

The plurality of sensor systems 110 (that is, the sensor systems 110a to 110z) are connected in a daisy chain. As captured images move to higher resolutions such as 4K and 8K and to higher frame rates, the volume of image data grows; connecting the sensor systems in a daisy chain reduces the number of connection cables and saves wiring work under these conditions.

The way the sensor systems 110 are connected is not limited to this. For example, a star network configuration may be used in which each of the sensor systems 110a to 110z is connected to the switching hub 180 and data communication between the sensor systems 110 passes through the switching hub 180.

Although FIG. 1 shows a configuration in which all of the sensor systems 110a to 110z are cascaded into a single daisy chain, the configuration is not limited to this. For example, the sensor systems 110 may be divided into several groups, with the sensor systems in each group connected in a daisy chain. In that case, the camera adapter 120 at the end of each group may be connected to the switching hub 180 and input images to the image computing server 200.

Such a configuration is particularly effective in a stadium. For example, suppose the stadium has multiple floors and sensor systems 110 are deployed on each floor. With the grouped configuration described above, input to the image computing server 200 can be performed for each floor or for each half of the stadium's circumference. Thus, even in a location where wiring all of the sensor systems 110 into a single daisy chain is difficult, installation is simplified and the system becomes more flexible.
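The grouped daisy-chain arrangement can be sketched as follows. This is a minimal illustration that assumes the sensor systems are split into equally sized consecutive groups; the function names and the grouping rule are hypothetical and are not specified in the source.

```python
from typing import List


def split_into_chains(sensor_ids: List[str], group_size: int) -> List[List[str]]:
    """Split an ordered list of sensor systems into consecutive daisy-chain groups."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [sensor_ids[i:i + group_size] for i in range(0, len(sensor_ids), group_size)]


def chain_terminals(chains: List[List[str]]) -> List[str]:
    """Return the last sensor system of each group, i.e. the one wired to the hub."""
    return [chain[-1] for chain in chains]


sensors = [f"110{c}" for c in "abcdefghijklmnopqrstuvwxyz"]
half_circumference_chains = split_into_chains(sensors, 13)
```

With two groups of 13, `chain_terminals(half_circumference_chains)` names the two camera adapters that would connect to the switching hub, one per half of the circumference.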

In a daisy-chained configuration, control of image processing in the image computing server 200 is switched depending on whether one camera adapter 120 or two or more camera adapters 120 output images to the image computing server 200. That is, the image processing system 100 switches control of image processing in the image computing server 200 depending on whether the sensor systems 110 are divided into a plurality of groups.

When only one camera adapter 120 outputs images, the all-around image of the venue is built up while images are relayed through the daisy chain, so the image data for the entire circumference arrives at the image computing server 200 in a synchronized manner. That is, synchronization is maintained as long as the sensor systems 110 are not divided into groups. However, when a plurality of camera adapters 120 output images (that is, when the sensor systems 110 are divided into a plurality of groups), the delay is expected to differ for each daisy-chain lane (path). Therefore, when the sensor systems 110 are divided into a plurality of groups, the image computing server 200 performs synchronization control in which it waits until the image data for the entire circumference is available, and it executes the subsequent image processing while confirming that the image data has been collected.
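The wait-until-complete behaviour described above can be sketched as a small buffer keyed by frame number. This is an illustrative assumption about how such synchronization control might be structured; the class name, method names, and data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class FrameCollector:
    """Buffer per-camera data until every expected camera has delivered a frame."""

    def __init__(self, expected_cameras: Set[str]) -> None:
        self.expected = set(expected_cameras)
        # frame number -> {camera ID -> image data}
        self.pending: Dict[int, Dict[str, bytes]] = defaultdict(dict)

    def add(self, frame_number: int, camera_id: str, data: bytes) -> Optional[Dict[str, bytes]]:
        """Store one camera's data; return the full frame set once it is complete."""
        if camera_id not in self.expected:
            raise KeyError(f"unexpected camera: {camera_id}")
        frame = self.pending[frame_number]
        frame[camera_id] = data
        if set(frame) == self.expected:
            # All lanes have delivered: release the frame to later processing.
            return self.pending.pop(frame_number)
        return None
```

A caller would pass each arriving image to `add` and start the subsequent image processing only when a non-`None` result is returned, which tolerates different delays on different daisy-chain lanes.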

The sensor system 110a includes a microphone 111a, a camera 112a, a pan head 113a, an external sensor 114a, and a camera adapter 120a. The configuration of the sensor system 110a is not limited to this; it is sufficient for the sensor system 110a to have at least one camera adapter 120a and either one camera 112a or one microphone 111a. For example, the sensor system 110a may have one camera adapter 120a and a plurality of cameras 112a, or a plurality of camera adapters 120a and one camera 112a. That is, the cameras 112 and the camera adapters 120 in the image processing system 100 are arranged in an N-to-M relationship (N and M are both integers of 1 or more).

Furthermore, the sensor system 110a may include devices other than the microphone 111a, the camera 112a, the pan head 113a, and the camera adapter 120a. The camera 112a and the camera adapter 120a may be integrated into a single unit. In addition, the front-end server 230 may have at least part of the functions of the camera adapter 120.

The sensor system 110a has been described above. In this embodiment, the sensor systems 110b to 110z have the same configuration as the sensor system 110a, so their description is omitted. The sensor systems 110b to 110z are not limited to the same configuration as the sensor system 110a, and each sensor system 110 may have a configuration different from that of the sensor system 110a.

In the above configuration, the sensor system 110a applies the image processing described later, in the camera adapter 120a, to the audio collected by the microphone 111a and the image captured by the camera 112a. After this processing, the sensor system 110a transmits the result to the camera adapter 120b of the sensor system 110b via the daisy chain 170a. Similarly, the sensor system 110b applies image processing to its collected audio and captured image and transmits them, together with the image and audio acquired from the sensor system 110a, to the sensor system 110c.

By repeating these operations in sequence, the images and audio acquired by the sensor systems 110a to 110z are transmitted from the sensor system 110z to the switching hub 180 via the network 180b, and then on to the image computing server 200.
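The relay behaviour along the daisy chain can be sketched as follows: each sensor system appends its own processed data to whatever it received from upstream and forwards the combined list. The payload representation and function names are assumptions made for illustration.

```python
from typing import List, Tuple

# (sensor system ID, processed image/audio data); a hypothetical payload layout
Payload = Tuple[str, bytes]


def relay(upstream: List[Payload], own_id: str, own_data: bytes) -> List[Payload]:
    """Return the data to forward downstream: upstream payloads plus this node's own."""
    return upstream + [(own_id, own_data)]


def run_chain(captures: List[Payload]) -> List[Payload]:
    """Pass data through each sensor system in chain order and return what the last one sends."""
    forwarded: List[Payload] = []
    for sensor_id, data in captures:
        forwarded = relay(forwarded, sensor_id, data)
    return forwarded
```

In this model, the list returned by `run_chain` corresponds to what the final sensor system in the chain hands to the switching hub.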

In this embodiment, the cameras 112a to 112z and the camera adapters 120a to 120z are separate units, but they may be integrated in the same housing. In that case, the microphones 111a to 111z may be built into the integrated camera 112 or connected externally to the camera 112.

Next, the configuration and operation of the image computing server 200 are described. The image computing server 200 of this embodiment processes data acquired from the sensor system 110z. The image computing server 200 includes a front-end server 230, a database 250 (hereinafter also referred to as the DB), a back-end server 270, and a time server 290.

The time server 290 has a function of distributing the time and a synchronization signal, and distributes them to the sensor systems 110a to 110z via the switching hub 180. On receiving the time and the synchronization signal, the camera adapters 120a to 120z genlock the cameras 112a to 112z based on them to synchronize the images (frames). That is, the time server 290 synchronizes the shooting timing of the plurality of cameras 112. As a result, the image processing system 100 can generate a virtual viewpoint image based on a plurality of images captured at the same timing (hereinafter also referred to as captured images), which suppresses degradation of virtual viewpoint image quality caused by shifts in shooting timing. In this embodiment, the time server 290 manages time synchronization of the plurality of cameras 112, but this is not a limitation; each camera 112 or each camera adapter 120 may perform the time synchronization processing independently.

The front-end server 230 reassembles the segmented transmission packets of the images and audio acquired from the sensor system 110z and converts the data format. It then writes the data to the database 250 according to the camera identifier, the data type, and the frame number.
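A minimal sketch of the reassembly step, assuming each transmission packet carries a sequence number and a segment count in addition to the camera identifier, data type, and frame number mentioned above. The packet fields `seq` and `total` are assumptions; the source does not describe the packet layout.

```python
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    camera_id: str
    data_type: str      # e.g. "foreground", "background", "audio" (illustrative values)
    frame_number: int
    seq: int            # position of this segment (assumed field)
    total: int          # number of segments in the unit (assumed field)
    payload: bytes


Key = Tuple[str, str, int]  # (camera ID, data type, frame number)


def reassemble(packets: List[Packet]) -> Dict[Key, bytes]:
    """Join segments into complete units, keyed the way the database write is keyed."""
    groups: Dict[Key, Dict[int, Packet]] = {}
    for p in packets:
        key = (p.camera_id, p.data_type, p.frame_number)
        groups.setdefault(key, {})[p.seq] = p
    complete: Dict[Key, bytes] = {}
    for key, segments in groups.items():
        total = next(iter(segments.values())).total
        # Only emit units for which every segment 0..total-1 has arrived.
        if all(i in segments for i in range(total)):
            complete[key] = b"".join(segments[i].payload for i in range(total))
    return complete
```

Units with missing segments are simply left out of the result, so a caller could retry once more packets arrive.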

The database 250 creates database tables so that the data received from the front-end server 230 can be accessed. It also determines which of the cache, the primary storage, and the secondary storage holds the data requested by the back-end server 270, reads the data from that location, and transmits it to the back-end server 270.
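The lookup across cache, primary storage, and secondary storage might be sketched as below. The in-memory dictionaries stand in for real storage tiers, and the class and method names are hypothetical; the search order (cache first) is also an assumption.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (camera ID, data type, frame number)


class TieredStore:
    """Toy model of a database that keeps data in one of three storage tiers."""

    def __init__(self) -> None:
        # Insertion order defines the search order: cache, then primary, then secondary.
        self.tiers: Dict[str, Dict[Key, bytes]] = {
            "cache": {},
            "primary": {},
            "secondary": {},
        }

    def locate(self, key: Key) -> Optional[str]:
        """Return the name of the tier holding `key`, or None if it is not stored."""
        for name, tier in self.tiers.items():
            if key in tier:
                return name
        return None

    def read(self, key: Key) -> bytes:
        """Read the requested data from whichever tier holds it."""
        name = self.locate(key)
        if name is None:
            raise KeyError(key)
        return self.tiers[name][key]
```

A back-end request would call `read` with the key of the wanted data and receive the bytes regardless of which tier currently holds them.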

The back-end server 270 accepts a viewpoint designation from the virtual camera operation UI 330, reads the corresponding image and audio data from the database 250 based on the accepted viewpoint, and performs rendering to generate a virtual viewpoint image.

The configuration of the image computing server 200 is not limited to this. For example, at least two of the front-end server 230, the database 250, and the back-end server 270 may be integrated. The image computing server 200 may also include more than one of at least any of the front-end server 230, the database 250, and the back-end server 270. Furthermore, devices other than those described above may be included at any position in the image computing server 200. In addition, the end user terminal 190 or the virtual camera operation UI 330 may have at least part of the functions of the image computing server 200.

The rendered image is transmitted from the back-end server 270 to the end user terminal 190, and by operating the end user terminal 190 the user can view images and listen to audio according to the designated viewpoint. That is, the back-end server 270 generates virtual viewpoint content based on the images captured by the plurality of cameras 112 (multi-viewpoint images) and viewpoint information. More specifically, the back-end server 270 generates virtual viewpoint content based on, for example, image data of predetermined areas that the plurality of camera adapters 120 extract from the images captured by the plurality of cameras 112, and on a viewpoint designated by a user operation. The back-end server 270 then provides the generated virtual viewpoint content to the end user terminal 190. The process by which the camera adapter 120 extracts the image data of a predetermined area is described in detail later.

In this embodiment, the virtual viewpoint content includes a virtual viewpoint image, that is, the image that would be obtained if the subject were photographed from a virtual viewpoint (an image representing the view from the designated viewpoint). The virtual viewpoint may be designated by the user, or may be designated automatically based on, for example, the result of image analysis. That is, virtual viewpoint images include an arbitrary viewpoint image (free viewpoint image) corresponding to a viewpoint freely designated by the user. They also include an image corresponding to a viewpoint that the user selects from a plurality of candidates and an image corresponding to a viewpoint that the device designates automatically.

As a supplementary note, this embodiment mainly describes an example in which the virtual viewpoint content includes audio data, but audio data need not be included. Furthermore, the back-end server 270 may compress and encode the virtual viewpoint image using a standard technique such as H.264 or HEVC and then transmit it to the end user terminal 190 using the MPEG-DASH protocol. Alternatively, the virtual viewpoint image may be transmitted to the end user terminal 190 uncompressed. The former case, with compression encoding, assumes a smartphone or tablet as the end user terminal 190, and the latter assumes a display capable of showing uncompressed images. That is, the image format can be switched according to the type of the end user terminal 190. The image transmission protocol is not limited to MPEG-DASH; for example, HLS (HTTP Live Streaming) or another transmission method may be used.
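Switching the image format by terminal type can be sketched as a simple selection function. The enum values, the returned dictionary layout, and the choice of H.264 rather than HEVC for the compressed case are illustrative assumptions; the source names both codecs as possibilities and does not specify a transport for the uncompressed case.

```python
from enum import Enum
from typing import Dict, Optional


class TerminalType(Enum):
    SMARTPHONE = "smartphone"
    TABLET = "tablet"
    UNCOMPRESSED_DISPLAY = "uncompressed_display"


def select_delivery(terminal: TerminalType) -> Dict[str, Optional[str]]:
    """Pick an image format and transport for the given end user terminal type."""
    if terminal in (TerminalType.SMARTPHONE, TerminalType.TABLET):
        # Compressed delivery; HEVC or HLS could be substituted here.
        return {"codec": "H.264", "protocol": "MPEG-DASH"}
    # Display able to show uncompressed images: no codec, transport left unspecified.
    return {"codec": None, "protocol": None}
```

For instance, `select_delivery(TerminalType.TABLET)` selects the compressed path, while `TerminalType.UNCOMPRESSED_DISPLAY` yields no codec.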

As described above, the image processing system 100 has three functional domains: a video collection domain, a data storage domain, and a video generation domain. The video collection domain comprises the sensor systems 110a to 110z; the data storage domain comprises the database 250, the front-end server 230, and the back-end server 270; and the video generation domain comprises the virtual camera operation UI 330 and the end user terminal 190.

The configuration of the image processing system is not limited to this one; for example, the virtual camera operation UI 330 may acquire images directly from the sensor systems 110a to 110z. In this embodiment, however, images are not acquired directly from the sensor systems 110a to 110z; instead, as described above, a data storage function is placed in between. Specifically, the front-end server 230 converts the image data, audio data, and associated meta information generated by the sensor systems 110a to 110z into the common schema and data types of the database 250. As a result, even if the cameras 112 of the sensor systems 110a to 110z are replaced with cameras of another model, the front-end server 230 can absorb the differences and register the data in the database 250. This reduces the risk that the virtual camera operation UI 330 will fail to operate properly when the cameras 112 are replaced with another model.
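The idea of absorbing camera-model differences in the front-end server can be sketched with per-model converters that all produce one common record type. The two raw formats, the model names, and the fields of `CommonRecord` are invented for illustration; the source does not describe the actual schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical common schema stored in the database."""
    camera_id: str
    frame_number: int
    width: int
    height: int
    image: bytes


def from_model_a(raw: Dict[str, Any]) -> CommonRecord:
    # Hypothetical model A reports the resolution as a (width, height) tuple.
    width, height = raw["resolution"]
    return CommonRecord(raw["cam"], raw["frame"], width, height, raw["pixels"])


def from_model_b(raw: Dict[str, Any]) -> CommonRecord:
    # Hypothetical model B reports the resolution as a "WIDTHxHEIGHT" string.
    width, height = (int(v) for v in raw["size"].split("x"))
    return CommonRecord(raw["id"], raw["frame_no"], width, height, raw["data"])


CONVERTERS: Dict[str, Callable[[Dict[str, Any]], CommonRecord]] = {
    "model_a": from_model_a,
    "model_b": from_model_b,
}


def normalize(model: str, raw: Dict[str, Any]) -> CommonRecord:
    """Convert model-specific data into the common schema before registration."""
    return CONVERTERS[model](raw)
```

Supporting a new camera model then means adding one converter to `CONVERTERS`; consumers of `CommonRecord`, such as the rendering side, stay unchanged.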

The virtual camera operation UI 330 does not access the database 250 directly but accesses it via the back-end server 270. That is, the back-end server 270 performs the common processing related to image generation, and the virtual camera operation UI 330 handles the application-specific portion related to the operation UI.

With this configuration, development of the virtual camera operation UI 330 can focus on the UI operation devices and on the functional requirements of the UI used to manipulate the virtual viewpoint image to be generated. The back-end server 270 can also add or remove common image generation processing in response to requests from the virtual camera operation UI 330. In this way, the system can respond flexibly to the requirements of the virtual camera operation UI 330.

As described above, in the image processing system 100, the back-end server 270 generates a virtual viewpoint image based on image data obtained by photographing a subject from a plurality of directions with the cameras 112 (that is, the cameras 112a to 112z). Although the image processing system 100 of the present embodiment has been described above as a physical configuration, the configuration does not necessarily have to be physical and may instead be logical.

Next, the nodes of the image processing system 100 of FIG. 1 (the camera adapter 120, front-end server 230, database 250, back-end server 270, virtual camera operation UI 330, and end user terminal 190) will be described in order.

FIG. 2 is a functional block diagram of the camera adapter 120 in the present embodiment. As shown in FIG. 2, the camera adapter 120 includes a network adapter 06110, a transmission unit 06120, an image processing unit 06130, and an external device control unit 06140.

The network adapter 06110 includes a data transmission/reception unit 06111 and a time control unit 06112. In the present embodiment, a NIC (Network Interface Card) is used as the network adapter 06110, but the network adapter is not limited to a NIC, and another similar interface may be used.

The data transmission/reception unit 06111 performs data communication with the other camera adapters 120, the front-end server 230, the time server 290, and the control station 310 via the daisy chain 170, the network 180b, and the network 310a.

For example, the data transmission/reception unit 06111 outputs, to another camera adapter 120, the foreground image and the background image that the foreground/background separation unit 06131 has separated from the image captured by the camera 112. As each camera adapter 120 outputs its foreground image and background image, a virtual viewpoint image is generated based on the foreground images and background images captured from a plurality of viewpoints. The output destination is one of the camera adapters 120 of the image processing system 100, set in a predetermined order according to the processing of the data routing processing unit 06122. There may also be a camera adapter 120 that outputs only the foreground image separated from the captured image and does not output the background image.

The time control unit 06112 conforms to, for example, the Ordinary Clock of the IEEE 1588 standard, and has a function of storing the time stamps of data exchanged with the time server 290 and a function of performing time synchronization with the time server 290. IEEE 1588 has been revised as a standard, as in IEEE 1588-2002 and IEEE 1588-2008, and the latter is also called PTPv2 (Precision Time Protocol Version 2). Time synchronization with the time server 290 does not necessarily have to conform to the IEEE 1588 standard, and may be realized by another standard such as EtherAVB or by a proprietary protocol.
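As an illustration of why the time stamps of data exchanged with the time server are stored, the following is a minimal sketch of the offset and path-delay estimate used by the delay request-response mechanism of IEEE 1588. The function name and the numeric values are hypothetical and are not taken from the embodiment; the estimate also assumes a symmetric network path.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and one-way delay from four time stamps.

    t1: time server sends Sync            (time server clock)
    t2: camera adapter receives Sync      (adapter clock)
    t3: camera adapter sends Delay_Req    (adapter clock)
    t4: time server receives Delay_Req    (time server clock)
    Assumes the forward and return path delays are equal.
    """
    forward = t2 - t1    # path delay + offset
    backward = t4 - t3   # path delay - offset
    offset = (forward - backward) / 2
    delay = (forward + backward) / 2
    return offset, delay


# Hypothetical values: adapter clock runs 5 units ahead, path delay is 2 units.
offset, delay = ptp_offset_and_delay(t1=100, t2=107, t3=110, t4=107)
```

The adapter would then subtract the estimated offset from its local clock to align with the time server.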

The transmission unit 06120 controls data transmission to the switching hub 180 and other destinations via the network adapter 06110. The transmission unit 06120 includes a data compression/decompression unit 06121, a data routing processing unit 06122, a time synchronization control unit 06123, an image/audio transmission processing unit 06124, and a data routing information holding unit 06125. Each function is described below.

The data compression/decompression unit 06121 has a function of compressing data transmitted and received via the data transmission/reception unit 06111 by applying a predetermined compression method, compression rate, and frame rate, and a function of decompressing the compressed data.

The data routing processing unit 06122 uses the data held by the data routing information holding unit 06125, described later, to determine the routing destination of the data received by the data transmission/reception unit 06111 and of the data processed by the image processing unit 06130. The data routing processing unit 06122 then transmits the data to the determined routing destination. The order in which the camera adapters 120 of the image processing system 100 output foreground images and background images in relay fashion is set according to the determinations made by the data routing processing units 06122 of the respective camera adapters 120.

Note that it is preferable, for image processing, to set as the routing destination a camera adapter 120 corresponding to a camera 112 focused on the same gazing point, because the image frames of such cameras 112 are highly correlated with one another.
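The routing preference above can be sketched as follows. This is only an illustration under assumed data: the adapter identifiers, the gazing point labels, and the rule that the last adapter of each group forwards to the front-end server are hypothetical, and the embodiment describes its actual routing method later.

```python
def build_routing_table(adapters, sink="front_end_server"):
    """Map each adapter to its next hop.

    adapters: list of (adapter_id, gazing_point_id) in daisy-chain order.
    Each adapter forwards to the next adapter in the chain whose camera is
    focused on the same gazing point; the last adapter of each gazing point
    group forwards to the sink (an assumed final destination).
    """
    table = {}
    last_seen = {}  # gazing_point_id -> adapter still waiting for a next hop
    for adapter_id, gazing_point in adapters:
        previous = last_seen.get(gazing_point)
        if previous is not None:
            table[previous] = adapter_id
        last_seen[gazing_point] = adapter_id
    for adapter_id in last_seen.values():
        table[adapter_id] = sink
    return table


# Hypothetical chain: adapters alternate between gazing points "A" and "B".
chain = [("120a", "A"), ("120b", "B"), ("120c", "A"), ("120d", "B")]
routing = build_routing_table(chain)
```

In this sketch the table plays the role of the address information held by the data routing information holding unit 06125.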

The time synchronization control unit 06123 conforms to, for example, PTP (Precision Time Protocol) of the IEEE 1588 standard, and has a function of performing time synchronization with the time server 290. Time synchronization with the time server 290 does not necessarily have to conform to PTP of the IEEE 1588 standard, and may be realized using another similar protocol.

The image/audio transmission processing unit 06124 has a function of creating a message for transferring image data or audio data to another camera adapter 120 or to the front-end server 230 via the data transmission/reception unit 06111. The message contains the image data or audio data and the meta information of each piece of data. The meta information includes the time code or sequence number at which the image was captured or the audio was sampled, the data type, and an identifier indicating the individual camera 112 or microphone 111. The image data or audio data to be transmitted may have been compressed by the data compression/decompression unit 06121.

The image/audio transmission processing unit 06124 also receives messages from other camera adapters 120 via the data transmission/reception unit 06111. According to the data type contained in the message, it restores the data, which has been fragmented into the packet size prescribed by the transmission protocol, to image data or audio data. If the restored data is compressed, the data compression/decompression unit 06121 performs decompression.
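The message creation and restoration described above can be sketched as follows. The field names, the fragment index and count, and the example values are assumptions made for illustration; the embodiment specifies only that the meta information carries a time code or sequence number, a data type, and a device identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    time_code: str   # time code at capture or sampling
    data_type: str   # e.g. "foreground", "background", "audio"
    device_id: str   # identifier of the camera or microphone
    index: int       # position of this fragment (hypothetical field)
    total: int       # number of fragments in the message (hypothetical field)
    chunk: bytes


def fragment(time_code, data_type, device_id, payload, max_size):
    """Split a payload into fragments no larger than max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)]
    chunks = chunks or [b""]
    return [Fragment(time_code, data_type, device_id, i, len(chunks), c)
            for i, c in enumerate(chunks)]


def reassemble(fragments):
    """Restore the payload, or return None while fragments are missing."""
    if not fragments:
        return None
    total = fragments[0].total
    by_index = {f.index: f.chunk for f in fragments}
    if set(by_index) != set(range(total)):
        return None
    return b"".join(by_index[i] for i in range(total))


payload = bytes(range(10))
parts = fragment("01:00:00:00", "foreground", "112a", payload, max_size=4)
restored = reassemble(list(reversed(parts)))  # arrival order does not matter
```

Carrying the index in every fragment lets the receiver restore the data even when packets arrive out of order.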

The data routing information holding unit 06125 holds address information for determining the destination of data transmitted and received by the data transmission/reception unit 06111. The routing method will be described later.

The image processing unit 06130 processes the image data captured by the camera 112 under the control of the camera control unit 06141 and the image data received from other camera adapters 120. The image processing unit 06130 includes a foreground/background separation unit 06131, a three-dimensional model information generation unit 06132, and a calibration control unit 06133. Each function is described below.

The foreground/background separation unit 06131 separates the image data captured by the camera 112 into a foreground image and a background image. Specifically, in each of the plurality of camera adapters 120, the foreground/background separation unit 06131 extracts a predetermined area from the image captured by the corresponding camera 112. The predetermined area is, for example, a foreground image obtained as a result of object detection on the captured image, and through this extraction the foreground/background separation unit 06131 separates the captured image into a foreground image and a background image.

Here, the object is, for example, a person, and may be a specific person such as a player, a manager, and/or a referee. It may also be an object whose image pattern is predetermined, such as a ball or a goal. Alternatively, a moving body may be detected as the object. For example, when a person is set as an important object, the foreground image containing the person, as shown in FIG. 8(a), and the background image not containing the object (person), as shown in FIG. 8(b), are separated and processed.

This makes it possible to improve the image quality of the portion of the virtual viewpoint image generated by the image processing system 100 that corresponds to the object. In addition, because the foreground and the background are separated in each of the plurality of camera adapters 120, the load in the image processing system 100, which includes the plurality of cameras 112, can be distributed. Note that the predetermined area is not limited to a foreground image and may be, for example, a background image.

The three-dimensional model information generation unit 06132 generates image information on a three-dimensional model by applying, for example, the principle of a stereo camera to the foreground image separated by the foreground/background separation unit 06131 and the foreground images received from other camera adapters 120.

The calibration control unit 06133 acquires the image data required for calibration from the camera 112 via the camera control unit 06141 and transmits it to the front-end server 230, which executes the arithmetic processing for calibration. In the present embodiment the front-end server 230 executes the arithmetic processing for calibration, but the node that executes it is not necessarily limited to the front-end server 230. The arithmetic processing may therefore be executed by another node such as the control station 310 or a camera adapter 120 (including, in this case, another camera adapter 120).

The calibration control unit 06133 also performs calibration during shooting (dynamic calibration) on the image data acquired from the camera 112 via the camera control unit 06141, according to preset parameters.

The external device control unit 06140 controls the devices connected to the camera adapter 120. The external device control unit 06140 includes a camera control unit 06141, a microphone control unit 06142, a pan head control unit 06143, and a sensor control unit 06144. Each function is described below.

The camera control unit 06141 is connected to the camera 112 and performs control of the camera 112, acquisition of captured images, provision of a synchronization signal, time setting, and the like. Control of the camera 112 includes setting and referencing shooting parameters (number of pixels, color depth, frame rate, white balance, and so on), acquiring the state of the camera 112 (shooting, stopped, synchronizing, error, and so on), starting and stopping shooting, and adjusting focus.

In the present embodiment, focus is adjusted via the camera 112. However, when a detachable lens is attached to the camera 112, the camera adapter 120 may be connected to the lens and adjust the lens directly. The camera adapter 120 may also perform lens adjustments such as zoom via the camera 112.

The synchronization signal is provided by supplying the camera 112 with the shooting timing (control clock), using the time that the time synchronization control unit 06123 has synchronized with the time server 290. The time setting is performed by supplying the time that the time synchronization control unit 06123 has synchronized with the time server 290 as a time code conforming to, for example, the SMPTE 12M format. The supplied time code is thereby attached to the image data received from the camera 112. The time code format is not limited to SMPTE 12M, and other formats may be used. Alternatively, the camera control unit 06141 may attach the time code itself to the image data received from the camera 112 without supplying a time code to the camera 112.
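As a reference for the time code mentioned above, the following sketch converts a frame count into an HH:MM:SS:FF string, the layout used by SMPTE 12M time codes. It is a simplified illustration that assumes an integer frame rate and non-drop-frame counting; the function name and the 60 fps value are hypothetical.

```python
def frames_to_timecode(frame_count, fps):
    """Format a frame count as a non-drop-frame HH:MM:SS:FF time code."""
    frames = frame_count % fps
    total_seconds = frame_count // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = (total_seconds // 3600) % 24  # wrap around after 24 hours
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


# One hour, one second and one frame after the start, at an assumed 60 fps.
label = frames_to_timecode(3600 * 60 + 61, fps=60)
```

Because every camera adapter derives the count from the same synchronized time, frames captured at the same instant by different cameras receive the same label.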

The microphone control unit 06142 is connected to the microphone 111 and performs control of the microphone 111, starting and stopping of sound collection, acquisition of the collected audio data, and the like. Control of the microphone 111 includes, for example, gain adjustment and state acquisition. Like the camera control unit 06141, the microphone control unit 06142 supplies the microphone 111 with the audio sampling timing and a time code. As the clock information that determines the audio sampling timing, the time information from the time server 290 is converted into, for example, a 48 kHz word clock and supplied to the microphone 111.

The pan head control unit 06143 is connected to the pan head 113 and controls the pan head 113. Control of the pan head 113 includes, for example, pan/tilt control and state acquisition. The sensor control unit (internal sensor) 06144 is connected to the external sensor 114 and acquires the sensor information sensed by the external sensor 114. For example, when a gyro sensor is used as the external sensor 114, information indicating the degree of vibration (vibration information) can be acquired. In this case, using the vibration information acquired by the sensor control unit 06144, the image processing unit 06130 can generate an image with suppressed vibration before the processing by the foreground/background separation unit 06131.

The vibration information is used, for example, when the image data of an 8K camera is cropped to a size smaller than the original 8K size in consideration of the vibration information and then aligned with the image of an adjacently installed camera 112. Alignment can thereby be performed even when the structural vibration of the building propagates to the individual cameras at different frequencies. As a result, electronically stabilized image data (that is, image data in which the influence of vibration has been reduced by image processing) can be generated, and the alignment processing load that the image computing server 200 would otherwise bear for every one of the cameras 112 can be reduced. The sensor of the sensor system 110 is not necessarily limited to the external sensor 114, and the same effect can be obtained with a sensor built into the camera adapter 120.
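The cropping step can be sketched as follows. The sketch assumes that the vibration information has already been converted into a pixel displacement of the image content, which the embodiment does not detail; the function name, the crop size, and the sign convention (the crop window follows the content) are likewise assumptions.

```python
def stabilized_crop_origin(full_w, full_h, crop_w, crop_h, shift_x, shift_y):
    """Return the top-left corner of a crop window that follows the image.

    shift_x, shift_y: displacement of the image content in pixels caused by
    vibration (assumed to be derived from the gyro sensor beforehand).
    The window starts centered and is clamped to the sensor area.
    """
    margin_x = (full_w - crop_w) // 2
    margin_y = (full_h - crop_h) // 2
    x = margin_x + shift_x
    y = margin_y + shift_y
    x = max(0, min(full_w - crop_w, x))
    y = max(0, min(full_h - crop_h, y))
    return x, y


# 8K frame (7680 x 4320) cropped to a hypothetical 7424 x 4176 window.
origin = stabilized_crop_origin(7680, 4320, 7424, 4176, shift_x=10, shift_y=-5)
```

The margin left around the crop window bounds the largest displacement that can be compensated; larger shifts are clamped.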

FIG. 3 is a functional block diagram of the image processing unit 06130 inside the camera adapter 120. The calibration control unit 06133, the foreground/background separation unit 06131, and the three-dimensional model information generation unit 06132 are each described in further detail below.

The calibration control unit 06133 performs, on the input image, color correction processing for suppressing color variation between cameras, blur correction processing (electronic image stabilization) for stabilizing the image position by reducing image blur caused by camera vibration, and the like.

As shown in FIG. 3, the foreground/background separation unit 06131 includes a foreground separation unit 05001, a background update unit 05003, and a background cutout unit 05004. The foreground separation unit 05001 separates the foreground image from the aligned image data of the camera 112 by comparing it with the background image 05002. The background update unit 05003 generates a new background image using the background image 05002 and the aligned image of the camera 112, and updates the background image 05002 to the new background image. The background cutout unit 05004 performs control to cut out a part of the background image 05002.
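The comparison with the background image and the subsequent background update can be sketched as follows. The embodiment does not specify the comparison or the update rule, so the per-pixel threshold and the running average used here are assumptions, as are the function names and the small grayscale arrays.

```python
def separate_foreground(frame, background, threshold):
    """Mark a pixel as foreground when it differs enough from the background."""
    return [[abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def update_background(frame, background, mask, alpha):
    """Blend the frame into the background, skipping foreground pixels."""
    updated = []
    for frame_row, bg_row, mask_row in zip(frame, background, mask):
        updated.append([b if is_fg else (1 - alpha) * b + alpha * p
                        for p, b, is_fg in zip(frame_row, bg_row, mask_row)])
    return updated


# Hypothetical 2 x 2 grayscale images; one bright pixel is the object.
frame = [[10, 200], [12, 11]]
background = [[10, 10], [10, 10]]
mask = separate_foreground(frame, background, threshold=20)
new_background = update_background(frame, background, mask, alpha=0.5)
```

Excluding foreground pixels from the update keeps the object from being absorbed into the background image over time.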

As shown in FIG. 3, the three-dimensional model information generation unit 06132 includes a three-dimensional model processing unit 05005, an other-camera foreground reception unit 05006, and a camera parameter reception unit 05007. The three-dimensional model processing unit 05005 successively generates image information on the three-dimensional model, for example based on the principle of a stereo camera, using the foreground image separated by the foreground separation unit 05001 and the foreground images of other cameras 112 received via the transmission unit 06120. The other-camera foreground reception unit 05006 receives the foreground images that other camera adapters 120 have obtained by foreground/background separation. The camera parameter reception unit 05007 receives the internal parameters specific to the camera (for example, focal length, image center, and lens distortion parameters) and the external parameters representing the position and orientation of the camera (for example, a rotation matrix and a position vector). These parameters are obtained by the calibration processing described later, and are transmitted from the control station 310 to the target camera adapter 120 and set there.
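To illustrate how the internal and external parameters are used together, the following sketch projects a three-dimensional point into an image with a pinhole camera model. It assumes that the position vector is the camera center in world coordinates, uses a single focal length in pixels, and ignores lens distortion; the function name and all numeric values are hypothetical.

```python
def project_point(point, rotation, camera_position, focal, center):
    """Project a 3D world point to pixel coordinates.

    rotation: 3 x 3 rotation matrix (world to camera), as nested lists.
    camera_position: camera center in world coordinates (assumed meaning of
    the position vector).
    focal: focal length in pixels; center: image center (cx, cy).
    Returns None when the point is not in front of the camera.
    """
    relative = [p - c for p, c in zip(point, camera_position)]
    x, y, z = (sum(r * v for r, v in zip(row, relative)) for row in rotation)
    if z <= 0:
        return None
    return (focal * x / z + center[0], focal * y / z + center[1])


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pixel = project_point((1, 2, 0), identity, (0, 0, -10), focal=1000, center=(960, 540))
```

Applying the same projection with each camera's own parameters relates the foreground pixels of different cameras to common three-dimensional positions.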

FIG. 4 is a functional block diagram of the front-end server 230 in the present embodiment. As shown in FIG. 4, the front-end server 230 includes various functional blocks (from the control unit 02110 to the DB access control unit 02190). Each function is described below.

The control unit 02110 is composed of hardware such as a CPU, a DRAM, a storage medium such as an HDD or NAND memory that stores program data and various other data, and Ethernet (registered trademark). DRAM is an abbreviation for Dynamic Random Access Memory.

The control unit 02110 controls each functional block of the front-end server 230 and the front-end server 230 as a whole. The control unit 02110 also performs mode control to switch between operation modes such as the calibration operation, the preparatory operation before shooting, and the operation during shooting. The control unit 02110 further receives control instructions from the control station 310 through Ethernet (registered trademark), and performs mode switching, data input/output, and the like. In the same manner, the control unit 02110 acquires stadium CAD data (stadium shape data) from the control station 310 through the network and transmits the stadium CAD data to the CAD data storage unit 02135 and the shooting data file generation unit 02180. The stadium CAD data (stadium shape data) in the present embodiment is three-dimensional data representing the shape of the stadium. It may be a mesh model or any other data representing a three-dimensional shape, and is not necessarily limited to the CAD format.

The data input control unit 02120 is network-connected to the camera adapters 120 via a communication path such as Ethernet (registered trademark) and the switching hub 180. The data input control unit 02120 acquires foreground images, background images, three-dimensional models (three-dimensional shapes) of the subject, audio data, and camera calibration captured image data from the camera adapters 120 via the network.

Here, the foreground image is image data based on the foreground area of a captured image used for generating the virtual viewpoint image, and the background image is image data based on the background area of that captured image. The camera adapter 120 identifies the foreground area and the background area according to the result of detection processing for a predetermined object on the image captured by the camera 112, and generates the foreground image and the background image. The predetermined object is, for example, a person, and may be a specific person such as a player, a manager, and/or a referee. The predetermined object may also include an object whose image pattern is predetermined, such as a ball or a goal. Alternatively, a moving body may be detected as the predetermined object.

The data input control unit 02120 performs compression and decompression of the received data, data routing processing, and the like. The data input control unit 02120 further transmits the foreground image and the background image to the data synchronization unit 02130, and transmits the camera calibration captured image data to the calibration unit 02140. Both the control unit 02110 and the data input control unit 02120 have a function for communicating over a network such as Ethernet (registered trademark), and the two may communicate with each other using that function. In that case, for example, when the data input control unit 02120 receives an instruction by a control command or stadium CAD data from the control station 310, it transmits that data to the control unit 02110.

The data synchronization unit 02130 temporarily stores (buffers) the data acquired from the camera adapters 120 in its internal DRAM until the foreground image, background image, audio data, and three-dimensional model data have all arrived. This is necessary because the reception order of network packets is not guaranteed for the data transferred from the camera adapters 120 over the network, so the data required for file generation must first be gathered. The data gathered here is the data that the shooting data file generation unit 02180, described later, needs in order to generate a file. Hereinafter, the foreground image, background image, audio data, and three-dimensional model data are collectively referred to as shooting data.

Meta information such as routing information, time code information (time information), and a camera identifier is attached to the shooting data, and the data synchronization unit 02130 checks the attributes of the data based on this meta information. In this way, the data synchronization unit 02130 determines, for example, that pieces of data belong to the same time, and confirms that all the data has been gathered.

Once the data has been gathered, the data synchronization unit 02130 transmits the foreground image and the background image to the image processing unit 02150, the three-dimensional model data to the three-dimensional model combining unit 02160, and the audio data to the shooting data file generation unit 02180. The background image may be captured at a frame rate different from that of the foreground image. For example, when the frame rate of the background image is 1 fps, one background image is acquired per second, so for times at which no background image is acquired, all the data may be regarded as gathered without a background image.

If the data has not been gathered after a predetermined time has elapsed, the data synchronization unit 02130 notifies the database 250 of information indicating that the data could not be gathered. For example, when storing data, the downstream database 250 stores information indicating the data loss together with the camera number and frame number. This makes it possible, in response to a viewpoint instruction from the virtual camera operation UI 330 to the back-end server 270, to notify automatically before rendering whether the desired image can be formed from the captured images of the cameras 112 whose data has been gathered. As a result, the burden of visual checking on the operator of the virtual camera operation UI 330 can be reduced.
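The buffering, completeness check, and timeout handling of the data synchronization unit can be sketched as follows. This is a simplified single-camera illustration: the class and method names, the use of a frame number as the time code, and the callback that tells whether a background image is expected at a given time are all assumptions.

```python
class SyncBuffer:
    """Buffer shooting data per time code until every required kind arrives."""

    def __init__(self, timeout, expects_background):
        self.timeout = timeout
        self.expects_background = expects_background  # callable(time_code) -> bool
        self.pending = {}  # time_code -> {"first": arrival time, "items": {...}}

    def required_kinds(self, time_code):
        kinds = {"foreground", "audio", "model"}
        if self.expects_background(time_code):
            kinds.add("background")
        return kinds

    def add(self, time_code, kind, data, now):
        """Store one piece of data; return the full set once it is complete."""
        entry = self.pending.setdefault(time_code, {"first": now, "items": {}})
        entry["items"][kind] = data
        if self.required_kinds(time_code) <= set(entry["items"]):
            return self.pending.pop(time_code)["items"]
        return None

    def expire(self, now):
        """Drop sets that timed out and report which kinds were missing."""
        missing = {}
        for time_code in list(self.pending):
            entry = self.pending[time_code]
            if now - entry["first"] >= self.timeout:
                missing[time_code] = (self.required_kinds(time_code)
                                      - set(entry["items"]))
                del self.pending[time_code]
        return missing


# Foreground at 60 fps, background at 1 fps: only every 60th frame needs one.
buffer = SyncBuffer(timeout=2.0, expects_background=lambda tc: tc % 60 == 0)
```

The dictionary returned by `expire` corresponds to the information that would be reported to the database 250 so that the data loss can be recorded with the frame number.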

The CAD data storage unit 02135 stores the three-dimensional data representing the stadium shape, received from the control unit 02110, in a DRAM or in a storage medium such as an HDD or NAND memory. When the CAD data storage unit 02135 receives a request for the stadium shape data, it transmits the stored stadium shape data to the image combining unit 02170.

キャリブレーション部０２１４０は、カメラのキャリブレーション動作を行い、キャリブレーションによって取得されたカメラパラメータを、後述の非撮影データファイル生成部０２１８５に送信する。また、キャリブレーション部０２１４０は、同時に、自身の記憶領域にもカメラパラメータを保持し、後述の三次元モデル結合部０２１６０にカメラパラメータ情報を提供する。 The calibration unit 02140 performs a camera calibration operation, and transmits camera parameters acquired by the calibration to a non-photographed data file generation unit 02185 described later. At the same time, the calibration unit 02140 holds camera parameters in its own storage area, and provides camera parameter information to a 3D model combining unit 02160 described later.

The image processing unit 02150 applies the following processing to the foreground images and background images: matching of color and luminance values between cameras, development processing when RAW image data is input, complementing of errors that occur during image decoding, and correction of camera lens distortion. The processed foreground image is then transmitted to the shooting data file generation unit 02180, and the processed background image to the image combining unit 02170.

The 3D model combining unit 02160 uses the camera parameters generated by the calibration unit 02140 to combine the three-dimensional model data of the same time acquired from the camera adapters 120. It then generates three-dimensional model data of the foreground images for the entire stadium using a method called Visual Hull. The generated three-dimensional model data is transmitted to the shooting data file generation unit 02180.

When the image combining unit 02170 has acquired the background images from the image processing unit 02150 and the three-dimensional shape data of the stadium from the CAD data storage unit 02135, it identifies the position of each background image relative to the coordinates of the acquired three-dimensional shape data. Once the position relative to the coordinates of the stadium's three-dimensional shape data (stadium shape data) has been identified for each background image, the image combining unit 02170 combines the background images into a single background image. Note that the back-end server 270 may instead generate the three-dimensional shape data of the background image.

The shooting data file generation unit 02180 acquires the audio data from the data synchronization unit 02130, the foreground image from the image processing unit 02150, the three-dimensional model data from the 3D model combining unit 02160, and the background image combined into the three-dimensional shape from the image combining unit 02170. It then outputs the acquired data to the DB access control unit 02190.

Here, the shooting data file generation unit 02180 outputs these data in association with one another based on their respective time information. However, only some of the data may be associated and output. For example, the shooting data file generation unit 02180 outputs the foreground image and the background image in association with each other based on the time information of the foreground image and the time information of the background image. As another example, it outputs the foreground image, the background image, and the three-dimensional model data in association with one another based on the time information of each. Note that the shooting data file generation unit 02180 can either file and output the associated data separately for each data type, or gather multiple types of data into one file for each time indicated by the time information and output that.

Because the shooting data associated in this way is output to the database 250 by the DB access control unit 02190 described later, the back-end server 270 can generate a virtual viewpoint image from the foreground image and the background image whose time information corresponds.

When the foreground image and the background image acquired by the data input control unit 02120 have different frame rates, it is difficult for the shooting data file generation unit 02180 to always output a foreground image and a background image of the same time in association with each other. The shooting data file generation unit 02180 therefore associates the foreground image with a background image whose time information has a relationship, based on a predetermined rule, with the time information of the foreground image, and outputs them. Here, such a background image is, for example, the background image whose time information is closest to that of the foreground image among the background images acquired by the shooting data file generation unit 02180. By associating the foreground image and the background image based on a predetermined rule in this way, a virtual viewpoint image can be generated from a foreground image and a background image captured at close times even when their frame rates differ.

As a supplement, the method of associating the foreground image with the background image is not limited to the above. For example, the associated background image may be the one whose time information is closest to that of the foreground image among the acquired background images captured before the foreground image. With this method, the associated foreground image and background image can be output with low delay, without waiting for the acquisition of a background image whose frame rate is lower than that of the foreground image. Alternatively, the associated background image may be the one whose time information is closest to that of the foreground image among the acquired background images captured after the foreground image.
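The three association rules described above (closest overall, closest among earlier backgrounds, closest among later backgrounds) can be sketched as follows. The function name, the rule labels, and the choice to treat a background with an identical timestamp as eligible under every rule are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def associate_background(fg_time, bg_times, rule="nearest"):
    """Return the background timestamp to pair with fg_time, or None.

    bg_times must be sorted in ascending order. The rule names are
    hypothetical labels for the three rules in the text.
    """
    if not bg_times:
        return None
    if rule == "nearest_before":
        # Latest background captured at or before the foreground (low delay).
        i = bisect_right(bg_times, fg_time)
        return bg_times[i - 1] if i > 0 else None
    if rule == "nearest_after":
        # Earliest background captured at or after the foreground.
        i = bisect_left(bg_times, fg_time)
        return bg_times[i] if i < len(bg_times) else None
    if rule == "nearest":
        # Background with the smallest absolute time difference.
        return min(bg_times, key=lambda t: abs(t - fg_time))
    raise ValueError(f"unknown rule: {rule}")
```

With a 1 fps background (timestamps in milliseconds: 0, 1000, 2000) and a foreground frame at 1600 ms, `"nearest_before"` selects 1000 without waiting for the 2000 ms background to arrive, which is the low-delay property the text mentions.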

非撮影データファイル生成部０２１８５は、キャリブレーション部０２１４０からカメラパラメータ、制御部０２１１０からスタジアムの三次元形状データを取得すると、それらのデータをファイル形式に応じて成形する。そして、その成形したデータをＤＢアクセス制御部０２１９０に送信する。なお、非撮影データファイル生成部０２１８５に入力されるカメラパラメータ及びスタジアム形状データは、ファイル形式に応じて個別に成形される。即ち、非撮影データファイル生成部０２１８５は、いずれか一方のデータを受信すると、そのデータを個別にＤＢアクセス制御部０２１９０に送信する。 When the non-photographed data file generation unit 02185 acquires the camera parameters from the calibration unit 02140 and the three-dimensional shape data of the stadium from the control unit 02110, the non-photographing data file generation unit 02185 shapes the data according to the file format. Then, the formed data is transmitted to the DB access control unit 02190. The camera parameters and stadium shape data input to the non-photographing data file generation unit 02185 are individually formed according to the file format. That is, when any one of the data is received, the non-photographed data file generation unit 02185 individually transmits the data to the DB access control unit 02190.

The DB access control unit 02190 is connected to the database 250 via InfiniBand or the like so that high-speed communication is possible. It transmits the files received from the shooting data file generation unit 02180 and the non-photographed data file generation unit 02185 to the database 250. Thus, in the present embodiment, the shooting data that the shooting data file generation unit 02180 has associated based on the time information is output to the database 250, which is connected to the front-end server 230 via a network.

Note that the output destination of the shooting data associated by the shooting data file generation unit 02180 based on the time information is not limited to this. For example, the front-end server 230 may output the shooting data associated based on the time information to the back-end server 270, which is connected to the front-end server 230 via a network and generates the virtual viewpoint image. The front-end server 230 may also output the shooting data associated based on the time information to both the database 250 and the back-end server 270.

その他、本実施形態では、フロントエンドサーバ２３０が前景画像と背景画像の対応付けを行う仕様として説明しているが、必ずしもこれに限らず、データベース２５０がその対応付けを行うこともできる。例えば、データベース２５０は、フロントエンドサーバ２３０から時間情報を有する前景画像及び背景画像を取得し、前景画像と背景画像とを各々の時間情報に基づいて対応付けて、データベース２５０自体が備える記憶領域に出力することもできる。 In addition, in the present embodiment, the front-end server 230 is described as a specification for associating the foreground image with the background image. However, the present invention is not necessarily limited thereto, and the database 250 can also perform the association. For example, the database 250 acquires a foreground image and a background image having time information from the front-end server 230, associates the foreground image and the background image based on each time information, and stores them in a storage area included in the database 250 itself. It can also be output.

図５は、フロントエンドサーバ２３０内部のデータ入力制御部０２１２０の機能ブロック図である。データ入力制御部０２１２０は、図５に示されるように、サーバネットワークアダプタ０６２１０、サーバ伝送部０６２２０、及びサーバ画像処理部０６２３０を有する。以下、サーバネットワークアダプタ０６２１０、サーバ伝送部０６２２０、及びサーバ画像処理部０６２３０の各々に関して、説明を補足する。 FIG. 5 is a functional block diagram of the data input control unit 02120 inside the front-end server 230. As shown in FIG. 5, the data input control unit 02120 includes a server network adapter 06210, a server transmission unit 06220, and a server image processing unit 06230. Hereinafter, a supplementary explanation will be given for each of the server network adapter 06210, the server transmission unit 06220, and the server image processing unit 06230.

The server network adapter 06210 has a server data receiving unit 06211 and receives the data transmitted from the camera adapters 120. The server transmission unit 06220 processes the data received from the server data receiving unit 06211. The server data decompression unit 06221, the server data routing processing unit 06222, the server image/audio transmission processing unit 06223, and the server data routing information holding unit 06224 are each described below.

The server data decompression unit 06221 decompresses compressed data. The server data routing processing unit 06222 determines the transfer destination of data based on routing information, such as addresses, held by the server data routing information holding unit 06224 described later, and transfers the data received from the server data receiving unit 06211.

The server image/audio transmission processing unit 06223 receives a message from the camera adapter 120 via the server data receiving unit 06211 and, according to the data type included in the message, restores the fragmented data to image data or audio data. If the restored image data or audio data is compressed, it is decompressed by the server data decompression unit 06221.
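The restore-then-decompress flow can be sketched as follows. The fragment layout (an index paired with a byte chunk) and the use of `zlib` as the compression scheme are assumptions for illustration only; the source does not specify the message format or the codec.

```python
import zlib


def restore_message(fragments, total, compressed=False):
    """Reassemble fragmented data; return None while fragments are missing.

    fragments:  iterable of (index, chunk) pairs in any arrival order
                (hypothetical layout).
    total:      number of fragments the message was split into.
    compressed: whether the restored payload still needs decompression.
    """
    by_index = {}
    for index, chunk in fragments:
        by_index[index] = chunk
    if set(by_index) != set(range(total)):
        return None  # at least one fragment has not arrived yet
    data = b"".join(by_index[i] for i in range(total))
    # zlib stands in for whatever codec the camera adapter actually uses.
    return zlib.decompress(data) if compressed else data
```

Keying the fragments by index makes the result independent of arrival order, which matters when fragments travel through a daisy chain of adapters.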

The server data routing information holding unit 06224 holds address information for determining the destination of the data received by the server data receiving unit 06211. The routing method will be described later.

The server image processing unit 06230 executes processing related to the image data or audio data received from the camera adapters 120. This processing is, for example, shaping the data into a format to which attribute information is attached according to the data entity of the image data (foreground image, background image, or three-dimensional model information), such as the camera number, the shooting time of the image frame, the image size, the image format, and the image coordinates.

図６は、データベース２５０の機能ブロック図である。制御部０２４１０は、ＣＰＵやＤＲＡＭ、プログラムデータや各種データを記憶したＨＤＤやＮＡＮＤメモリ等の記憶媒体、及びＥｔｈｅｒｎｅｔ（登録商標）等のハードウェアで構成される。制御部０２４１０は、データベース２５０の各機能ブロック及びデータベース２５０のシステム全体を制御する。 FIG. 6 is a functional block diagram of the database 250. The control unit 02410 includes a CPU, a DRAM, a storage medium such as an HDD or a NAND memory that stores program data and various data, and hardware such as Ethernet (registered trademark). The control unit 02410 controls each functional block of the database 250 and the entire system of the database 250.

The data input unit 02420 receives files of shooting data and non-shooting data from the front-end server 230 through high-speed communication such as InfiniBand. The received files are sent to the cache (RAM) 02440. The data input unit 02420 also reads the meta information of the received shooting data and, based on information recorded there such as time code information, routing information, and camera identifiers, creates a database table so that the acquired data can be accessed.

データ出力部０２４３０は、バックエンドサーバ２７０から要求されたデータが後述のキャッシュ０２４４０、一次ストレージ０２４５０、二次ストレージ０２４６０のいずれに保存されているか判定する。データ出力部０２４３０は、ＩｎｆｉｎｉＢａｎｄ等の高速な通信によって、保存先からデータを読み出してバックエンドサーバ２７０に送信する。 The data output unit 02430 determines whether the data requested from the back-end server 270 is stored in a cache 02440, a primary storage 02450, or a secondary storage 02460, which will be described later. The data output unit 02430 reads out data from the storage destination and transmits it to the back-end server 270 by high-speed communication such as InfiniBand.

キャッシュ０２４４０は、高速な入出力スループットを実現可能なＤＲＡＭ等の記憶装置を有しており、データ入力部０２４２０から取得した撮影データや非撮影データを記憶装置に格納する。格納されたデータは一定量保持され、それを超えるデータが入力される場合に、古いデータから随時一次ストレージ０２４５０に書き込まれ、また、書き込み済みのデータは新たなデータによって上書きされる。 The cache 02440 includes a storage device such as a DRAM capable of realizing high-speed input / output throughput, and stores the shooting data and non-shooting data acquired from the data input unit 02420 in the storage device. A certain amount of stored data is retained, and when data exceeding that amount is input, old data is written to the primary storage 02450 as needed, and written data is overwritten by new data.

なお、キャッシュ０２４４０に一定量保存されるデータは、少なくとも１フレーム分の撮影データである。それによって、バックエンドサーバ２７０において画像のレンダリング処理を実行する際に、データベース２５０内でのスループットを最小限に抑え、最新の画像フレームを低遅延かつ連続的にレンダリングすることができる。また、ここで、上述の目的を達成するためには、キャッシュされているデータの中に背景画像が含まれている必要がある。そのため、背景画像を有さないフレームの撮影データがキャッシュされた場合、キャッシュ上の背景画像は更新されず、そのままキャッシュ上に保持される。 Note that the data stored in a certain amount in the cache 02440 is photographing data for at least one frame. Thereby, when the image rendering process is executed in the back-end server 270, the throughput in the database 250 can be minimized, and the latest image frame can be rendered continuously with low delay. Here, in order to achieve the above object, the cached data needs to include a background image. For this reason, when shooting data of a frame that does not have a background image is cached, the background image on the cache is not updated and is held on the cache as it is.
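The cache behavior described above (a bounded number of frames, oldest frames written out to primary storage first, and the cached background image left untouched by frames that carry none) can be sketched as follows. The class `FrameCache`, the dictionary standing in for primary storage, and the frame layout are assumptions for illustration.

```python
from collections import OrderedDict


class FrameCache:
    def __init__(self, capacity, primary):
        self.capacity = capacity     # number of frames kept in the cache
        self.primary = primary       # dict standing in for primary storage
        self.frames = OrderedDict()  # frame number -> frame data, oldest first
        self.background = None       # most recent background image seen

    def put(self, frame_no, data):
        # Frames without a background leave the cached background as it is.
        background = data.get("background")
        if background is not None:
            self.background = background
        self.frames[frame_no] = data
        # Write the oldest frames out to primary storage when over capacity.
        while len(self.frames) > self.capacity:
            old_no, old_data = self.frames.popitem(last=False)
            self.primary[old_no] = old_data

    def latest(self):
        """Return (frame number, frame data, background) for the newest frame."""
        if not self.frames:
            return None
        frame_no = next(reversed(self.frames))
        return frame_no, self.frames[frame_no], self.background
```

Keeping the background in a separate slot means the newest frame can always be rendered with some background, even when the background is captured at a much lower frame rate than the foreground.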

キャッシュ可能なＤＲＡＭの容量は、予めシステムに設定されたキャッシュフレームサイズ、又は制御ステーションからの指示によって設定される。なお、非撮影データについては、入出力の頻度が少なく、試合前等においては高速なスループットを要求されないため、すぐに一次ストレージにコピーされる。また、キャッシュされたデータは、データ出力部０２４３０によって読み出される。 The capacity of the DRAM that can be cached is set by a cache frame size preset in the system or by an instruction from the control station. Note that the non-photographed data is copied to the primary storage immediately because the frequency of input / output is low and high-speed throughput is not required before a game or the like. Also, the cached data is read by the data output unit 02430.

一次ストレージ０２４５０は、ＳＳＤ等のストレージメディアを並列に接続する等して構成される。一次ストレージ０２４５０では、データ入力部０２４２０からの大量のデータの書き込み処理及びデータ出力部０２４３０からのデータ読み出し処理が同時に実現され、処理の高速化が図られる。なお、一次ストレージ０２４５０には、キャッシュ０２４４０上に格納されたデータのうち、古いデータから順に書き出される。 The primary storage 02450 is configured by connecting storage media such as SSDs in parallel. In the primary storage 02450, writing processing of a large amount of data from the data input unit 02420 and data reading processing from the data output unit 02430 are realized at the same time, and the processing speed is increased. In the primary storage 02450, data stored in the cache 02440 is written in order from the oldest data.

二次ストレージ０２４６０は、ＨＤＤやテープメディア等で構成され、高速性よりも大容量が重視され、また、一次ストレージと比較して安価で長期間の記憶に適したメディアである。なお、二次ストレージ０２４６０には、撮影が完了した後、データのバックアップとして一次ストレージ０２４５０に格納されたデータが書き出される。 The secondary storage 02460 is composed of an HDD, a tape medium, and the like. The secondary storage 02460 is a medium that emphasizes a large capacity rather than high speed, and is inexpensive and suitable for long-term storage compared to the primary storage. Note that the data stored in the primary storage 02450 is written to the secondary storage 02460 as a backup of the data after the photographing is completed.

FIG. 7 is a functional block diagram of the back-end server 270. As shown in FIG. 7, the back-end server 270 has various functional blocks (from the data receiving unit 03001 to the rendering mode management unit 03014). Each function is described below.

The data receiving unit 03001 receives data transmitted from the database 250 and the controller 300. From the database 250, it receives three-dimensional data indicating the shape of the stadium (stadium shape data), foreground images, background images, three-dimensional models of the foreground images (hereinafter referred to as foreground three-dimensional models), and audio. The data receiving unit 03001 also receives virtual camera parameters from the controller 300, which designates the viewpoint for generating the virtual viewpoint image. Here, the virtual camera parameters are data indicating the position, orientation, and so on of the virtual viewpoint; for example, a matrix of external parameters and a matrix of internal parameters are used.
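To illustrate how an external parameter matrix and an internal parameter matrix describe a virtual viewpoint, the following sketch projects a world point into the virtual camera's image. It assumes the standard pinhole model, with the external parameters given as a rotation `R` and translation `t` and the internal parameters as a 3x3 matrix `K`; the source does not state which camera model is used.

```python
def project(point, K, R, t):
    """Project a 3D world point to pixel coordinates (u, v).

    Assumed pinhole model: x_cam = R * X + t, then pixel = K * x_cam / z.
    Returns None when the point lies behind the camera.
    """
    # External parameters: world coordinates -> camera coordinates.
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    # Internal parameters: camera coordinates -> pixel coordinates.
    u = (K[0][0] * xc[0] + K[0][1] * xc[1] + K[0][2] * xc[2]) / xc[2]
    v = (K[1][1] * xc[1] + K[1][2] * xc[2]) / xc[2]
    return u, v


# Example values (hypothetical): 1000 px focal length, 1920x1080 image center.
K = [[1000, 0, 960], [0, 1000, 540], [0, 0, 1]]
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0, 0, 0]
```

Changing `R` and `t` moves and turns the virtual camera, while `K` fixes its focal length and image center.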

Note that the data that the data receiving unit 03001 receives (acquires) from the controller 300 is not limited to the virtual camera parameters. The information acquired from the controller 300 may include at least one of the viewpoint designation method, information identifying the application that the controller is running, identification information of the controller 300, and identification information of the user who uses the controller 300.

また、データ受信部０３００１は、コントローラ３００から出力される上述の情報と同様の情報を、エンドユーザ端末１９０から取得してもよい。さらに、データ受信部０３００１は、データベース２５０やコントローラ３００等の外部の装置から、複数のカメラ１１２に関する情報を取得してもよい。ここで、複数のカメラ１１２に関する情報は、例えば、複数のカメラ１１２の数に関する情報や複数のカメラ１１２の動作状態に関する情報等である。なお、カメラ１１２の動作状態には、例えば、カメラ１１２の正常状態、故障状態、待機状態、起動状態、及び再起動状態の少なくとも何れかが含まれる。 Further, the data receiving unit 03001 may acquire the same information as the above-described information output from the controller 300 from the end user terminal 190. Further, the data receiving unit 03001 may acquire information related to the plurality of cameras 112 from external devices such as the database 250 and the controller 300. Here, the information regarding the plurality of cameras 112 is, for example, information regarding the number of the plurality of cameras 112, information regarding the operation state of the plurality of cameras 112, and the like. Note that the operation state of the camera 112 includes, for example, at least one of a normal state, a failure state, a standby state, a start state, and a restart state of the camera 112.

The background texture pasting unit 03002 pastes the background image as a texture onto the three-dimensional space shape indicated by the background mesh model (stadium shape data) acquired from the background mesh model management unit 03013. It thereby generates a textured background mesh model. Here, a mesh model is data, such as CAD data, that represents a three-dimensional space shape as a set of faces. A texture is an image pasted onto a three-dimensional space shape to express the surface texture of an object.

前景テクスチャ決定部０３００３は、前景画像及び前景三次元モデル群より前景三次元モデルのテクスチャ情報を決定する。前景テクスチャ境界色合わせ部０３００４は、前景三次元モデルのテクスチャ情報と三次元モデル群からテクスチャの境界の色合わせを実行し、前景オブジェクト毎に色付き前景三次元モデル群を生成する。仮想視点前景画像生成部０３００５は、仮想カメラパラメータに基づいて、前景画像群を仮想視点からの見た目となるように透視変換する。 The foreground texture determination unit 03003 determines the texture information of the foreground 3D model from the foreground image and the foreground 3D model group. The foreground texture boundary color matching unit 03004 executes color matching of the texture boundary from the texture information of the foreground 3D model and the 3D model group, and generates a colored foreground 3D model group for each foreground object. The virtual viewpoint foreground image generation unit 03005 performs perspective transformation so that the foreground image group looks from the virtual viewpoint based on the virtual camera parameters.

レンダリング部０３００６は、レンダリングモード管理部０３０１４で決定された、仮想視点画像の生成に用いられる生成方式に基づいて、背景画像と前景画像をレンダリングして全景の仮想視点画像を生成する。ここでは、仮想視点画像の生成方式として、モデルベースレンダリング（Model‐Based Rendering：ＭＢＲ）とイメージベースレンダリング（Image‐Based Rendering：ＩＢＲ）の２つのレンダリングモードを用いる。 The rendering unit 03006 renders the background image and the foreground image based on the generation method used for generating the virtual viewpoint image determined by the rendering mode management unit 03014, and generates a virtual viewpoint image of the entire view. Here, two rendering modes of model-based rendering (Model-Based Rendering: MBR) and image-based rendering (Image-Based Rendering: IBR) are used as the virtual viewpoint image generation method.

MBR is a method of generating a virtual viewpoint image using a three-dimensional model generated from multiple images of a subject captured from multiple directions. Specifically, it is a technique that uses the three-dimensional shape (model) of the target scene, obtained by a three-dimensional shape reconstruction method such as the visual volume intersection method or Multi-View Stereo (MVS), to generate the appearance of the scene from the virtual viewpoint as an image. In the present embodiment, when MBR is used as the rendering mode, a whole-scene model is generated by combining the background mesh model with the foreground three-dimensional model group generated by the foreground texture boundary color matching unit 03004, and a virtual viewpoint image is then generated from that whole-scene model.

IBR is a technique for generating a virtual viewpoint image that reproduces the appearance from a virtual viewpoint by deforming and compositing a group of input images of the target scene captured from multiple viewpoints. In the present embodiment, when IBR is used as the rendering mode, a background image viewed from the virtual viewpoint is generated based on the background texture model, and the foreground image generated by the virtual viewpoint foreground image generation unit 03005 is composited onto it to generate the virtual viewpoint image. When IBR is used as the rendering mode, a virtual viewpoint image can be generated from fewer captured images than are used when rendering by MBR (that is, the captured images used to generate the three-dimensional model). As a supplement, the rendering unit 03006 can also use rendering methods other than MBR and IBR.

レンダリングモード管理部０３０１４は、仮想視点画像の生成に用いられる生成方式（即ち、レンダリングモード）を決定し、その決定した結果を保持する。本実施形態では、レンダリングモード管理部０３０１４は、複数のレンダリングモードからレンダリングに用いるレンダリングモードを決定する。 The rendering mode management unit 03014 determines a generation method (that is, rendering mode) used for generating the virtual viewpoint image, and holds the determined result. In the present embodiment, the rendering mode management unit 03014 determines a rendering mode used for rendering from a plurality of rendering modes.

そして、このレンダリングモードの決定は、データ受信部０３００１により取得された情報に基づいて実行される。例えば、レンダリングモード管理部０３０１４は、取得された情報から特定されるカメラの数が閾値以下である場合に、仮想視点画像の生成に用いられる生成方式をＩＢＲに決定する。このように、カメラの数が少ない場合には、ＩＢＲを用いることで、ＭＢＲを用いた場合の三次元モデルの精度の低下による仮想視点画像の画質低下を回避することができる。 The rendering mode is determined based on information acquired by the data receiving unit 03001. For example, when the number of cameras specified from the acquired information is equal to or less than a threshold value, the rendering mode management unit 03014 determines the generation method used for generating the virtual viewpoint image to be IBR. As described above, when the number of cameras is small, by using IBR, it is possible to avoid a decrease in image quality of the virtual viewpoint image due to a decrease in accuracy of the three-dimensional model when MBR is used.

他方、レンダリングモード管理部０３０１４は、取得された情報から特定されるカメラの数が閾値より多い場合に、仮想視点画像の生成に用いられる生成方式をＭＢＲに決定する。このように、カメラの数が多い場合には、ＭＢＲを用いて仮想視点画像を生成することで、視点の指定可能範囲を拡張することができる。 On the other hand, when the number of cameras specified from the acquired information is greater than the threshold, the rendering mode management unit 03014 determines the generation method used for generating the virtual viewpoint image to be MBR. As described above, when the number of cameras is large, the specifiable range of viewpoints can be expanded by generating a virtual viewpoint image using MBR.

なお、レンダリングモード管理部０３０１４は、例えば、撮影から画像出力までの許容される処理遅延時間の長短に基づいて、生成方式（即ち、レンダリングモード）を決定してもよい。このとき、遅延時間が長くても視点の自由度を優先する場合はＭＢＲ、遅延時間が短いことを要求する場合はＩＢＲを用いる。 Note that the rendering mode management unit 03014 may determine the generation method (that is, the rendering mode) based on, for example, the allowable processing delay time from shooting to image output. At this time, MBR is used when priority is given to the degree of freedom of view even if the delay time is long, and IBR is used when a short delay time is required.

As another example, when the data receiving unit 03001 acquires information indicating that the controller 300 or the end user terminal 190 can designate the height of the viewpoint, the rendering mode management unit 03014 sets the generation method used for the virtual viewpoint image to MBR. This prevents a user's request to change the viewpoint height from being rejected because the generation method is IBR.
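The decision rules for the rendering mode can be summarized in a short sketch. The source presents the camera count, the allowable delay, and the viewpoint height capability as separate examples and does not say how they are prioritized when combined, so the ordering below, the function name, and its parameters are assumptions.

```python
def choose_rendering_mode(camera_count, threshold,
                          height_selectable=False, low_latency=False):
    """Return "MBR" or "IBR" for the given conditions (hypothetical priority)."""
    # A controller that can change the viewpoint height needs a 3D model.
    if height_selectable:
        return "MBR"
    # A short delay requirement favors IBR.
    if low_latency:
        return "IBR"
    # Few cameras: a 3D model would be inaccurate, so use IBR.
    if camera_count <= threshold:
        return "IBR"
    # Many cameras: MBR widens the range of selectable viewpoints.
    return "MBR"
```

For instance, with a threshold of 10, a setup reporting 8 working cameras would select IBR, and the same setup would switch to MBR once the controller signals that the viewpoint height can be changed.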

このように、状況に応じて仮想視点画像の生成方式を決定することで、適切な生成方式で仮想視点画像を生成することができる。また、複数のレンダリングモードを要求に応じて切り替え可能な構成にすることで（即ち、システムを柔軟に構成することで）、本発明をスタジアム以外の被写体にも容易に適用することができる。 Thus, by determining the generation method of the virtual viewpoint image according to the situation, the virtual viewpoint image can be generated by an appropriate generation method. In addition, by adopting a configuration in which a plurality of rendering modes can be switched as required (that is, by flexibly configuring the system), the present invention can be easily applied to subjects other than the stadium.

Note that the rendering mode held by the rendering mode management unit 03014 may be a method preset in the system. Alternatively, a user operating the virtual camera operation UI 330 or the end user terminal 190 may be allowed to set the rendering mode freely. In addition, the virtual viewpoint image generation method determined by the rendering mode management unit 03014 is not limited to a rendering method; the rendering mode management unit 03014 may also determine the method for processing other than rendering that is used to generate the virtual viewpoint image.

仮想視点音声生成部０３００７は、仮想カメラパラメータに基づいて、仮想視点において聞こえる音声（音声群）を生成する。合成部０３００８は、レンダリング部０３００６で生成された画像群と仮想視点音声生成部０３００７で生成された音声を合成して仮想視点コンテンツを生成する。 The virtual viewpoint sound generation unit 03007 generates sound (sound group) that can be heard at the virtual viewpoint based on the virtual camera parameters. The synthesizing unit 03008 synthesizes the image group generated by the rendering unit 03006 and the audio generated by the virtual viewpoint audio generation unit 03007 to generate virtual viewpoint content.

画像出力部０３００９は、コントローラ３００とエンドユーザ端末１９０にＥｔｈｅｒｎｅｔ（登録商標）を用いて仮想視点コンテンツを出力する。なお、外部への伝送手段はＥｔｈｅｒｎｅｔ（登録商標）に限定されるものではなく、ＳＤＩ、ＤｉｓｐｌａｙＰｏｒｔ、及びＨＤＭＩ（登録商標）等の信号伝送手段を用いてもよい。また、バックエンドサーバ２７０は、レンダリング部０３００６で生成された仮想視点画像（即ち、音声を含まない仮想視点画像）を出力してもよい。 The image output unit 03009 outputs virtual viewpoint content to the controller 300 and the end user terminal 190 using Ethernet (registered trademark). Note that the means of transmission to the outside is not limited to Ethernet (registered trademark), and signal transmission means such as SDI, DisplayPort, and HDMI (registered trademark) may be used. Further, the back-end server 270 may output the virtual viewpoint image generated by the rendering unit 03006 (that is, a virtual viewpoint image that does not include sound).

前景オブジェクト決定部０３０１０は、仮想カメラパラメータと前景三次元モデルに含まれる前景オブジェクトの空間上の位置を示す前景オブジェクトの位置情報から、表示される前景オブジェクト群を決定して、前景オブジェクトリストを出力する。即ち、前景オブジェクト決定部０３０１０は、仮想視点の画像情報を物理的なカメラ１１２にマッピングする処理を実行する。 The foreground object determination unit 03010 determines the group of foreground objects to be displayed from the virtual camera parameters and from the foreground object position information, which indicates the spatial position of each foreground object included in the foreground three-dimensional model, and outputs a foreground object list. That is, the foreground object determination unit 03010 executes a process of mapping the image information of the virtual viewpoint to the physical cameras 112.

なお、本実施形態では、レンダリングモード管理部０３０１４で決定されるレンダリングモードに応じて、マッピングする処理を実行する。そのため、複数の前景オブジェクトを決定する制御部が前景オブジェクト決定部０３０１０内部に実装され、レンダリングモードと連動して制御を行う。 In the present embodiment, mapping processing is executed in accordance with the rendering mode determined by the rendering mode management unit 03014. For this reason, a control unit that determines a plurality of foreground objects is mounted in the foreground object determination unit 03010 and performs control in conjunction with the rendering mode.

要求リスト生成部０３０１１は、指定時間の前景オブジェクトリストに対応する前景画像群と前景三次元モデル群、及び背景画像と音声データをデータベース２５０に要求するための要求リストを生成する。なお、前景画像群と前景三次元モデル群に関しては、仮想視点を考慮して選択されたデータがデータベース２５０に要求されるが、背景画像と音声データに関しては、そのフレームに関する全てのデータがデータベース２５０に要求される。バックエンドサーバ２７０の起動後、背景メッシュモデルが取得されるまで背景メッシュモデルの要求リストが生成される。 The request list generation unit 03011 generates a request list for requesting, from the database 250, the foreground image group and the foreground three-dimensional model group corresponding to the foreground object list at the specified time, together with the background image and audio data. For the foreground image group and the foreground three-dimensional model group, only data selected in consideration of the virtual viewpoint is requested from the database 250, whereas for the background image and the audio data, all data related to the frame is requested from the database 250. After the back-end server 270 starts up, a request list for the background mesh model is generated until the background mesh model has been acquired.

要求データ出力部０３０１２は、入力された要求リストに基づいて、データベース２５０に対してデータ要求のコマンドを出力する。背景メッシュモデル管理部０３０１３は、データベース２５０から受信した背景メッシュモデルを記憶する。 The request data output unit 03012 outputs a data request command to the database 250 based on the input request list. The background mesh model management unit 03013 stores the background mesh model received from the database 250.

なお、本実施形態では、バックエンドサーバ２７０が仮想視点画像の生成方式の決定と仮想視点画像の生成の両方を実行する仕様として説明するが、必ずしもこれに限定されない。したがって、バックエンドサーバ２７０は、仮想視点画像の生成方式を決定することなく、入力された生成方式を示す情報に基づいて、仮想視点画像を生成してもよい。その場合、例えば、フロントエンドサーバ２３０が、複数のカメラ１１２に関する情報や仮想視点画像の生成に係る視点を指定する装置から出力される情報等に基づいて、仮想視点画像の生成に用いられる生成方式を決定する。フロントエンドサーバ２３０は、カメラ１１２による撮影に基づく画像データをデータベース２５０に、また、決定した生成方式を示す情報をバックエンドサーバ２７０に出力する。そして、バックエンドサーバ２７０は、フロントエンドサーバ２３０が出力した生成方式を示す情報に基づいて、仮想視点画像を生成する。 In the present embodiment, the back-end server 270 is described as both determining the generation method of the virtual viewpoint image and generating the virtual viewpoint image, but the configuration is not necessarily limited to this. The back-end server 270 may instead generate the virtual viewpoint image based on input information indicating the generation method, without determining the generation method itself. In that case, for example, the front-end server 230 determines the generation method used to generate the virtual viewpoint image based on information about the plurality of cameras 112, information output from a device that specifies the viewpoint for generating the virtual viewpoint image, or the like. The front-end server 230 outputs image data based on images captured by the cameras 112 to the database 250, and outputs information indicating the determined generation method to the back-end server 270. The back-end server 270 then generates the virtual viewpoint image based on the information indicating the generation method output by the front-end server 230.

このように、フロントエンドサーバ２３０が生成方式を決定することで、決定された方式とは別の方式での画像生成のためのデータをデータベース２５０やバックエンドサーバ２７０が処理することによる処理負荷を低減できる。補足として、本実施形態のように、バックエンドサーバ２７０が生成方式を決定する場合、データベース２５０は複数の生成方式に対応可能なデータを保持するため、複数の生成方式の各々に対応する複数の仮想視点画像の生成が可能となる。以上のように、システムを構成することにより、複数のカメラで取得された撮影画像から所望の仮想視点画像を生成することができる。 When the front-end server 230 determines the generation method in this way, the processing load that would otherwise arise from the database 250 and the back-end server 270 processing data for image generation by a method other than the determined one can be reduced. As a supplement, when the back-end server 270 determines the generation method as in the present embodiment, the database 250 holds data that can support a plurality of generation methods, so a plurality of virtual viewpoint images, one for each of the generation methods, can be generated. By configuring the system as described above, a desired virtual viewpoint image can be generated from captured images acquired by a plurality of cameras.

次に、カメラアダプタ１２０における符号量制御処理について説明する。なお、この符号量制御処理は、画像処理システムにおける伝送路に即した符号量となるように、個々の視点の符号化処理を適切に決定するために実行される。 Next, the code amount control process in the camera adapter 120 will be described. This code amount control process is executed in order to appropriately determine the encoding process for each viewpoint so that the code amount conforms to the transmission path in the image processing system.

以下、図９のフローチャートを用いて、符号量制御処理の手順を説明する。なお、以下の説明では、目標符号量という用語を、画像処理システムにおいて発生するデータをリアルタイムで伝送するために個々のカメラにおいて設定される符号量という意味で用いる。具体的には、本実施形態において、目標符号量は、伝送路の通信帯域中で映像を伝送するために、使用が可能な帯域を視点数（カメラ数）で除算した値である。即ち、画質の指標となる目標符号量は、伝送路の伝送容量に基づいて設定され得る。 Hereinafter, the procedure of the code amount control process will be described with reference to the flowchart of FIG. In the following description, the term target code amount is used to mean a code amount set in each camera in order to transmit data generated in the image processing system in real time. Specifically, in the present embodiment, the target code amount is a value obtained by dividing a usable band by the number of viewpoints (the number of cameras) in order to transmit video in the communication band of the transmission path. That is, the target code amount serving as an index of image quality can be set based on the transmission capacity of the transmission path.
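
The definition of the target code amount given above can be illustrated with a minimal Python sketch. The function name and the bandwidth figure are hypothetical and chosen only for illustration; the text specifies nothing beyond the division of the usable band by the number of cameras.

```python
def target_code_amount(usable_bandwidth: float, num_cameras: int) -> float:
    """Per-camera target: usable video bandwidth divided by the number of viewpoints."""
    if num_cameras <= 0:
        raise ValueError("num_cameras must be positive")
    return usable_bandwidth / num_cameras


# Hypothetical figures: 8 Gbit/s usable for video, shared equally by 4 cameras.
per_camera = target_code_amount(8_000_000_000, 4)
```

Whether the budget is expressed per second or per frame is not stated in the text; a per-frame budget would additionally divide by the frame rate.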

ステップＳ９０１において、カメラアダプタ１２０は、視野重複マップを照合し、前景毎に、自身を捕捉しているカメラの数を取得する。ここで、視野重複マップは、処理中のカメラにおける視野において、他のカメラの視野がどれだけ重複しているか（即ち、重複度合い）を示すマップのことである。 In step S901, the camera adapter 120 consults the visual field overlap map and acquires, for each foreground, the number of cameras capturing that foreground. Here, the visual field overlap map is a map indicating how much the fields of view of the other cameras overlap the field of view of the camera being processed (that is, the degree of overlap).

以下、視野重複マップに関して、図１０を用いて説明する。図１０（ａ）は視野重複マップを説明するための図であり、図１０（ａ）では、カメラを４台配置し、各々のカメラの視野を模式的に示している。なお、説明を容易にするため、カメラを４台としているが、実際には、これよりも多くのカメラを用いる方が、仮想視点の画質という観点で望ましい。 Hereinafter, the visual field overlap map will be described with reference to FIG. FIG. 10A is a diagram for explaining a field-of-view overlap map. In FIG. 10A, four cameras are arranged, and the field of view of each camera is schematically shown. For ease of explanation, the number of cameras is four, but in reality, using more cameras than this is desirable from the viewpoint of the image quality of the virtual viewpoint.

図１０（ａ）に示されるように、スタジアムのフィールド上では、領域によって視野に入っているカメラの数（以下、視野重複度）が異なる。そして、この視野重複度を、各カメラの視野に基づいてマッピングしたものが視野重複マップである。このように、視野重複マップは、スタジアムの三次元形状とカメラ配置の相対的な位置関係に基づいて、作成（取得）される。なお、本実施形態では、視野重複マップを事前データとして作成し、制御ステーション３１０から対象となるカメラアダプタに各々、予め配信するものとする。 As shown in FIG. 10A, on the stadium field, the number of cameras in the field of view (hereinafter referred to as field overlap) differs depending on the area. A visual field overlap map is obtained by mapping the degree of visual field overlap based on the visual field of each camera. As described above, the visual field overlap map is created (acquired) based on the relative positional relationship between the three-dimensional shape of the stadium and the camera arrangement. In the present embodiment, it is assumed that a visual field overlap map is created as advance data and is distributed in advance from the control station 310 to each target camera adapter.

図１０（ｂ）は、図１０（ａ）のＣａｍ３（カメラ３）における視野重複マップであり、マップ内の数字は領域の重複度を示している。ここで、例えば、図１０（ｃ）に示されるように、前景Ａが存在した場合、前景Ａにおける重複度は２であり、また、前景Ｂ、Ｃにおける重複度は４である。 FIG. 10B is a visual field overlap map in Cam3 (camera 3) of FIG. 10A, and the numbers in the map indicate the overlapping degree of the regions. Here, for example, as shown in FIG. 10C, when the foreground A exists, the degree of overlap in the foreground A is 2, and the degree of overlap in the foregrounds B and C is 4.

このように、各カメラの視野において視野重複マップを作成し、前景がマップ上のどこに位置するかを判定することによって、その前景がいくつのカメラによって捕捉されているかを判定することができる。即ち、この場合、カメラアダプタ１２０は、前景となるオブジェクトを撮影するカメラ台数を導出する処理（カメラ台数導出処理）を実行する。なお、この判定に使用する前景の座標として、前景の足元の座標を用いることが望ましい。その場合、例えば、前景の外接矩形における下辺の中点を用いる等すればよい。但し、足元にオクルージョンが発生することが予想される場合には、外接矩形上辺中点から、被写体の大きさ（例えば、人物であれば１７０ｃｍ等）に基づいて、足元の座標を算出するようにしてもよい。 In this way, by creating a visual field overlap map for the field of view of each camera and determining where a foreground is located on the map, it is possible to determine how many cameras capture that foreground. In other words, the camera adapter 120 executes a process for deriving the number of cameras that capture an object serving as a foreground (camera number deriving process). It is desirable to use the coordinates of the foot of the foreground as the foreground coordinates for this determination. In that case, for example, the midpoint of the lower side of the circumscribed rectangle of the foreground may be used. However, if occlusion is expected to occur at the foot, the foot coordinates may instead be calculated from the midpoint of the upper side of the circumscribed rectangle based on the size of the subject (for example, 170 cm for a person).

その他、前景の座標を決定するにあたり、三次元モデル情報生成部０６１３２で生成される三次元モデルに関する画像情報を用いてもよい。上述のように、三次元モデルに関する画像情報を生成する過程で、例えば、ステレオカメラの原理等を適用することから（即ち、前景からカメラ１１２までの距離情報を処理することから）、前景の座標を決定（導出）することができる。また、視野重複マップにおいて前景が複数の領域に跨るような場合、その領域内で最も小さい重複度を用いるようにする。 Alternatively, the image information regarding the three-dimensional model generated by the three-dimensional model information generation unit 06132 may be used to determine the coordinates of the foreground. As described above, the process of generating the image information regarding the three-dimensional model applies, for example, the principle of a stereo camera (that is, it processes distance information from the foreground to the camera 112), so the coordinates of the foreground can be determined (derived) from it. Further, when a foreground extends over a plurality of regions in the visual field overlap map, the smallest overlap degree among those regions is used.
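
The camera number deriving process described above can be sketched as follows. This is a minimal illustration under stated assumptions: the visual field overlap map is assumed to be a per-pixel grid in the image coordinates of the camera being processed, the bounding box uses inclusive pixel coordinates, and the bounding box stands in for the foreground silhouette when looking for the smallest overlap degree. The names `foot_point`, `overlap_at_foot`, and `min_overlap_in_bbox` are hypothetical.

```python
from typing import List, Tuple

OverlapMap = List[List[int]]        # overlap_map[y][x] = visual field overlap degree
BBox = Tuple[int, int, int, int]    # (left, top, right, bottom), inclusive pixels


def foot_point(bbox: BBox) -> Tuple[int, int]:
    """Midpoint of the lower side of the circumscribed rectangle."""
    left, _top, right, bottom = bbox
    return ((left + right) // 2, bottom)


def overlap_at_foot(overlap_map: OverlapMap, bbox: BBox) -> int:
    """Overlap degree looked up at the foot coordinates of the foreground."""
    x, y = foot_point(bbox)
    return overlap_map[y][x]


def min_overlap_in_bbox(overlap_map: OverlapMap, bbox: BBox) -> int:
    """Smallest overlap degree touched by the rectangle (used when it spans regions)."""
    left, top, right, bottom = bbox
    return min(
        overlap_map[y][x]
        for y in range(top, bottom + 1)
        for x in range(left, right + 1)
    )


# Hypothetical 3 x 4 map and a foreground occupying the two right-hand columns.
demo_map = [
    [1, 1, 2, 2],
    [1, 2, 2, 4],
    [2, 2, 4, 4],
]
demo_bbox = (2, 0, 3, 2)
```

With this data the foot lookup yields overlap degree 4, while the conservative minimum over the rectangle yields 2, which is the value the text prescribes when a foreground straddles several regions.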

ステップＳ９０２において、カメラアダプタ１２０は、処理対象として着目するフレームが撮影を開始した１枚目のフレーム（即ち、初期フレーム）であるか否かを判定する。そして、カメラアダプタ１２０は、処理対象として着目するフレームが撮影を開始した１枚目のフレームであると判定すると（Ｓ９０２Ｙｅｓ）、処理をステップＳ９０３に移行させる。また、処理対象として着目するフレームが撮影を開始した１枚目のフレームではないと判定すると（Ｓ９０２Ｎｏ）、処理をステップＳ９０５に移行させる。 In step S902, the camera adapter 120 determines whether or not the frame of interest as a processing target is the first frame (that is, the initial frame) from which shooting has started. If the camera adapter 120 determines that the frame of interest as the processing target is the first frame for which shooting has started (S902 Yes), the process proceeds to step S903. If it is determined that the frame of interest as a processing target is not the first frame for which shooting has started (S902: No), the process proceeds to step S905.

カメラアダプタ１２０は、処理をステップＳ９０３に移行させると、接続されるカメラ１１２に写っている前景の画素数を、視野重複度毎に取得（計測）する。具体的には、例えば、図１０（ｃ）では、視野重複度１の画素数は前景Ｅの画素数、視野重複度２の画素数は前景Ａの画素数、視野重複度３の画素数は前景Ｄの画素数、視野重複度４の画素数は前景Ｂの画素数と前景Ｃの画素数の合算値を計測する。 When the process proceeds to step S903, the camera adapter 120 acquires (measures) the number of foreground pixels captured by the connected camera 112 for each visual field overlap degree. Specifically, in FIG. 10(c), for example, the pixel count for overlap degree 1 is the number of pixels of foreground E, the count for overlap degree 2 is the number of pixels of foreground A, the count for overlap degree 3 is the number of pixels of foreground D, and the count for overlap degree 4 is the sum of the numbers of pixels of foreground B and foreground C.
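
The per-overlap pixel count of step S903 amounts to a grouped sum. The sketch below assumes each foreground is already labeled with its visual field overlap degree and its pixel count; the function name and the pixel counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pixels_per_overlap(foregrounds: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Sum foreground pixel counts per overlap degree.

    Each item is (overlap_degree, pixel_count) for one foreground.
    """
    fpix: Dict[int, int] = defaultdict(int)
    for dn, pixels in foregrounds:
        fpix[dn] += pixels
    return dict(fpix)


# Overlap degrees follow FIG. 10(c): A=2, B=4, C=4, D=3, E=1.
# The pixel counts themselves are made up for illustration.
fpix = pixels_per_overlap([(2, 1200), (4, 800), (4, 500), (3, 900), (1, 300)])
```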

カメラアダプタ１２０は、前景の画素数を取得すると、ステップＳ９０４において、各々の前景に対して、符号化時に適用する量子化パラメータを決定する。なお、カメラアダプタ１２０は、ステップＳ９０４の処理を実行する上で、図１１（ａ）に示される量子化パラメータ・圧縮率対応テーブルと、図１１（ｂ）に示される視野重複度・量子化パラメータセット対応テーブルを用いる。以下、図１１を用いて、内容を補足する。 Having acquired the foreground pixel counts, the camera adapter 120 determines, in step S904, the quantization parameter to apply to each foreground during encoding. In executing step S904, the camera adapter 120 uses the quantization parameter / compression rate correspondence table shown in FIG. 11(a) and the visual field overlap degree / quantization parameter set correspondence table shown in FIG. 11(b). These tables are explained below with reference to FIG. 11.

先ず、量子化パラメータ・圧縮率対応テーブルについて説明する。この量子化パラメータ・圧縮率対応テーブルは、量子化パラメータに対応する圧縮率を示すテーブルであり、量子化パラメータの設定時に所望の符号量となるか否かを予測するためのテーブルである。なお、この量子化パラメータ・圧縮率対応テーブルは、予めシミュレーションや予備実験等で取得された結果に基づいて作成される。 First, the quantization parameter / compression ratio correspondence table will be described. This quantization parameter / compression rate correspondence table is a table indicating the compression rate corresponding to the quantization parameter, and is a table for predicting whether or not a desired code amount is obtained when the quantization parameter is set. The quantization parameter / compression ratio correspondence table is created based on results obtained in advance through simulations, preliminary experiments, or the like.

図１１（ａ）は、量子化パラメータ・圧縮率対応テーブルの一例である。図１１（ａ）において、ｑｐは量子化パラメータであり、ｂｒは圧縮率である。図１１（ａ）に示されるように、例えば、量子化パラメータが１００（ｑｐ＝１００）の場合は圧縮率が７０％（ｂｒ＝０．７０）、量子化パラメータが５０（ｑｐ＝５０）の場合は圧縮率が３５％（ｂｒ＝０．３５）となっている。なお、以降では、量子化パラメータｑｐに対応する圧縮率をｂｒ［ｑｐ］と記載する。 FIG. 11A is an example of a quantization parameter / compression ratio correspondence table. In FIG. 11A, qp is a quantization parameter, and br is a compression rate. As shown in FIG. 11A, for example, when the quantization parameter is 100 (qp = 100), the compression rate is 70% (br = 0.70), and the quantization parameter is 50 (qp = 50). In this case, the compression rate is 35% (br = 0.35). Hereinafter, the compression rate corresponding to the quantization parameter qp is referred to as br [qp].

次に、視野重複度・量子化パラメータセット対応テーブルについて説明する。この視野重複度・量子化パラメータセット対応テーブルは、処理中のカメラにおいて発生する符号量を制御する時に使用する。図１１（ｂ）は、視野重複度・量子化パラメータセット対応テーブルの一例である。ｄｎは視野重複度であり、ｑｌは量子化パラメータセットの段階（以下、量子化レベル）である。図１１（ｂ）に示されるように、例えば、量子化レベルを２（ｑｌ＝２）に設定した場合、視野重複度が１０（ｄｎ＝１０）の量子化パラメータｑｐは６０、また、視野重複度が１（ｄｎ＝１）の量子化パラメータｑｐは１００となる。このように、量子化パラメータは、視野重複度と逆相関するように設定される。即ち、その前景を捕捉しているカメラの台数が少ない前景画像から高い画質になるように画質設定される。 Next, the visual field overlap degree / quantization parameter set correspondence table will be described. This table is used to control the amount of code generated by the camera being processed. FIG. 11(b) is an example of the table. Here, dn is the visual field overlap degree, and ql is the level of the quantization parameter set (hereinafter, the quantization level). As shown in FIG. 11(b), for example, when the quantization level is set to 2 (ql = 2), the quantization parameter qp is 60 for a visual field overlap degree of 10 (dn = 10) and 100 for a visual field overlap degree of 1 (dn = 1). In this way, the quantization parameter is set so as to be inversely correlated with the visual field overlap degree. That is, the image quality is set so that foreground images captured by fewer cameras are given higher image quality.

なお、この視野重複度・量子化パラメータセット対応テーブルは、システムのポリシーに基づいて、シミュレーションや予備実験で仮想視点コンテンツの画質を確認しながら作成すればよい。また、以降では、視野重複度ｄｎ、量子化レベルｑｌに対応する量子化パラメータをｑｐ［ｄｎ、ｑｌ］と記載する。 Note that this field-of-view overlap / quantization parameter set correspondence table may be created while confirming the image quality of the virtual viewpoint content through simulation or preliminary experiment based on the system policy. Further, hereinafter, the quantization parameter corresponding to the visual field overlap degree dn and the quantization level ql is described as qp [dn, ql].

本実施形態では、目標符号量を達成する最小の量子化レベルを探索することで、各々の前景の量子化パラメータを決定する。この各々の前景の量子化パラメータを決定する処理に関して、図１２を用いて、説明を補足する。 In the present embodiment, the foreground quantization parameter is determined by searching for the minimum quantization level that achieves the target code amount. The processing for determining each foreground quantization parameter will be supplemented by using FIG.

ステップＳ１２０１において、カメラアダプタ１２０は、カレント量子化レベルｃｕｒＱＬを０に設定する（ｃｕｒＱＬ＝０）。次に、ステップＳ１２０２において、カメラアダプタ１２０は、カレント量子化レベルｃｕｒＱＬに対応付けられている、視野重複度毎の量子化パラメータ（量子化パラメータセット）を取得する。 In step S1201, the camera adapter 120 sets the current quantization level curQL to 0 (curQL = 0). Next, in step S1202, the camera adapter 120 acquires a quantization parameter (quantization parameter set) for each visual field overlap degree associated with the current quantization level curQL.

ステップＳ１２０３において、カメラアダプタ１２０は、上述の図９のステップＳ９０３で取得した視野重複度毎の前景画素数と、カレント量子化レベルから、予測符号量を下式に基づいて算出する。なお、下式において、カメラ台数をＮ、視野重複度に対応する前景画素数をｆｐｉｘ［ｄｎ］とする。 In step S1203, the camera adapter 120 calculates the predicted code amount based on the following expression from the number of foreground pixels for each field overlap obtained in step S903 in FIG. 9 and the current quantization level. In the following equation, the number of cameras is N, and the number of foreground pixels corresponding to the visual field overlap is fpix [dn].

ステップＳ１２０４において、カメラアダプタ１２０は、ステップＳ１２０３で算出した予測符号量が、目標符号量より小さいか否かを判定する。そして、カメラアダプタ１２０は、ステップＳ１２０３で算出した予測符号量が目標符号量より小さい場合は処理をステップＳ１２０５に移行させ、ステップＳ１２０３で算出した予測符号量が目標符号量以上である場合は処理をステップＳ１２０６に移行させる。 In step S1204, the camera adapter 120 determines whether or not the predicted code amount calculated in step S1203 is smaller than the target code amount. The camera adapter 120 shifts the process to step S1205 when the predicted code amount calculated in step S1203 is smaller than the target code amount, and performs the process when the predicted code amount calculated in step S1203 is equal to or larger than the target code amount. The process proceeds to step S1206.

処理をステップＳ１２０５に移行させると、カメラアダプタ１２０は、各々の前景に対して、その視野重複度に応じて量子化レベルｃｕｒＱＬに対応する量子化パラメータを割り当てる。また、処理をステップＳ１２０６に移行させると、カメラアダプタ１２０は、カレント量子化レベルｃｕｒＱＬをカウントアップする。以上の処理を実行することで、前景毎の量子化パラメータを決定することができる。 When the process proceeds to step S1205, the camera adapter 120 assigns a quantization parameter corresponding to the quantization level curQL to each foreground according to the degree of visual field overlap. When the process proceeds to step S1206, the camera adapter 120 counts up the current quantization level curQL. By executing the above processing, the quantization parameter for each foreground can be determined.
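
The search of steps S1201 to S1206 can be sketched as below. Several parts are assumptions rather than content of the original: the expression for the predicted code amount is not reproduced in this text, so the sketch assumes it is the sum over overlap degrees of fpix[dn] multiplied by an uncompressed bit depth and by br[qp[dn, ql]]; the two lookup functions are hypothetical stand-ins for the tables of FIG. 11 (the linear `compression_rate` merely passes through the two quoted points, qp = 100 giving 0.70 and qp = 50 giving 0.35); and stopping at a maximum level of 6 is borrowed from the range mentioned for FIG. 11.

```python
from typing import Dict

BITS_PER_PIXEL = 24  # assumption: uncompressed 8-bit RGB


def compression_rate(qp: int) -> float:
    """Hypothetical stand-in for br[qp]; not the actual table of FIG. 11(a)."""
    return 0.007 * qp


def quant_param(dn: int, ql: int) -> int:
    """Hypothetical stand-in for qp[dn, ql]; falls as overlap or level rises."""
    return max(10, 100 - 5 * ql * (dn - 1))


def predicted_code_amount(fpix: Dict[int, int], ql: int) -> float:
    """Assumed form of the prediction: sum of pixels * bit depth * compression rate."""
    return sum(
        pixels * BITS_PER_PIXEL * compression_rate(quant_param(dn, ql))
        for dn, pixels in fpix.items()
    )


def search_quant_level(fpix: Dict[int, int], target: float, max_ql: int = 6) -> int:
    """Return the smallest level whose prediction is below the target (S1201 to S1206)."""
    for cur_ql in range(max_ql + 1):                      # S1201, S1206
        if predicted_code_amount(fpix, cur_ql) < target:  # S1203, S1204
            return cur_ql                                 # S1205
    return max_ql  # assumption: fall back to the coarsest level if none fits


demo_fpix = {1: 300, 2: 1200, 3: 900, 4: 1300}
level = search_quant_level(demo_fpix, target=52_000)
```

Once the level is found, each foreground receives `quant_param(dn, level)` according to its own overlap degree, so foregrounds seen by few cameras keep a higher quantization parameter than those seen by many.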

ここで、再度、図９のフローチャートに戻り、上述のように、処理対象として着目するフレームが撮影を開始した１枚目のフレームではないと判定すると（Ｓ９０２Ｎｏ）、処理をステップＳ９０５に移行させる。ステップＳ９０５において、カメラアダプタ１２０は、処理中のカメラにおいて、前フレームの処理後に決定された量子化レベルに基づいて、前景毎の量子化パラメータを決定する。 Here, returning to the flowchart of FIG. 9 again, as described above, when it is determined that the frame of interest as the processing target is not the first frame from which shooting was started (No in S902), the process proceeds to step S905. . In step S905, the camera adapter 120 determines the quantization parameter for each foreground based on the quantization level determined after the processing of the previous frame in the camera being processed.

ステップＳ９０６において、カメラアダプタ１２０は、それまでの処理ステップまでで取得した前景毎の量子化パラメータを用いて、符号化処理を実行する（即ち、符号化データを生成する）。このステップＳ９０６における符号化処理のアルゴリズムは、量子化パラメータに基づいて量子化を実行可能なものであればよく、例えば、ＪＰＥＧやＪＰＥＧ２０００等を用いる。 In step S906, the camera adapter 120 executes the encoding process (that is, generates encoded data) using the per-foreground quantization parameters obtained in the preceding steps. The encoding algorithm in step S906 may be any algorithm that can perform quantization based on a quantization parameter; for example, JPEG or JPEG2000 is used.

ステップＳ９０７において、カメラアダプタ１２０は、目標符号量と、実際に出力された符号量（以下、実符号量）との差分ΔＳを算出する。即ち、下式に基づいて、差分ΔＳを算出する。 In step S907, the camera adapter 120 calculates a difference ΔS between the target code amount and the actually output code amount (hereinafter, actual code amount). That is, the difference ΔS is calculated based on the following equation.

ステップＳ９０８において、カメラアダプタ１２０は、差分ΔＳの絶対値が閾値Ｔより大きいか否かを判定する。差分ΔＳの絶対値が閾値Ｔより大きいと判定すると（Ｓ９０８Ｙｅｓ）、処理をステップＳ９０９に移行させ、差分ΔＳの絶対値が閾値Ｔ以下であると判定すると（Ｓ９０８Ｎｏ）、図９の処理を終了する。 In step S908, the camera adapter 120 determines whether or not the absolute value of the difference ΔS is greater than the threshold T. If the absolute value of ΔS is determined to be greater than T (S908 Yes), the process proceeds to step S909; if it is determined to be equal to or less than T (S908 No), the process of FIG. 9 ends.

ステップＳ９０９において、カメラアダプタ１２０は、量子化レベルを更新する。このステップＳ９０９における処理は、実符号量と目標符号量との差分を次のフレームで吸収させるために実行される。具体的には、差分ΔＳが閾値Ｔよりも大きい場合、即ち、実符号量が目標符号量を所定の幅Ｔ以上、上回る場合は量子化レベルをカウントアップする（＋１する）。これにより、次のフレームにおいて符号量を減少させる方向に調整することができる。また、差分ΔＳが閾値−Ｔよりも小さい場合、即ち、実符号量が目標符号量を所定の幅Ｔ以上、下回る場合は量子化レベルをカウントダウンする（−１する）。これにより、次のフレームにおいて符号量を増加させる方向に調整することができる。 In step S909, the camera adapter 120 updates the quantization level. The processing in step S909 is executed so that the difference between the actual code amount and the target code amount is absorbed in the next frame. Specifically, when the difference ΔS is larger than the threshold T, that is, when the actual code amount exceeds the target code amount by the predetermined margin T or more, the quantization level is incremented (+1). This adjusts the code amount downward in the next frame. When the difference ΔS is smaller than −T, that is, when the actual code amount falls below the target code amount by the predetermined margin T or more, the quantization level is decremented (−1). This adjusts the code amount upward in the next frame.

但し、この量子化レベルを更新する処理は、予め規定した範囲（図１１では、０−６）内において、量子化レベルを超えないように制限した上で実行される。なお、閾値Ｔに関して、カメラアダプタ１２０に備えられているバッファが比較的大きい場合には大きく、そうでない場合には小さく設定される。カメラアダプタ１２０は、量子化レベルを更新すると、図９の処理を終了する。 However, the quantization level is updated only within a predefined range (0 to 6 in FIG. 11), so that the updated level never falls outside that range. The threshold T is set large when the buffer provided in the camera adapter 120 is relatively large, and small otherwise. After updating the quantization level, the camera adapter 120 ends the process of FIG. 9.
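
Steps S907 to S909, together with the range limit, can be sketched as a single update function. The expression for ΔS is not reproduced in this text; the sketch assumes ΔS is the actual code amount minus the target code amount, which is the sign convention implied by the description of step S909. The function and parameter names are hypothetical.

```python
def update_quant_level(
    ql: int,
    actual: float,
    target: float,
    threshold: float,
    min_ql: int = 0,
    max_ql: int = 6,
) -> int:
    """Return the quantization level to use for the next frame."""
    delta_s = actual - target      # S907 (assumed sign convention)
    if delta_s > threshold:        # overshoot: compress harder next frame
        ql += 1
    elif delta_s < -threshold:     # undershoot: allow more code next frame
        ql -= 1
    # Keep the level inside the predefined range (0 to 6 in FIG. 11).
    return max(min_ql, min(max_ql, ql))
```

A larger threshold makes the level change less often, which matches the remark that T may be set large when the buffer of the camera adapter is large.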

以上のように処理することで、前フレームでの処理結果に基づいて、視野重複度が小さいものの画質劣化を抑制しながら符号量の制御ができるので、仮想視点コンテンツに大きな画質劣化を発生させずにリアルタイム伝送を実現することができる。 With the above processing, the code amount can be controlled based on the processing result of the previous frame while suppressing image quality degradation of foregrounds with a small visual field overlap degree, so real-time transmission can be achieved without causing significant image quality degradation in the virtual viewpoint content.

（第２の実施形態） (Second Embodiment)
上述の第１の実施形態では、処理の高速化のため、図９に示すように処理対象とするフレームにおける量子化レベルを、前フレームの処理で事前に決定された量子化レベルに基づいて決定した。 In the first embodiment described above, to speed up processing, the quantization level for the frame being processed was determined based on the quantization level decided in advance during the processing of the previous frame, as shown in FIG. 9.

但し、量子化レベルに関して、フレーム毎に、視野重複度毎の前景画素数に基づいて予測符号量を算出し、その予測符号量に基づいて量子化パラメータを決定するようにしてもよい。このように構成することにより、処理中のフレームの内容（即ち、視野重複度毎の前景画素数）に基づいた量子化レベルの決定ができるので、実符号量をより目標符号量に近付けることができる。 However, the quantization level may instead be handled by calculating, for every frame, the predicted code amount from the number of foreground pixels for each visual field overlap degree and determining the quantization parameters from that predicted code amount. With this configuration, the quantization level is determined from the contents of the frame being processed (that is, the number of foreground pixels for each visual field overlap degree), so the actual code amount can be brought closer to the target code amount.

（第３の実施形態） (Third Embodiment)
上述の第１の実施形態では、カメラ毎に目標符号量を予め設定しておき、その目標符号量と実符号量との差分ΔＳの絶対値が閾値Ｔより大きいか否かに基づいて、次のフレームの量子化レベル（即ち、符号化パラメータ）を設定した。但し、この第１の実施形態では、伝送ライン上の多くのカメラにおいて実符号量が目標符号量を超過するような場合、伝送に遅延が生じることが想定される。 In the first embodiment described above, a target code amount is set in advance for each camera, and the quantization level (that is, the encoding parameter) of the next frame is set based on whether or not the absolute value of the difference ΔS between the target code amount and the actual code amount is larger than the threshold T. However, in the first embodiment, if the actual code amount exceeds the target code amount in many cameras on the transmission line, transmission is expected to be delayed.

そこで、本実施形態では、処理中のカメラをｖ番目のカメラとした場合、上流（即ち、１〜ｖ−１番目まで）のカメラアダプタ１２０により出力された内容に応じて、処理中のカメラの符号化パラメータを制御する。以下、本実施形態における符号量制御処理に関して、図１３を用いて説明する。なお、システムの全体構成、及び符号量制御処理以外の処理は、第１の実施形態と同様であり、また、符号量制御処理に関しては、上述の第２の実施形態を前提に、フレーム毎に、視野重複度毎の前景画素数に基づいて、量子化パラメータを決定する。 Therefore, in the present embodiment, with the camera being processed taken as the v-th camera, the encoding parameters of the camera being processed are controlled according to the content output by the upstream camera adapters 120 (that is, those of the 1st to (v-1)-th cameras). The code amount control process of the present embodiment is described below with reference to FIG. 13. The overall system configuration and the processes other than the code amount control process are the same as in the first embodiment. The code amount control process builds on the second embodiment described above and determines the quantization parameters for every frame based on the number of foreground pixels for each visual field overlap degree.

先ず、ステップ１３０１において、カメラアダプタ１２０は、上流カメラから伝送されてくる積算目標符号量の差分ΔＵＳを取得し、その絶対値が閾値Ｔ２より大きいか否かを判定する。 First, in step S1301, the camera adapter 120 acquires the accumulated target code amount difference ΔUS transmitted from the upstream camera and determines whether or not its absolute value is larger than the threshold T2.

ここで、積算目標符号量は、上述のように、処理中のカメラをｖ番目のカメラとした場合、上流（即ち、１〜ｖ−１番目まで）に位置するカメラ全ての実符号量から、基本符号量を上流に位置するカメラの台数で乗算した値を減算することで算出される。なお、基本符号量とは、伝送路の通信帯域内で映像を伝送するために使用可能な帯域を視点数（カメラの台数）で除算した値である。 Here, with the camera being processed taken as the v-th camera as described above, the accumulated target code amount is calculated by subtracting the basic code amount multiplied by the number of upstream cameras from the total actual code amount of all cameras located upstream (that is, the 1st to (v-1)-th cameras). The basic code amount is the value obtained by dividing the band usable for video transmission within the communication band of the transmission path by the number of viewpoints (the number of cameras).

補足として、処理中のカメラが最上流の場合、積算目標符号量の差分ΔＵＳを０とする。また、閾値Ｔ２は、カメラアダプタ１２０に備えられているバッファの大きさに応じて決定すればよい。そして、カメラアダプタ１２０は、積算目標符号量の差分ΔＵＳの絶対値が閾値Ｔ２よりも大きいと判定された場合には、処理をステップＳ１３０２に移行させ、そうでない場合には、処理をステップＳ１３０３に移行させる。 As a supplement, when the camera being processed is the most upstream camera, the accumulated target code amount difference ΔUS is set to 0. The threshold T2 may be determined according to the size of the buffer provided in the camera adapter 120. If the absolute value of the accumulated target code amount difference ΔUS is determined to be larger than the threshold T2, the camera adapter 120 advances the process to step S1302; otherwise, it advances the process to step S1303.

ステップ１３０２において、処理中のカメラにおける目標符号量を、上流カメラまでの積算目標符号量であるΔＵＳを考慮し、下式に基づいて決定する。 In step S1302, the target code amount of the camera being processed is determined based on the following equation, taking into account ΔUS, the accumulated target code amount up to the upstream camera.

上式に基づいて目標符号量を決定することで、上流カメラまでで使い過ぎた帯域分だけ処理中のカメラの符号量を減らすことを試みる。また、積算目標符号量の差分ΔＵＳの絶対値が閾値Ｔ２以下であると判定された場合（Ｓ１３０１Ｎｏ）、ステップＳ１３０３において、カメラアダプタ１２０は、第１の実施形態と同様に、処理中のカメラの目標符号量を下式に基づいて設定する。 By determining the target code amount based on the above equation, the camera being processed attempts to reduce its code amount by the amount of bandwidth overused by the upstream cameras. When the absolute value of the accumulated target code amount difference ΔUS is determined to be equal to or less than the threshold T2 (S1301 No), the camera adapter 120 sets, in step S1303, the target code amount of the camera being processed based on the following equation, as in the first embodiment.

ステップＳ１３０４において、カメラアダプタ１２０は、視野重複マップを照合し、前景毎に、自身を捕捉しているカメラの数（即ち、視野重複度）を取得する。即ち、ステップＳ１３０４における処理は、第１の実施形態における図９のステップＳ９０１と同様の処理である。 In step S1304, the camera adapter 120 consults the visual field overlap map and acquires, for each foreground, the number of cameras capturing that foreground (that is, the visual field overlap degree). That is, the processing in step S1304 is the same as step S901 of FIG. 9 in the first embodiment.

ステップＳ１３０５において、カメラアダプタ１２０は、接続されるカメラ１１２に写っている前景の画素数を、視野重複度毎に取得（計測）する。即ち、ステップＳ１３０５における処理は、第１の実施形態における図９のステップＳ９０３と同様の処理である。 In step S1305, the camera adapter 120 acquires (measures) the number of foreground pixels captured in the connected camera 112 for each field overlap degree. That is, the processing in step S1305 is the same processing as step S903 in FIG. 9 in the first embodiment.

ステップＳ１３０６において、カメラアダプタ１２０は、各前景に対し、符号化時に適用する量子化のパラメータを決定する。即ち、ステップＳ１３０６における処理は、第１の実施形態における図９のステップＳ９０４と同様の処理である。 In step S1306, the camera adapter 120 determines a quantization parameter to be applied to each foreground during encoding. That is, the processing in step S1306 is the same processing as step S904 in FIG. 9 in the first embodiment.

ステップＳ１３０７において、カメラアダプタ１２０は、それまでの処理ステップまでで取得した前景毎の量子化パラメータを用いて、符号化処理を実行する。即ち、ステップＳ１３０７における処理は、第１の実施形態における図９のステップＳ９０６と同様の処理である。 In step S1307, the camera adapter 120 executes encoding processing using the quantization parameter for each foreground acquired up to the previous processing step. That is, the processing in step S1307 is the same processing as step S906 in FIG. 9 in the first embodiment.

ステップＳ１３０８において、カメラアダプタ１２０は、下式に基づいて、積算目標符号量の差分ΔＵＳを更新する。 In step S1308, the camera adapter 120 updates the accumulated target code amount difference ΔUS based on the following equation.

Updating ΔUS with the above equation reflects in ΔUS the difference between the target code amount and the actual code amount of the camera being processed, so that this difference can be passed downstream. After executing step S1308, the camera adapter 120 ends the code amount control process.

With this configuration, the camera being processed can absorb the difference between the target code amount and the actual code amount that has accumulated up to the immediately preceding camera. The code amount can therefore be controlled so that the daisy chain as a whole stays within the bandwidth of the transmission path, which enables transmission without delay.
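The flow of the third embodiment can be illustrated with a small simulation. The sketch assumes the forms suggested by the claims rather than the embodiment's own equations: the basic code amount is the band divided by the number of cameras, ΔUS is the total actual code amount of the upstream cameras minus the basic code amount multiplied by the number of upstream cameras, and the target is the basic code amount minus ΔUS when |ΔUS| exceeds T2. The function names and the `overshoots` input, which stands in for a real encoder, are hypothetical.

```python
def target_code_amount(basic, delta_us, t2):
    """S1301 to S1303: absorb the upstream excess only when |delta_us| > t2."""
    if abs(delta_us) > t2:
        return max(basic - delta_us, 0)
    return basic


def simulate_chain(band, overshoots, t2):
    """Walk the daisy chain from the most upstream camera to the last one.

    overshoots[i] stands in for the encoder of camera i: the amount by which
    its actual code amount misses its target (hypothetical input).
    """
    basic = band / len(overshoots)   # assumed basic code amount per camera
    delta_us = 0                     # accumulated excess handed downstream
    rows = []
    for overshoot in overshoots:
        target = target_code_amount(basic, delta_us, t2)
        actual = target + overshoot
        delta_us += actual - basic   # assumed form of the S1308 update
        rows.append((target, actual, delta_us))
    return rows


rows = simulate_chain(band=100, overshoots=[5, 0, 0, 0], t2=2)
for target, actual, delta_us in rows:
    print(target, actual, delta_us)
```

With a band of 100 units shared by four cameras, a 5-unit overshoot at the first camera lowers the target of the second camera from 25 to 20, after which the accumulated excess returns to 0 and the chain total stays within the band.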

(Fourth embodiment)
In the first embodiment described above, when the actual code amount of the frame currently being processed exceeds the target code amount, the encoding process for the next frame compensates for the difference.

However, if there is processing time to spare, the encoding process may be executed again on the frame currently being processed with a raised quantization level, so that the actual code amount approaches the target code amount. Encoding in this way saves space in the transmission queue buffer of the camera adapter 120.
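A re-encoding loop of this kind might look like the sketch below. The encoder model `encode` (code amount equal to the pixel count divided by the quantization level) is a placeholder, and `max_passes` is a hypothetical stand-in for the available processing time; neither comes from the embodiment.

```python
def encode(pixels, ql):
    """Placeholder encoder: the code amount shrinks as the level rises."""
    return pixels // ql


def encode_within_target(pixels, ql, target, max_level=7, max_passes=3):
    """Re-encode the current frame with a raised level while time remains."""
    actual = encode(pixels, ql)
    passes = 1
    while actual > target and ql < max_level and passes < max_passes:
        ql += 1                      # coarser quantization
        actual = encode(pixels, ql)  # encode the same frame again
        passes += 1
    return ql, actual, passes


result = encode_within_target(pixels=1200, ql=2, target=350)
```

If the pass budget runs out before the target is met, the remaining excess would still have to be compensated in the next frame, as in the first embodiment.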

(Fifth embodiment)
In the first embodiment described above, the field-of-view overlap map is created as preliminary data by running a simulation before shooting. Creating the map in advance in this way is effective in a system in which the gazing point is fixed before shooting and the cameras are installed in fixed positions.

However, the gazing point may change, for example when several different sports are shot in the same stadium. In that case, the present invention can be carried out by updating the field-of-view overlap map when calibration is executed before shooting. In a system in which a plurality of cameras can be moved in synchronization, the image processing system may be configured to update the field-of-view overlap map each time dynamic calibration is executed.
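Rebuilding the map after a calibration can be pictured with the simplified sketch below. It works on a flat 2D grid and models each camera as a position, a heading, and a half field-of-view angle, whereas the system described here derives the map from the three-dimensional shape of the field; the camera model and the function names are assumptions made for illustration.

```python
import math


def sees(camera, point):
    """True if the point lies inside the camera's horizontal field of view."""
    (cx, cy), heading, half_fov = camera
    angle = math.atan2(point[1] - cy, point[0] - cx)
    # Wrap the angular difference into [-pi, pi) before comparing.
    diff = (angle - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= half_fov


def build_overlap_map(cameras, width, height):
    """Count, for each grid cell centre, the cameras that see it."""
    return [[sum(sees(c, (x + 0.5, y + 0.5)) for c in cameras)
             for x in range(width)]
            for y in range(height)]


cam_a = ((-1.0, 1.0), 0.0, math.pi / 4)
cam_b = ((1.0, -1.0), math.pi / 2, math.radians(20))
before = build_overlap_map([cam_a, cam_b], 2, 2)

# After re-aiming cam_b away from the grid, the map is rebuilt from the
# new calibration result.
cam_b_moved = ((1.0, -1.0), math.pi, math.radians(20))
after = build_overlap_map([cam_a, cam_b_moved], 2, 2)
```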

(Sixth embodiment)
In the first embodiment described above, the field-of-view overlap degree / quantization parameter set correspondence table was described as a set of parameters for performing quantization according to a desired code amount. However, a "do not transmit" option may be provided for cases in which the bandwidth is even more constrained.

In this case, as shown in FIG. 14, the quantization parameters in the field-of-view overlap degree / quantization parameter set correspondence table are applied, as in the first embodiment, so that they increase stepwise from the higher field-of-view overlap degree toward the lower. In addition, to handle the more constrained bandwidth mentioned above, "do not transmit" is set for the case where the field-of-view overlap degree is 10 (dn = 10) and the quantization level is 7 (ql = 7), so that no data is transmitted. In other words, when the field-of-view overlap degree is equal to or greater than a predetermined value (that is, when a predetermined number of cameras or more are capturing the foreground), the data is set not to be encoded.

Setting a quantization parameter to "do not transmit" means that the field-of-view overlap degree of the region decreases by one camera. That is, when the quantization parameter is set to "do not transmit" in the camera being processed, this is signaled to the subsequent camera, which then processes the region after subtracting one camera from its field-of-view overlap degree. For the other foreground images, the subsequent camera does not subtract from the field-of-view overlap degree unless, for example, the field-of-view overlap degree is 10 (dn = 10) and the quantization level is 7 (ql = 7). Processing in this way allows the minimum code amount of a foreground to be scaled down to 0, which extends the range of code amount control.
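The interaction between "do not transmit" and the overlap degree seen by downstream cameras can be sketched as follows. The thresholds follow the dn = 10, ql = 7 example in the text; the function names, the `NOT_SENT` marker, and the idea of passing the updated overlap degree along as a plain integer are assumptions made for illustration.

```python
NOT_SENT = None  # marker meaning the foreground is not transmitted


def decide(overlap, level, skip_overlap=10, skip_level=7):
    """Return the level to encode with, or NOT_SENT when the rule applies."""
    if overlap >= skip_overlap and level == skip_level:
        return NOT_SENT
    return level


def process_region(initial_overlap, levels):
    """Apply the rule for one region along the chain of cameras.

    levels[i] is the quantization level that camera i would otherwise use
    for the region (hypothetical input).
    """
    overlap = initial_overlap
    decisions = []
    for level in levels:
        decision = decide(overlap, level)
        decisions.append(decision)
        if decision is NOT_SENT:
            overlap -= 1  # signalled to the subsequent camera
    return decisions, overlap


decisions, remaining = process_region(10, [7, 7, 5])
```

Here the first camera drops the region, so the second camera sees an overlap degree of 9 and transmits it at level 7 instead of dropping it as well.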

(Seventh embodiment)
In the first embodiment described above, the images (video) captured by all cameras on the transmission line are subject to code amount control. However, the image processing system can also be configured to exclude from code amount control video captured by cameras at positions that are important for the image quality of the virtual viewpoint content (for example, cameras placed at every predetermined angle around the gazing point) or video captured by telephoto cameras and the like.

In this case, the fields of view of the cameras at positions important for the image quality of the virtual viewpoint content are not included in the field-of-view overlap map created as preliminary data. At encoding time, the code amount control process is skipped for those cameras and only the encoding process is executed. With this configuration, video captured by cameras at viewpoints important for the image quality of the virtual viewpoint content can be transmitted without degradation and used to generate the virtual viewpoint content.

(Other embodiments)
The embodiments above describe an example in which the camera adapters are connected in a daisy chain over a single transmission path, but the present invention is not limited to this and can be applied to other topologies. For example, a bus topology raises the same problem as a daisy chain, so the code amount may likewise be controlled while acquiring information from neighboring camera adapters.

In a star topology, neighboring camera adapters are not directly connected to each other, but the image processing system may be configured so that each camera adapter obtains the necessary information, such as the image quality of neighboring camera adapters, from the front-end server.

The embodiments above describe an example in which the image quality is set by setting encoding parameters, but the invention is not limited to this. For example, the image quality may be set by reducing the resolution or image size, or by reducing the number of color gradation levels. Likewise, although the field-of-view overlap map is described as the information indicating the number of cameras capturing a predetermined object, other information may be used, such as information that associates the coordinates of the three-dimensional region to be captured with the cameras that can capture those coordinates.

The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

112 Camera
120 Camera adapter
180b Network
06130 Image processing unit

Claims

1. An image processing system that transmits images captured by a plurality of cameras through a predetermined transmission path and generates a virtual viewpoint image, the image processing system comprising:
camera number deriving means for deriving the number of cameras capturing a predetermined object; and
image quality setting means for setting an image quality of a foreground image including the predetermined object in accordance with the number of cameras capturing the predetermined object.

2. The image processing system according to claim 1, further comprising encoding means for encoding the foreground image based on the image quality of the foreground image set by the image quality setting means,
wherein the image quality setting means sets the image quality of the foreground image so that the number of cameras capturing the predetermined object and the image quality of the foreground image are inversely correlated.

3. The image processing system according to claim 1, wherein the camera number deriving means derives the number of cameras capturing the predetermined object using a field-of-view overlap map created by mapping, as regions, the overlap of the camera fields of view derived from the three-dimensional shape of the field to be captured and the relative positional relationship of the cameras.

4. The image processing system according to claim 3, wherein, when the predetermined object spans a plurality of regions of the field-of-view overlap map in which the overlap of the camera fields of view is mapped, the camera number deriving means derives, as the number of cameras capturing the predetermined object, the number of cameras of the region captured by the fewest cameras among the plurality of regions.

5. The image processing system according to claim 1, wherein the camera number deriving means derives the number of cameras capturing the predetermined object using the coordinates of the foot of the predetermined object.

6. The image processing system according to claim 2, wherein the encoding means quantizes the foreground image with a quantization parameter set based on the image quality of the foreground image.

7. The image processing system according to claim 2, wherein the encoding means quantizes the foreground image with a quantization parameter set based on a target code amount set for each of the plurality of cameras and on the image quality of the foreground image.

8. The image processing system according to claim 7, wherein the encoding means calculates the target code amount by dividing the band for encoded data within the communication band of the transmission path by the number of the plurality of cameras.

9. The image processing system according to claim 8, wherein the encoding means:
calculates a basic code amount by dividing the band for encoded data within the communication band of the transmission path by the number of the plurality of cameras;
calculates an integrated target code amount by subtracting, from the actual code amount of all cameras positioned upstream of one camera of the plurality of cameras, a value obtained by multiplying the basic code amount by the number of cameras positioned upstream of the one camera; and
when the integrated target code amount is larger than a predetermined threshold, calculates the target code amount by subtracting the integrated target code amount from the basic code amount.

10. The image processing system according to claim 7, wherein the encoding means determines the quantization parameter of the next frame in one camera of the plurality of cameras according to a difference between the target code amount and the actual code amount of the current frame in the one camera.

11. The image processing system according to any one of claims 1 to 10, wherein the image quality setting means sets the foreground image including the predetermined object not to be encoded when the number of cameras capturing the predetermined object is equal to or greater than a predetermined number.

12. The image processing system according to any one of claims 1 to 11, wherein the image quality setting means performs setting so that a foreground image captured by a camera, among the plurality of cameras, that is arranged at a predetermined position in relation to the predetermined object is not quantized.

13. A program for causing a computer to function as the image processing system according to any one of claims 1 to 12.

14. A control method for an image processing system that transmits images captured by a plurality of cameras through a predetermined transmission path and generates a virtual viewpoint image, the control method comprising:
a camera number deriving step of deriving, by camera number deriving means, the number of cameras capturing a predetermined object; and
an image quality setting step of setting, by image quality setting means, an image quality of a foreground image including the predetermined object in accordance with the number of cameras capturing the predetermined object.

15. A transmission apparatus that, in an image processing system that transmits images captured by a plurality of cameras through a predetermined transmission path and generates a virtual viewpoint image, transmits the images to the predetermined transmission path, the transmission apparatus comprising:
acquisition means for acquiring information indicating the number of cameras capturing a predetermined object;
image quality setting means for setting an image quality of a foreground image including the predetermined object based on the number of cameras indicated by the information acquired by the acquisition means; and
transmission means for transmitting, on the predetermined transmission path, the foreground image having the image quality set by the image quality setting means.

16. The transmission apparatus according to claim 15, further comprising changing means for changing the image quality of the foreground image based on the image quality set by the image quality setting means,
wherein the transmission means transmits, through the predetermined transmission path, the foreground image whose image quality has been changed by the changing means.

17. The transmission apparatus according to claim 16, wherein the changing means encodes the foreground image based on the image quality set by the image quality setting means, and
the transmission means transmits, through the predetermined transmission path, the foreground image encoded by the changing means.

18. The transmission apparatus according to claim 15, wherein the information is acquired from a degree of overlap of the fields of view of the plurality of cameras that is based on the three-dimensional shape of the imaging target of the plurality of cameras and the relative positional relationship between the cameras.

19. The transmission apparatus according to any one of claims 15 to 18, wherein the image quality setting means sets the image quality of the foreground image based on the transmission capacity of the predetermined transmission path.

20. A transmission method for images in an image processing system that transmits images captured by a plurality of cameras through a predetermined transmission path and generates a virtual viewpoint image, the transmission method comprising:
setting an image quality of a foreground image including a predetermined object based on the number of cameras capturing the predetermined object; and
transmitting the foreground image having the set image quality through the predetermined transmission path.