
JP2020092301A - Image processing device, image processing system, image processing method, and program

Info

Publication number: JP2020092301A
Application number: JP2018226778A
Authority: JP
Inventors: 智一佐藤; Tomokazu Sato
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-12-03
Filing date: 2018-12-03
Publication date: 2020-06-11

Abstract

To make it possible to control the data amount of an image while suppressing image quality deterioration of a distant subject, even if there are multiple subjects at different distances from a camera. SOLUTION: The image processing device includes: a foreground separation unit for obtaining a subject image related to a subject from an input image; a resolution acquisition unit for acquiring the resolution of the subject image; a data compression/decompression unit for separating the subject image into one or more resolution components; and a priority calculation unit for calculating a priority for each separated resolution component. The priority calculation unit calculates the priority according to the resolution of the subject represented by the resolution component. Control of the data amount is performed according to the calculated priority. SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing device, an image processing system, an image processing method, and a program.

Techniques for generating virtual viewpoint content from a plurality of viewpoint images have been attracting attention. The images are obtained by a multi-camera system in which a plurality of cameras (imaging devices) installed at different positions shoot synchronously from multiple viewpoints. With such techniques, for example, highlight scenes of soccer or basketball can be viewed from various angles, which gives the user a greater sense of presence than ordinary images.

Generation and viewing of virtual viewpoint content based on a plurality of viewpoint images can be realized by aggregating the images captured by the plurality of cameras in an image processing unit such as a server, performing processing such as three-dimensional model generation and rendering in the image processing unit, and transmitting the result to a user terminal. Patent Document 1 proposes a technique that raises the frame rate when the distance between the camera and the shooting scene is short and lowers the image resolution to reduce the resulting processing load.

Patent Document 1: Japanese Patent Laid-Open No. 2007-172035 (JP 2007-172035 A)

When a large-scale multi-camera system is constructed, the amount of image data generated at each viewpoint may exceed the transmission bandwidth. In that case, unintended loss of image data occurs and degrades the image quality of the generated virtual viewpoint content. One way to prevent this is to lower the image resolution as described in Patent Document 1.

However, when shooting sports scenes such as soccer or basketball, a plurality of subjects at different distances from the camera may be present in the camera's field of view at the same time. A subject close to the camera and a subject far from the camera are captured at different pixel densities, so the resolution required to obtain good image quality also differs between them. Therefore, a method that controls the resolution for the camera as a whole, as described in Patent Document 1, impairs the image quality of subjects far from the camera.
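The dependence of pixel density on distance can be illustrated with a pinhole-camera approximation. The following is a minimal sketch, not part of the disclosed device: the function name, the focal length, and the distances are hypothetical values chosen only for illustration.

```python
def pixels_per_meter(focal_length_px: float, distance_m: float) -> float:
    """Sampling density (pixels per meter) of a subject facing the camera.

    Pinhole-camera approximation: a length of 1 m at distance Z projects
    to focal_length_px / Z pixels on the image plane.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return focal_length_px / distance_m


# Hypothetical numbers: a 2000 px focal length, subjects at 10 m and 50 m.
near_density = pixels_per_meter(2000.0, 10.0)
far_density = pixels_per_meter(2000.0, 50.0)

# Halving the resolution of the whole frame halves both densities,
# so the distant subject, which was already coarse, suffers the most.
near_after = near_density / 2
far_after = far_density / 2
```

Under these assumed values the near subject still has 100 px/m after uniform downscaling, while the far subject drops to 20 px/m, which is the effect the paragraph above describes.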

The present invention has been made in view of these circumstances. Its object is to make it possible to control the data amount of an image while suppressing image quality degradation of distant subjects, even when a plurality of subjects at different distances from the camera are included.

An image processing apparatus according to the present invention includes: image acquisition means for obtaining, from an input image, a subject image of a subject; resolution acquisition means for acquiring the resolution of the subject image; processing means for separating the subject image into one or more resolution components; and priority calculation means for calculating a priority for each separated resolution component. When the resolution of the subject represented by a resolution component is a second resolution lower than a first resolution, the priority calculation means calculates a priority higher than the priority calculated for the first resolution.
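The priority rule stated above only requires that a lower represented resolution yields a higher priority. One way to satisfy that is sketched below. The reciprocal mapping, the convention that each decomposition level halves the represented resolution, and both function names are assumptions made for illustration, not the formula used in the embodiment.

```python
def component_resolution(subject_resolution: float, level: int) -> float:
    """Resolution represented by one component of a subject image.

    Assumption: level 0 is the finest component at the subject's full
    resolution, and each further level halves the represented resolution
    (as in a dyadic multi-resolution decomposition).
    """
    if subject_resolution <= 0:
        raise ValueError("subject_resolution must be positive")
    if level < 0:
        raise ValueError("level must be non-negative")
    return subject_resolution / (2 ** level)


def component_priority(subject_resolution: float, level: int) -> float:
    """Illustrative priority: strictly higher for lower represented resolution."""
    return 1.0 / component_resolution(subject_resolution, level)
```

With this mapping, the finest component of a distant subject (for example 40 px/m) outranks the finest component of a near subject (200 px/m), so detail of the distant subject is retained longer when data has to be dropped.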

According to the present invention, a priority corresponding to the resolution can be set so that the resolution is controlled for each subject. Even when a plurality of subjects at different distances from the camera are included, the data amount of the image can be controlled while image quality degradation of distant subjects is suppressed.

FIG. 1 is a diagram showing a configuration example of the image processing system in this embodiment.
FIG. 2 is a diagram showing a configuration example of the camera adapter in the first embodiment.
FIG. 3 is a diagram showing a configuration example of the image processing unit in the first embodiment.
FIG. 4 is a flowchart showing an example of priority generation processing in the first embodiment.
FIG. 5 is a diagram showing an example of wavelet transform coefficients.
FIG. 6 is a diagram explaining a method of estimating the subject position.
FIG. 7 is a diagram showing parameters for calculating the resolution of a subject.
FIG. 8 is a diagram showing a configuration example of the transmission unit in the first embodiment.
FIG. 9 is a flowchart showing a processing example in the data compression/decompression unit in the first embodiment.
FIG. 10 is a flowchart showing a processing example in the message control unit in the first embodiment.
FIG. 11 is a flowchart showing a processing example in the packet generation unit in the first embodiment.
FIG. 12 is a diagram showing a configuration example of the transmission unit in the second embodiment.
FIG. 13 is a flowchart showing a processing example in the packet control unit in the second embodiment.
FIG. 14 is a diagram showing the classification of resolution components of DCT coefficients.
FIG. 15 is a diagram showing an example of the hardware configuration of the camera adapter in this embodiment.

Embodiments of the present invention will be described below with reference to the drawings.

(First embodiment)
A first embodiment of the present invention will be described. In the first embodiment, the code amount (the data amount of image data) is controlled independently for each camera in an image processing system 100 in which a plurality of cameras installed at different positions shoot synchronously. Code amount control is performed frame by frame against an upper limit set on the code amount per frame. If the code amount of one frame does not exceed the upper limit, lossless (reversible) data is transmitted. If transmitting all of the data would exceed the upper limit, information with less influence on image quality is dropped preferentially.
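The per-frame control described above can be sketched as a budgeted selection over prioritized components. This is a minimal illustration under stated assumptions: the `Component` record, the byte-based limit, and the rule of cutting at the first component that does not fit are hypothetical choices, not details taken from the embodiment.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Component:
    name: str          # hypothetical label for a piece of frame data
    size_bytes: int    # encoded size of this component
    priority: float    # higher means more important to image quality


def select_components(
    components: Sequence[Component], frame_limit_bytes: int
) -> Tuple[List[Component], bool]:
    """Return (components to transmit, whether the frame stays lossless)."""
    total = sum(c.size_bytes for c in components)
    if total <= frame_limit_bytes:
        # Everything fits: transmit all data, so the frame is lossless.
        return list(components), True

    kept: List[Component] = []
    used = 0
    # Keep components in descending priority until the budget is exhausted;
    # everything after the first component that does not fit is dropped.
    for c in sorted(components, key=lambda c: c.priority, reverse=True):
        if used + c.size_bytes > frame_limit_bytes:
            break
        kept.append(c)
        used += c.size_bytes
    return kept, False


frame = [
    Component("coarse", 400, 3.0),
    Component("mid", 400, 2.0),
    Component("fine", 400, 1.0),
]
kept_names = [c.name for c in select_components(frame, 1000)[0]]
```

With a 1000-byte limit the sketch keeps `coarse` and `mid` and drops `fine`. With a limit of 1200 bytes or more, all three components are sent and the frame is lossless.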

A system that shoots images and collects sound with a plurality of cameras and microphones installed in a facility such as a stadium or a concert hall will be described with reference to the system configuration diagram of FIG. 1. FIG. 1 is a block diagram showing a configuration example of the image processing system 100 according to an embodiment of the present invention. The image processing system 100 includes sensor systems 110a to 110z, an image computing server 120, a controller 130, a switching hub 140, and an end user terminal 150.

The controller 130 includes a control station 131 and a virtual camera operation UI (user interface) 132. The control station 131 manages operating states and controls parameter settings for each functional unit (block) of the image processing system 100 through networks 161a to 161c, 162a, 162b, and 163a to 163y. The networks may be GbE (Gigabit Ethernet) or 10GbE conforming to the IEEE standard for Ethernet (registered trademark), or may be configured by combining an InfiniBand interconnect, industrial Ethernet, and the like. The networks are not limited to these and may be of other types.

The operation of transmitting the images and audio obtained by the 26 sensor systems 110a to 110z from the sensor system 110z to the image computing server 120 will be described. In the image processing system 100 of this embodiment, the sensor systems 110a to 110z are connected in a daisy chain.

In this embodiment, unless otherwise specified, the 26 sensor systems 110a to 110z are referred to as the sensor system 110 without distinction. Likewise, the devices in each sensor system 110 are referred to without distinction as the microphone 111, the camera 112, the camera platform 113, the external sensor 114, and the camera adapter 115 unless otherwise specified. Although FIG. 1 shows an example with 26 sensor systems, this is merely an example and the number of sensor systems is not limited to this.

In this embodiment, unless otherwise noted, the term "image" covers both moving images and still images. That is, the image processing system 100 can process both still images and moving images. An example is described in which the virtual viewpoint content provided by the image processing system 100 includes a virtual viewpoint image and virtual viewpoint audio, but the content is not limited to this. For example, the virtual viewpoint content need not include audio. As another example, the audio included in the virtual viewpoint content may be the audio collected by the microphone closest to the virtual viewpoint. For simplicity, the description of audio is partly omitted in this embodiment, but images and audio are basically processed together.

Each of the sensor systems 110a to 110z has one camera, 112a to 112z respectively. That is, the image processing system 100 has a plurality of cameras for shooting a subject from a plurality of directions. The sensor systems 110 are connected to one another in a daisy chain. This connection form reduces the number of connection cables and the wiring labor as image data volumes grow with higher resolutions such as 4K and 8K and with higher frame rates. The connection form is not limited to this. For example, a star network configuration may be used in which each of the sensor systems 110a to 110z is connected to the switching hub 140 and data is exchanged between the sensor systems 110 via the switching hub 140.

Although FIG. 1 shows a configuration in which all of the sensor systems 110 are cascaded to form a single daisy chain, the configuration is not limited to this. For example, the sensor systems 110 may be divided into several groups, with the sensor systems 110 daisy-chained within each group. The camera adapter 115 at the end of each group may then be connected to the switching hub to input images to the image computing server 120. Such a configuration is effective, for example, when sensor systems 110 are deployed on each floor of a multi-floor stadium. In this case, images can be input to the image computing server 120 per floor or per half circumference of the stadium, which simplifies installation and makes the system more flexible even in places where wiring all of the sensor systems 110 into a single daisy chain is difficult.

Control of image processing in the image computing server 120 is switched depending on whether one camera adapter 115, or two or more, is daisy-chained and inputs images to the image computing server 120. That is, control is switched depending on whether the sensor systems 110 are divided into a plurality of groups. When only one camera adapter 115 inputs images, the image of the entire circumference of the stadium is generated while images are transmitted along the daisy chain, so the image data for the entire circumference arrives at the image computing server 120 with synchronized timing. In other words, synchronization is maintained as long as the sensor systems 110 are not divided into groups.

However, when a plurality of camera adapters 115 input images (that is, the sensor systems 110 are divided into a plurality of groups), the delay may differ from one daisy-chain lane (path) to another. The image computing server 120 therefore needs synchronization control that waits until the image data for the entire circumference has arrived, and must perform the subsequent image processing while checking that the image data has been collected.
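The synchronization control described above amounts to buffering each frame until data from every camera has arrived, regardless of which lane delivered it. The following is a minimal sketch of that idea. The `FrameAssembler` class, its method names, and the use of frame numbers as keys are assumptions for illustration, not the server's actual interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


class FrameAssembler:
    """Holds per-frame data until every expected camera has reported."""

    def __init__(self, expected_cameras: Iterable[str]) -> None:
        self.expected = frozenset(expected_cameras)
        self.pending: Dict[int, Dict[str, bytes]] = defaultdict(dict)

    def add(self, frame_number: int, camera_id: str,
            data: bytes) -> Optional[Dict[str, bytes]]:
        """Store one camera's data; return the full frame once complete."""
        if camera_id not in self.expected:
            raise KeyError(f"unknown camera: {camera_id}")
        self.pending[frame_number][camera_id] = data
        if self.expected.issubset(self.pending[frame_number]):
            # All cameras have arrived: release the frame for later stages.
            return self.pending.pop(frame_number)
        return None


assembler = FrameAssembler(["112a", "112b"])
first = assembler.add(7, "112a", b"lane-1 data")   # still waiting for 112b
second = assembler.add(7, "112b", b"lane-2 data")  # frame 7 is now complete
```

A production system would also need a timeout or loss policy for frames that never complete. That is omitted here because the text does not specify one.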

In this embodiment, each of the sensor systems 110a to 110z includes a microphone 111, a camera 112, a camera platform 113, an external sensor 114, and a camera adapter 115. The configuration of the sensor system 110 is not limited to this. It only needs to include at least one camera adapter 115 and either one camera 112 or one microphone 111. For example, the sensor system 110 may consist of one camera adapter 115 and a plurality of cameras 112, or of one camera 112 and a plurality of camera adapters 115. That is, the cameras 112 and the camera adapters 115 in the image processing system 100 correspond N to M (where N and M are both integers of 1 or more).

The sensor system 110 may include devices other than the microphone 111, the camera 112, the camera platform 113, the external sensor 114, and the camera adapter 115. The camera 112 and the camera adapter 115 may also be integrated into a single unit. Furthermore, the front-end server 121 may have at least part of the functions of the camera adapter 115. The sensor systems 110a to 110z need not all have the same configuration. Some or all of the sensor systems 110 may be configured differently.

The audio collected by the microphone 111a of the sensor system 110a and the image captured by the camera 112a undergo image processing in the camera adapter 115a and are then transmitted through the network 163a to the camera adapter 115b of the sensor system 110b. Similarly, the sensor system 110b transmits its own collected audio and captured image, together with the image and audio acquired from the sensor system 110a, to the sensor system 110c via the network 163b. By continuing this operation, the images and audio acquired by the sensor systems 110a to 110z are transmitted from the sensor system 110z to the image computing server 120 via the network 162b, the switching hub 140, and the like.
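The relay operation above can be sketched as each node appending its own capture to whatever it received from upstream. The sketch also shows why the load grows toward the end of the chain. The function names and the list-based payload are hypothetical, and real adapters exchange packets rather than Python lists.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Capture = Tuple[str, str]  # (sensor system id, captured data placeholder)


def relay(upstream: Sequence[Capture], own: Capture) -> List[Capture]:
    """Forward everything received from upstream plus this node's capture."""
    return [*upstream, own]


def run_chain(captures: Sequence[Capture]) -> List[Capture]:
    """Pass data along the chain in order; the last node holds all captures."""
    payload: List[Capture] = []
    for capture in captures:
        payload = relay(payload, capture)
    return payload


def link_loads(sizes: Sequence[int]) -> List[int]:
    """Data amount carried on the link leaving each node (cumulative sum)."""
    return list(accumulate(sizes))


delivered = run_chain([("110a", "A"), ("110b", "B"), ("110c", "C")])
loads = link_loads([10, 20, 30])
```

Because the last link carries the sum of all upstream data, a per-camera limit on the code amount is a natural place to keep the total within the transmission bandwidth.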

In this embodiment, the camera 112 and the camera adapter 115 are separate in the sensor system 110, but they may be integrated in the same housing. In that case, the microphone 111 may be built into the integrated camera 112 or may be connected externally to the camera 112.

Next, the configuration and operation of the image computing server 120 will be described. The image computing server 120 in this embodiment processes the data acquired from the sensor system 110z. The image computing server 120 includes a front-end server 121, a database (DB) 122, a back-end server 123, and a time server 124.

The time server 124 has a function of distributing time and synchronization signals, and distributes them to the sensor systems 110a to 110z via the switching hub 140. On receiving the time and synchronization signals, the camera adapters 115a to 115z of the sensor systems 110a to 110z genlock the cameras 112a to 112z to those signals to synchronize image frames. That is, the time server 124 synchronizes the shooting timing of the plurality of cameras 112. As a result, the image processing system 100 can generate a virtual viewpoint image from a plurality of captured images shot at the same timing, which suppresses quality degradation of the virtual viewpoint image caused by shifts in shooting timing. In this embodiment the time server 124 manages time synchronization of the plurality of cameras 112, but this is not a limitation. Each camera 112 or each camera adapter 115 may perform the time synchronization processing independently.

The front-end server 121 reconstructs the segmented transmission packets of the images and audio acquired from the sensor system 110z, converts the data format, and then writes the data to the database 122 according to the camera identifier, data type, and frame number. The back-end server 123 accepts a viewpoint designation from the virtual camera operation UI 132, reads the corresponding image and audio data from the database 122 based on the accepted viewpoint, and performs rendering and other processing to generate a virtual viewpoint image.
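Writing "according to the camera identifier, data type, and frame number" suggests a store keyed by those three values. A minimal in-memory sketch follows. The `FrameStore` class and its methods are hypothetical stand-ins, and the actual schema of the database 122 is not given in the text.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, int]  # (camera identifier, data type, frame number)


class FrameStore:
    """In-memory stand-in for a database keyed by camera, type, and frame."""

    def __init__(self) -> None:
        self._rows: Dict[Key, bytes] = {}

    def write(self, camera_id: str, data_type: str,
              frame_number: int, payload: bytes) -> None:
        self._rows[(camera_id, data_type, frame_number)] = payload

    def read(self, camera_id: str, data_type: str,
             frame_number: int) -> bytes:
        return self._rows[(camera_id, data_type, frame_number)]

    def frame(self, data_type: str, frame_number: int) -> Dict[str, bytes]:
        """All cameras' data of one type for one frame, as a renderer needs."""
        return {
            cam: payload
            for (cam, kind, num), payload in self._rows.items()
            if kind == data_type and num == frame_number
        }


store = FrameStore()
store.write("112a", "foreground", 1, b"fg-a")
store.write("112b", "foreground", 1, b"fg-b")
store.write("112a", "background", 1, b"bg-a")
```

Keying by frame number lets a reader such as the back-end server fetch the same instant from every camera with one lookup per data type.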

The configuration of the image computing server 120 is not limited to this. For example, at least two of the front-end server 121, the database 122, and the back-end server 123 may be integrated. The image computing server 120 may also include more than one of any of the front-end server 121, the database 122, and the back-end server 123. Other devices may be included at any position within the image computing server 120. Furthermore, the end user terminal 150 or the virtual camera operation UI 132 may have at least part of the functions of the image computing server 120.

The rendered image is transmitted from the back-end server 123 to the end user terminal 150, and the user operating the end user terminal 150 can view images and listen to audio according to the designated viewpoint. That is, the back-end server 123 generates virtual viewpoint content based on viewpoint information and the images captured by the plurality of cameras 112 (the plurality of viewpoint images). Specifically, the back-end server 123 generates the virtual viewpoint content based on, for example, image data of predetermined regions that the plurality of camera adapters 115 extract from the images captured by the plurality of cameras 112, and on the viewpoint designated by a user operation. The back-end server 123 then provides the generated virtual viewpoint content to the end user terminal 150.

The virtual viewpoint content in this embodiment is content that includes a virtual viewpoint image, that is, the image that would be obtained if the subject were shot from a virtual viewpoint. In other words, a virtual viewpoint image is an image representing the view from a designated viewpoint. The virtual viewpoint may be designated by the user or may be designated automatically based on, for example, the result of image analysis. That is, virtual viewpoint images include arbitrary viewpoint images (free viewpoint images) corresponding to a viewpoint freely designated by the user. Images corresponding to a viewpoint the user selects from a plurality of candidates, and images corresponding to a viewpoint the device designates automatically, are also virtual viewpoint images.

In this embodiment, the virtual viewpoint content is described as including audio data, but it need not include audio data. The back-end server 123 may compress and encode the virtual viewpoint image using a coding scheme such as H.264 or HEVC and then transmit it to the end user terminal 150 using the MPEG-DASH protocol. Alternatively, the virtual viewpoint image may be transmitted to the end user terminal 150 uncompressed. The former case, with compression encoding, assumes a smartphone or tablet as the end user terminal 150, and the latter assumes a display capable of showing uncompressed images. That is, the image format may be switched according to the type of the end user terminal 150. The image transmission protocol is not limited to MPEG-DASH. For example, HLS (HTTP Live Streaming) or another transmission method may be used.

As described above, the image processing system 100 has three functional domains: a video collection domain, a data storage domain, and a video generation domain. The video collection domain includes the sensor systems 110a to 110z. The data storage domain includes the front-end server 121, the database 122, and the back-end server 123. The video generation domain includes the virtual camera operation UI 132 and the end user terminal 150.

The configuration is not limited to this. For example, the virtual camera operation UI 132 could acquire images directly from the sensor systems 110a to 110z. In this embodiment, however, a data storage function is placed in between rather than acquiring images directly from the sensor systems 110a to 110z. Specifically, the front-end server 121 converts the image data and audio data generated by the sensor systems 110a to 110z, and the meta information of that data, into the common schema and data types of the database 122. As a result, even if the cameras 112 of the sensor systems 110a to 110z are replaced with a different camera model, the front-end server 121 absorbs the differences and registers the data in the database 122. This reduces the risk that the virtual camera operation UI 132 will not operate properly when the cameras 112 are replaced with a different model.

The virtual camera operation UI 132 is configured to access the database 122 through the back-end server 123 rather than directly. The back-end server 123 performs the common processing related to image generation, and the virtual camera operation UI 132 handles the application-specific parts related to the operation UI. As a result, development of the virtual camera operation UI 132 can focus on the UI operation devices and on the functional requirements of the UI for manipulating the virtual viewpoint image to be generated. The back-end server 123 can also add or remove common processing related to image generation in response to requests from the virtual camera operation UI 132, which makes it possible to respond flexibly to those requests.

As described above, in the image processing system 100, the back-end server 123 generates a virtual viewpoint image from image data based on shooting by the plurality of cameras 112, which shoot the subject from a plurality of directions. The image processing system 100 in this embodiment is not limited to the physical configuration described above and may be configured logically. This concludes the overview of the image processing system 100.

次に、カメラアダプタ１１５について説明する。図２は、カメラアダプタ１１５の構成例を示すブロック図である。本実施形態におけるカメラアダプタ１１５は、符号量制御を行う機能を有する。カメラアダプタ１１５は、ネットワークアダプタ２１０、伝送部２２０、画像処理部２３０、及び外部機器制御部２４０を有する。 Next, the camera adapter 115 will be described. FIG. 2 is a block diagram showing a configuration example of the camera adapter 115. The camera adapter 115 in this embodiment has a function of controlling the code amount. The camera adapter 115 includes a network adapter 210, a transmission unit 220, an image processing unit 230, and an external device control unit 240.

The network adapter 210 includes a data transmission/reception unit 211 and a time control unit 212. The data transmission/reception unit 211 performs data communication with other camera adapters 115, the front-end server 121, the time server 124, and the control station 131 via a network or the like. For example, the data transmission/reception unit 211 outputs the foreground image and the background image, which the foreground/background separation unit 231 has separated from the image captured by the camera 112, to the next camera adapter 115, which is predetermined according to the processing of the data routing processing unit 222. Because each camera adapter 115 outputs a foreground image and a background image, a virtual viewpoint image is generated based on foreground and background images captured from a plurality of viewpoints. The image processing system 100 may include a camera adapter 115 that outputs a foreground image but not a background image.

時刻制御部２１２は、例えばＩＥＥＥ１５８８規格のＯｒｄｉｎａｒｙＣｌｏｃｋに準拠し、タイムサーバ１２４との間で送受信したデータのタイムスタンプを保存する機能と、タイムサーバ１２４と時刻同期を行う機能を有する。なお、ＩＥＥＥ１５８８規格に限らず、他のＥｔｈｅｒＡＶＢ規格や、独自プロトコルによってタイムサーバとの時刻同期を実現してもよい。本実施形態では、ネットワークアダプタ２１０としてＮＩＣ（ＮｅｔｗｏｒｋＩｎｔｅｒｆａｃｅＣａｒｄ）を利用するが、ＮＩＣに限定するものではなく、同様の他のインターフェースを利用してもよい。 The time control unit 212 conforms to the Ordinary Clock of the IEEE 1588 standard, for example, and has a function of storing a time stamp of data transmitted/received to/from the time server 124 and a function of performing time synchronization with the time server 124. The time synchronization with the time server may be realized not only by the IEEE1588 standard but also by another EtherAVB standard or an original protocol. In the present embodiment, a NIC (Network Interface Card) is used as the network adapter 210, but the network adapter 210 is not limited to the NIC and other similar interfaces may be used.

The transmission unit 220 has a function of controlling data transmission to the switching hub 140 and the like via the network adapter 210. The transmission unit 220 includes a data compression/decompression unit 221, a data routing processing unit 222, a time synchronization control unit 223, an image/sound transmission processing unit 224, and a data routing information holding unit 225. The data compression/decompression unit 221 has a function of compressing data transmitted and received via the data transmission/reception unit 211 by applying a predetermined compression method, compression rate, and frame rate, and a function of decompressing compressed data.

データルーティング処理部２２２は、データルーティング情報保持部２２５が保持する情報を利用し、データ送受信部２１１が受信したデータや画像処理部２３０で処理されたデータのルーティング先を決定する。また、データルーティング処理部２２２は、決定したルーティング先へデータを送信させる機能を有する。データルーティング情報保持部２２５は、データ送受信部２１１で送受信されるデータの送信先を決定するためのアドレス情報を保持する機能を有する。 The data routing processing unit 222 uses the information held by the data routing information holding unit 225 to determine the routing destination of the data received by the data transmitting/receiving unit 211 or the data processed by the image processing unit 230. In addition, the data routing processing unit 222 has a function of transmitting data to the determined routing destination. The data routing information holding unit 225 has a function of holding address information for determining a transmission destination of data transmitted/received by the data transmission/reception unit 211.

Here, the routing destination is preferably the camera adapter 115 corresponding to a camera 112 directed at the same gazing point, because the image frames of such cameras are highly correlated, which is favorable for image processing. The order in which the camera adapters 115 output foreground and background images in relay fashion within the image processing system 100 is determined by the decisions of the data routing processing units 222 of the plurality of camera adapters 115.

時刻同期制御部２２３は、ＩＥＥＥ１５８８規格のＰＴＰ（ＰｒｅｃｉｓｉｏｎＴｉｍｅＰｒｏｔｏｃｏｌ）に準拠し、タイムサーバ１２４と時刻同期に係わる処理を行う機能を有する。なお、ＰＴＰに限らず、他の同様のプロトコルを利用して時刻同期に係わる処理を行ってもよい。 The time synchronization control unit 223 complies with the PTP (Precision Time Protocol) of the IEEE1588 standard, and has a function of performing processing related to time synchronization with the time server 124. Note that the process related to time synchronization may be performed using not only PTP but another similar protocol.

画像・音声伝送処理部２２４は、画像データ又は音声データを、データ送受信部２１１を介して他のカメラアダプタ１１５又はフロントエンドサーバ１２１へ転送するためのメッセージを作成する機能を有する。メッセージには、画像データ又は音声データ、及び各データのメタ情報が含まれる。メタ情報には、画像の撮影又は音声のサンプリングを行った時のタイムコード又はシーケンス番号、データ種別、及びカメラ１１２やマイク１１１の個体を示す識別子等が含まれる。 The image/sound transmission processing unit 224 has a function of creating a message for transferring image data or sound data to another camera adapter 115 or the front end server 121 via the data transmission/reception unit 211. The message includes image data or audio data, and meta information of each data. The meta information includes a time code or sequence number at the time of capturing an image or sampling audio, a data type, an identifier indicating an individual camera 112 or microphone 111, and the like.

また、画像・音声伝送処理部２２４は、他のカメラアダプタ１１５からデータ送受信部２１１を介してメッセージを受け取る。そして、画像・音声伝送処理部２２４は、メッセージに含まれるデータ種別に応じて、伝送プロトコル規定のパケットサイズにフラグメントされたデータ情報を画像データ又は音声データに復元する。なお、送信する画像データ又は音声データは、データ圧縮・伸張部２２１でデータ圧縮されていてもよい。また、データを復元した際にデータが圧縮されている場合、データ圧縮・伸張部２２１が伸張処理を行う。 The image/sound transmission processing unit 224 also receives a message from the other camera adapter 115 via the data transmission/reception unit 211. Then, the image/sound transmission processing unit 224 restores the data information fragmented to the packet size defined by the transmission protocol into image data or sound data according to the data type included in the message. The image data or audio data to be transmitted may be data-compressed by the data compression/decompression unit 221. If the data is compressed when the data is restored, the data compression/decompression unit 221 performs decompression processing.
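The message handling described above can be sketched as follows. This is only an illustration under assumptions: the field names, the `Message` class, and the fixed `packet_size` are hypothetical, and the actual message layout and transmission protocol are not specified in the text.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical message layout: payload plus the meta information listed above."""
    time_code: str    # time code at capture or sampling (format assumed)
    sequence: int     # sequence number
    data_type: str    # e.g. "image" or "audio"
    device_id: str    # identifier of the individual camera or microphone
    payload: bytes    # image or sound data, possibly compressed


def fragment(payload: bytes, packet_size: int) -> list[bytes]:
    """Split a payload into chunks no larger than the protocol's packet size."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    return [payload[i:i + packet_size] for i in range(0, len(payload), packet_size)]


def reassemble(fragments: list[bytes]) -> bytes:
    """Restore the original payload from fragments received in order."""
    return b"".join(fragments)
```

A receiving adapter would call `reassemble` on the fragments and then, if the data type indicates compressed data, pass the result to decompression.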

The image processing unit 230 has a function of processing image data captured by the camera 112 and image data received from other camera adapters 115. The image processing unit 230 includes a foreground/background separation unit 231, a priority generation unit 232, and a calibration control unit 233.

前景背景分離部２３１は、カメラ１１２が撮影した画像データを前景画像と背景画像に分離する機能を有する。すなわち、複数のカメラアダプタ１１５のそれぞれの前景背景分離部２３１は、複数のカメラ１１２のうち、対応するカメラ１１２による撮影画像から所定領域を抽出する。所定領域は、例えば撮影画像に対するオブジェクト検出の結果得られる前景画像であり、この抽出により前景背景分離部２３１は、撮影画像を前景画像と背景画像に分離する。なお、オブジェクトとは、例えば人物である。オブジェクトが、特定人物（選手、監督、及び／又は審判等）であってもよいし、ボールやゴール等の画像パターンが予め定められている物体であってもよい。また、オブジェクトとして、動体が検出されるようにしてもよい。 The foreground/background separation unit 231 has a function of separating the image data captured by the camera 112 into a foreground image and a background image. That is, the foreground/background separating unit 231 of each of the plurality of camera adapters 115 extracts a predetermined area from the image captured by the corresponding camera 112 among the plurality of cameras 112. The predetermined area is, for example, a foreground image obtained as a result of object detection on the captured image, and the foreground/background separation unit 231 separates the captured image into the foreground image and the background image by this extraction. The object is, for example, a person. The object may be a specific person (player, manager, and/or referee, etc.), or may be an object for which an image pattern such as a ball or goal is predetermined. Further, a moving body may be detected as the object.

前景背景分離部２３１により人物等の重要なオブジェクトを含む前景画像とオブジェクトを含まない背景画像を分離して処理することで、画像処理システム１００において生成される仮想視点画像のオブジェクトに該当する部分の画像の品質を向上できる。また、前景と背景の分離を複数のカメラアダプタ１１５のそれぞれが行うことで、複数のカメラ１１２を有する画像処理システム１００における負荷を分散させることができる。なお、所定領域は、前景画像に限らず、例えば背景画像であってもよい。 The foreground/background separation unit 231 separates and processes the foreground image including an important object such as a person and the background image that does not include the object, so that a portion corresponding to the object of the virtual viewpoint image generated in the image processing system 100 is processed. The image quality can be improved. Further, by separating the foreground and the background by each of the plurality of camera adapters 115, it is possible to distribute the load on the image processing system 100 having the plurality of cameras 112. The predetermined area is not limited to the foreground image, and may be the background image, for example.

優先度生成部２３２は、前景背景分離部２３１で分離された前景画像及びカメラパラメータを利用して優先度を生成する機能を有する。カメラパラメータは、カメラ固有の内部パラメータ（焦点距離、センサピッチ、画像中心、及びレンズ歪みパラメータ等）と、世界座標系に対するカメラの位置姿勢を表す外部パラメータ（回転行列及び位置ベクトル等）を含む。 The priority generation unit 232 has a function of generating a priority by using the foreground image separated by the foreground background separation unit 231 and the camera parameter. The camera parameters include internal parameters specific to the camera (focal length, sensor pitch, image center, lens distortion parameter, etc.) and external parameters (rotation matrix, position vector, etc.) representing the position and orientation of the camera with respect to the world coordinate system.

The calibration control unit 233 has a function of acquiring the image data required for calibration from the camera 112 via the camera control unit 241 and transmitting it to the front-end server 121, which performs the arithmetic processing related to calibration. The calibration control unit 233 also has a function of performing calibration during image capture (dynamic calibration), according to preset parameters, on image data acquired from the camera 112 via the camera control unit 241. In the present embodiment, the arithmetic processing related to calibration is performed by the front-end server 121, but the node that performs it is not limited to the front-end server 121. For example, the arithmetic processing may be performed by another node such as the control station 131 or a camera adapter 115 (including another camera adapter 115).

The external device control unit 240 has a function of controlling devices connected to the camera adapter 115. The external device control unit 240 includes a camera control unit 241, a microphone control unit 242, a pan head control unit 243, and a sensor control unit 244.

The camera control unit 241 is connected to the camera 112 and has functions such as controlling the camera 112, acquiring captured images, providing a synchronization signal, and setting the time. Control of the camera 112 includes, for example, setting and referencing shooting parameters (number of pixels, color depth, frame rate, white balance, and the like), acquiring the status of the camera 112 (shooting, stopped, synchronizing, error, and the like), starting and stopping shooting, and adjusting focus. In the present embodiment, focus adjustment is performed via the camera 112, but when a detachable lens is attached to the camera 112, the camera adapter 115 may be connected to the lens and adjust the lens directly. The camera adapter 115 may also perform lens adjustments such as zooming via the camera 112.

The synchronization signal is provided by supplying the camera 112 with the shooting timing (control clock), using the time that the time synchronization control unit 223 has synchronized with the time server 124. The time is set by providing the time that the time synchronization control unit 223 has synchronized with the time server 124 as a time code conforming to, for example, the SMPTE 12M format. As a result, the provided time code is attached to the image data received from the camera 112. The time code format is not limited to SMPTE 12M and may be another format. Alternatively, the camera control unit 241 may attach the time code itself to the image data received from the camera 112, without providing the time code to the camera 112.

マイク制御部２４２は、マイク１１１と接続し、マイク１１１の制御、収音の開始及び停止、及び収音された音声データの取得等を行う機能を有する。マイク１１１の制御は、例えば、ゲイン調整や状態取得等である。また、カメラ制御部２４１と同様に、マイク制御部２４２は、マイク１１１に対して音声サンプリングするタイミングとタイムコードを提供する。音声サンプリングのタイミングとなるクロック情報としては、タイムサーバ１２４からの時刻情報が、例えば４８ｋＨｚのワードクロックに変換されてマイク１１１に供給される。 The microphone control unit 242 is connected to the microphone 111 and has functions of controlling the microphone 111, starting and stopping sound collection, and acquiring sound data collected. The control of the microphone 111 is, for example, gain adjustment, state acquisition, or the like. Further, like the camera control unit 241, the microphone control unit 242 provides the microphone 111 with the timing and time code for audio sampling. As clock information that is the timing of audio sampling, time information from the time server 124 is converted into a word clock of 48 kHz and supplied to the microphone 111.

The pan head control unit 243 is connected to the pan head 113 and has a function of controlling the pan head 113. Control of the pan head 113 includes, for example, pan/tilt control and status acquisition. The sensor control unit 244 is connected to the external sensor 114 and has a function of acquiring the sensor information sensed by the external sensor 114. For example, when a gyro sensor is used as the external sensor 114, information representing vibration can be acquired. Using the vibration information acquired by the sensor control unit 244, the image processing unit 230 can generate an image with suppressed vibration before the processing in the foreground/background separation unit 231.

The vibration information is used, for example, when image data from an 8K camera is cropped to a size smaller than the original 8K size in consideration of the vibration information and is aligned with the image of an adjacently installed camera 112. In this way, even if structural vibration of the building propagates to each camera at a different frequency, alignment is performed by this function provided in the camera adapter 115. As a result, electronically stabilized image data can be generated, which reduces the alignment processing load that the image computing server 120 would otherwise bear for each of the cameras 112. The sensor of the sensor system 110 is not limited to the external sensor 114 and may be a sensor built into the camera adapter 115, in which case the same effect is obtained.

以上で、カメラアダプタ１１５の構成の説明を終える。カメラアダプタ１１５において、符号量制御に深く関わりを持つのが、優先度生成部２３２、データ圧縮・伸張部２２１、及び画像・音声伝送処理部２２４である。以下、図３及び図４を用いて符号量制御に係る符号の優先度を生成するための構成及び処理の流れについて説明する。 This is the end of the description of the configuration of the camera adapter 115. In the camera adapter 115, the priority generation unit 232, the data compression/decompression unit 221, and the image/sound transmission processing unit 224 are closely related to the code amount control. Hereinafter, the configuration and the flow of processing for generating the priority of the code relating to the code amount control will be described with reference to FIGS. 3 and 4.

FIG. 3 is a block diagram showing a configuration example of the image processing unit 230 in the camera adapter 115. In FIG. 3, blocks having the same functions as the blocks shown in FIG. 2 are given the same reference numerals. The calibration control unit 233 applies to the input image color correction processing for suppressing color variation between cameras, shake correction processing (electronic image stabilization) for stabilizing the image position against shake caused by camera vibration, and the like.

The foreground/background separation unit 231 will now be described. The foreground separation unit 311 separates the foreground image by comparing the aligned image data from the camera 112 with the background image 312. The foreground image obtained here is output as a pair consisting of its image data and the offset value of the foreground image region within the whole image (for example, the position of the upper-left pixel of the foreground image). The background update unit 313 generates a new background image using the background image 312 and the aligned image from the camera 112, and updates the background image 312 to the new background image. The background cutout unit 314 performs control to cut out a part of the background image 312.
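The separation step and its offset output can be illustrated with a minimal sketch. It assumes grayscale images stored as lists of rows and a simple per-pixel difference threshold; the text does not state which comparison method is used, and `separate_foreground` and `threshold` are hypothetical names.

```python
def separate_foreground(frame, background, threshold=30):
    """Return (foreground_crop, (x_offset, y_offset)), or None if nothing differs.

    A pixel is treated as foreground when it differs from the background
    by more than `threshold` (an assumed criterion).
    """
    rows, cols = len(frame), len(frame[0])
    mask = [[abs(frame[y][x] - background[y][x]) > threshold for x in range(cols)]
            for y in range(rows)]
    ys = [y for y in range(rows) if any(mask[y])]
    if not ys:
        return None
    xs = [x for x in range(cols) if any(mask[y][x] for y in range(rows))]
    top, bottom, left, right = ys[0], ys[-1], xs[0], xs[-1]
    # Crop the bounding box of the foreground region from the captured frame.
    crop = [row[left:right + 1] for row in frame[top:bottom + 1]]
    # The offset is the position of the crop's upper-left pixel in the whole image.
    return crop, (left, top)
```

The returned offset is what later steps need in order to locate the subject in the full image and estimate its distance from the camera.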

The priority generation unit 232 will now be described. In the following description, the foreground image is also referred to as a "subject image", meaning the image region showing the subject. The camera parameter reception unit 321 receives the camera parameters. The camera parameters are, for example, information obtained by calibration processing, and are transmitted from the control station 131 to the target camera adapter 115 and set there. The resolution acquisition unit 322 acquires the resolution of the subject image using the offset value of the subject image separated by the foreground separation unit 311 and the camera parameters received via the transmission unit 220, and outputs it to the separation parameter calculation unit 323 and the priority calculation unit 324. The separation parameter calculation unit 323 calculates the separation parameter used by the data compression/decompression unit 221 from the resolution of the subject in the input subject image, and outputs it to the transmission unit 220 and the priority calculation unit 324. The priority calculation unit 324 calculates a priority for each resolution component after encoding from the input subject resolution and separation parameter, and outputs it to the transmission unit 220.

FIG. 4A is a flowchart showing an example of the priority and separation parameter calculation processing performed by the priority generation unit 232. In step S401, the camera parameter reception unit 321 inputs the camera parameters indicating the position and orientation of the camera with respect to the shooting plane (the ground). In step S402, the foreground separation unit 311 inputs the offset value of the subject image within the whole original image. In step S403, the resolution acquisition unit 322 acquires the resolution of the subject image. Here, the resolution is defined as the number of pixels per unit area of the imaged subject surface. The resolution depends on the sensor pitch of the image sensor, the focal length, and the distance to the subject. For example, the resolution acquisition unit 322 assigns a larger value to the resolution of the subject image as the focal length becomes larger or the sensor pitch becomes smaller. Likewise, the resolution acquisition unit 322 assigns a smaller value to the resolution of the subject image as the distance to the subject becomes larger.

In step S404, the separation parameter calculation unit 323 calculates the separation parameter. In the present embodiment, the image data is separated by wavelet transform into resolution components of different frequencies, and the separation parameter represents the number of wavelet transforms. The wavelet transform is directional: separating in the horizontal direction generates two images, each half the original width and the same height, as the high-frequency and low-frequency resolution components. Separating each of these resolution component images further in the vertical direction generates four resolution components, each half the original size both vertically and horizontally. One horizontal pass and one vertical pass together are counted as one wavelet transform.

Of the four resolution components generated by the wavelet transform, the component that is low-frequency both vertically and horizontally is called the LL component, and the component that is high-frequency vertically and low-frequency horizontally is called the LH component. The component that is low-frequency vertically and high-frequency horizontally is called the HL component, and the component that is high-frequency both vertically and horizontally is called the HH component. In general, the LL component is a reduced image with half the resolution of the original image, the LH component holds vertical edge information, the HL component holds horizontal edge information, and the HH component holds diagonal edge information. To separate the resolution components further, the wavelet transform is applied again to the LL component, which is hierarchically separated into four resolution components.
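One horizontal pass followed by one vertical pass can be sketched as below. The text does not name the wavelet filter, so this example assumes the Haar wavelet (pairwise average and difference) purely for illustration; `haar_pairs` and `haar_2d` are hypothetical names, and the image dimensions are assumed to be even.

```python
def haar_pairs(values):
    """Split an even-length sequence into low (average) and high (difference) halves."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def haar_2d(image):
    """Apply one wavelet transform (horizontal pass, then vertical pass)."""
    # Horizontal pass: each row yields a low-frequency and a high-frequency half.
    h_low, h_high = [], []
    for row in image:
        lo, hi = haar_pairs(row)
        h_low.append(lo)
        h_high.append(hi)

    def vertical(block):
        # Vertical pass: filter every column of the half-width block.
        lows, highs = [], []
        for col in transpose(block):
            lo, hi = haar_pairs(col)
            lows.append(lo)
            highs.append(hi)
        return transpose(lows), transpose(highs)

    ll, lh = vertical(h_low)    # horizontally low: vertical low -> LL, vertical high -> LH
    hl, hh = vertical(h_high)   # horizontally high: vertical low -> HL, vertical high -> HH
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}
```

Calling `haar_2d` again on the returned `"LL"` image gives the next level of the hierarchy.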

図５に、３回のウェーブレット変換を適用した例を示す。（０，０）は１回目の変換におけるＨＨ成分であり、（０，１）は１回目の変換におけるＨＬ成分であり、（１，０）は１回目の変換におけるＬＨ成分である。図５において、括弧内の１番目の要素は横方向の低周波成分を抽出した回数を示し、２番目の要素は縦方向の低周波成分を抽出した回数を示す。したがって、（３，３）は３回のウェーブレット変換の結果得られたＬＬ成分である。 FIG. 5 shows an example in which the wavelet transform is applied three times. (0,0) is the HH component in the first conversion, (0,1) is the HL component in the first conversion, and (1,0) is the LH component in the first conversion. In FIG. 5, the first element in parentheses indicates the number of times the horizontal low-frequency component is extracted, and the second element indicates the number of times the vertical low-frequency component is extracted. Therefore, (3, 3) is the LL component obtained as a result of three wavelet transforms.
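The index convention can be written out programmatically. The sketch below applies the stated rule (first element: horizontal low-frequency extractions; second element: vertical low-frequency extractions) to every level; the indices for the second and third levels are derived from that rule rather than quoted from FIG. 5, and the band labels such as `"HL2"` are hypothetical.

```python
def subband_indices(levels):
    """Map each subband of a `levels`-deep transform to its (Wx, Wy) index."""
    bands = {}
    for level in range(1, levels + 1):
        w = level - 1  # low-pass extractions already applied in both directions
        bands[f"HH{level}"] = (w, w)        # high-frequency in both directions
        bands[f"HL{level}"] = (w, w + 1)    # vertical low-pass extracted once more
        bands[f"LH{level}"] = (w + 1, w)    # horizontal low-pass extracted once more
    bands[f"LL{levels}"] = (levels, levels)  # remaining low-frequency image
    return bands
```

For three transforms this yields ten subbands, from (0, 0) for the first-level HH component up to (3, 3) for the final LL component.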

As described above, one wavelet transform generates, as the LL component, an image with half the pre-transform resolution in each of the vertical and horizontal directions. In the present embodiment, the wavelet transform is repeated until the resolution of the subject in the LL component falls below a required resolution <α> given in advance. The required resolution is the minimum resolution needed to ensure video quality when generating the virtual viewpoint image. When the resolution of the input subject image is <d>, the resolution of the LL component after S wavelet transforms is d × 4^(-S). The number of transforms is therefore the smallest S for which the subject resolution in the LL component falls below the required resolution <α>, that is, the smallest non-negative integer S satisfying d × 4^(-S) < α. In other words, the count S serving as the separation parameter is determined from the ratio between the resolution <d> of the subject image and the resolution <α> required for the subject.
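The stopping rule can be checked with a short sketch. It assumes, as stated above, that each transform divides the per-area resolution by 4 and that the loop stops at the first count for which the LL resolution is strictly below <α>; the function name is hypothetical. For d ≥ α this agrees with the closed form floor(log4(d/α)) + 1, which is derived here from the stated condition and not taken from the text.

```python
def wavelet_transform_count(d, alpha):
    """Smallest S such that d * 4**(-S) < alpha (d: subject resolution, alpha: required)."""
    if d <= 0 or alpha <= 0:
        raise ValueError("resolutions must be positive")
    count = 0
    while d >= alpha:
        d /= 4.0     # one transform quarters the number of pixels per unit area
        count += 1
    return count
```

A nearby subject with a large <d> therefore receives more decomposition levels than a distant subject whose resolution is already close to <α>.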

ステップＳ４０５では、優先度算出部３２４が、優先度Ｐを算出する。優先度は、成分が表す画像情報の解像度が小さくなる、つまり低周波であるほど大きな値を設定する。図５において説明した縦横方向の低周波成分の抽出回数を（Ｗｘ，Ｗｙ）で表現すると、優先度Ｐは以下の式で定義される。 In step S405, the priority calculation unit 324 calculates the priority P. The priority is set to a larger value as the resolution of the image information represented by the component becomes smaller, that is, as the frequency becomes lower. When the number of times of extracting the low-frequency components in the vertical and horizontal directions described in FIG. 5 is expressed by (Wx, Wy), the priority P is defined by the following equation.

That is, the priority calculation unit 324 calculates a higher priority as the numbers of low-frequency extractions in the horizontal and vertical directions (Wx, Wy) increase, and a higher priority as the resolution <d> of the subject image decreases. In the processing shown in FIG. 4A, steps S401 and S402 may be performed in either order, and steps S404 and S405 may be performed in either order.

図４（Ｂ）は、図４（Ａ）に示したステップＳ４０３において解像度取得部３２２が被写体画像の解像度＜ｄ＞を取得する処理の例を示すフローチャートである。ステップＳ４１１では、解像度取得部３２２が、要求解像度αを取得する。この値は、カメラ間で優先度の基準を統一するため、事前に設定して、すべてのカメラで共通化しておくことが好ましく、この場合にはカメラパラメータと同時にネットワークを介して取得する。 FIG. 4B is a flowchart showing an example of processing in which the resolution acquisition unit 322 acquires the resolution <d> of the subject image in step S403 shown in FIG. 4A. In step S411, the resolution acquisition unit 322 acquires the required resolution α. This value is preferably set in advance and made common to all cameras in order to unify the priority standard among the cameras, and in this case, it is acquired through the network at the same time as the camera parameters.

ステップＳ４１２〜Ｓ４１４では、解像度取得部３２２が、視点と被写体との間の距離を求める。本実施形態では、すべての被写体（例えば人）が撮影平面上にあるものと仮定し、被写体画像において最も下の画素が、被写体が撮影平面に接地している点に対応するものとする。例えば、人であれば、その人の画像において最も下の画素が、足と撮影平面とが接触している点に対応する。 In steps S412 to S414, the resolution acquisition unit 322 obtains the distance between the viewpoint and the subject. In the present embodiment, it is assumed that all subjects (for example, people) are on the shooting plane, and the lowest pixel in the subject image corresponds to the point where the subject is grounded on the shooting plane. For example, in the case of a person, the lowest pixel in the person's image corresponds to the point where the foot and the shooting plane are in contact.

ステップＳ４１２〜Ｓ４１４の処理について説明する。ステップＳ４１２では、解像度取得部３２２が、被写体画像に係る代表画素として、被写体画像の内で最も下の画素を選択する。なお、最も下の画素が複数ある場合、それら複数の画素における中央の画素を代表画素として選択する。また、最も下の画素が複数ある場合、それら複数の画素の内で被写体領域の中央に最も近い画素を代表画素として選択するようにしてもよい。 The processing of steps S412 to S414 will be described. In step S412, the resolution acquisition unit 322 selects the lowest pixel in the subject image as the representative pixel related to the subject image. If there are a plurality of bottom pixels, the central pixel of the plurality of pixels is selected as the representative pixel. Further, when there are a plurality of lowest pixels, the pixel closest to the center of the subject area may be selected as the representative pixel among the plurality of pixels.
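Step S412 can be sketched as follows, assuming the subject image is available as a binary mask whose row index increases downward. When the bottom row contains several subject pixels, this sketch takes the middle one (the left of the two middle pixels for an even count, which is an arbitrary choice); `representative_pixel` is a hypothetical name.

```python
def representative_pixel(mask):
    """Return (x, y) of the middle pixel on the lowest occupied row, or None."""
    for y in range(len(mask) - 1, -1, -1):          # scan from the bottom row upward
        cols = [x for x, on in enumerate(mask[y]) if on]
        if cols:
            return (cols[(len(cols) - 1) // 2], y)  # middle of the lowest pixels
    return None
```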

In step S413, the resolution acquisition unit 322 obtains the intersection <X> between the line of sight corresponding to the representative pixel and the shooting plane. Because the camera parameters are given in advance, the positional relationship between the shooting plane and the viewpoint and the camera orientation are known, so the intersection can be calculated. In step S414, the resolution acquisition unit 322 calculates the distance between the viewpoint position <U> and the intersection <X>, and takes it as the distance <L> between the viewpoint and the subject position.

FIG. 6 is a schematic diagram illustrating the method of estimating the subject position. The representative line of sight 604, calculated from the position of the representative pixel of the subject 601, is extended from the viewpoint position <U> 605 of the camera 602, and its intersection with the shooting plane 603 is denoted <X>. This intersection <X> is assumed to be the ground contact point 606 of the subject on the shooting plane 603. In the present embodiment, the distance to the subject is estimated using the subject position in three-dimensional space estimated from the pixel position of the subject image. However, any other method may be used as long as the distance to the subject can be acquired; for example, a range camera or another subject position acquisition sensor may be used.
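Steps S413 and S414 amount to a ray-plane intersection. The sketch below assumes a pinhole camera without lens distortion, a shooting plane at z = 0 in world coordinates, and a world-to-camera rotation matrix; the parameter names (`fx`, `fy`, `cx`, `cy`, `rotation`) are illustrative and not taken from the text.

```python
import math


def pixel_ray_world(u, v, fx, fy, cx, cy, rotation):
    """Direction of the line of sight through pixel (u, v), in world coordinates."""
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # rotation maps world to camera, so its transpose maps camera to world.
    return tuple(sum(rotation[k][i] * d_cam[k] for k in range(3)) for i in range(3))


def ground_intersection(viewpoint, direction, eps=1e-12):
    """Intersection X of the ray from the viewpoint with the plane z = 0, or None."""
    if abs(direction[2]) < eps:
        return None                      # ray parallel to the shooting plane
    t = -viewpoint[2] / direction[2]
    if t <= 0:
        return None                      # plane lies behind the camera
    return tuple(viewpoint[i] + t * direction[i] for i in range(3))


def subject_distance(viewpoint, intersection):
    """Distance L between the viewpoint U and the intersection X."""
    return math.dist(viewpoint, intersection)
```

For a camera 10 units above the ground looking straight down, the ray through the principal point meets the plane directly below the camera, giving L = 10.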

In step S415, the resolution acquisition unit 322 calculates the resolution <d> of a subject located at the distance <L>. The derivation of the subject resolution is described with reference to FIG. 7. Let T be the focal length (the distance between the lens plane and the sensor plane), L the distance between the lens plane and the subject plane, S the vertical size of the sensor, and H the vertical extent of the field of view on the subject plane. The magnification M (= H/S) is then M = L/T. Further, if the number of pixels of the sensor in the vertical direction is h, the vertical sensor pitch U is U = S/h. Therefore, the resolution <d> of the subject is obtained as d = M/U.
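The simple model of FIG. 7 can be evaluated numerically as in the sketch below. The function name `resolution_model` and the returned keys are hypothetical. The value `d_as_stated` evaluates the expression d = M/U exactly as the text gives it; `pixels_per_unit_length` is a separate quantity, h/H, that follows from the same definitions (H = M·S and U = S/h, so h/H = 1/(M·U)) and is included only for comparison with the per-unit-length resolution mentioned in the text.

```python
def resolution_model(L, T, S, h):
    """Evaluate the thin-lens quantities used in step S415.

    L: lens-to-subject distance, T: focal length, S: vertical sensor size
    (all in the same length unit), h: vertical pixel count of the sensor.
    """
    M = L / T   # magnification, M = H / S
    U = S / h   # vertical sensor pitch
    H = M * S   # vertical extent covered on the subject plane
    return {
        "M": M,
        "U": U,
        "H": H,
        "d_as_stated": M / U,             # expression given in the text
        "pixels_per_unit_length": h / H,  # equals 1 / (M * U)
    }
```

For example, `resolution_model(L=50.0, T=0.05, S=0.024, h=2160)` describes a 24 mm sensor with 2160 rows and a 50 mm lens viewing a subject plane 50 m away.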

Although a method of calculating the resolution with a simple model has been shown here, when accuracy is required the resolution may be obtained from a more elaborate model that takes the thickness and combination of lenses into account. In addition, the number of pixels per unit area was calculated from the ratio of the number of pixels to the size in the vertical direction (the number of pixels per unit length), but it may instead be calculated from both the vertical and horizontal sizes and pixel counts, as the ratio of their products. Furthermore, the distance between the viewpoint and the intersection of the representative line of sight with the shooting plane was used as the distance to the subject, but the distance may instead be obtained from the parallax between adjacent viewpoints, with a distance sensor, or by any other method capable of providing it. Also, the resolution is defined as the number of pixels per unit area of the imaged subject, but it may instead be the number of pixels per unit length of the subject. This concludes the description of the process of acquiring the resolution.

次に、前述のようにして生成した優先度を考慮したデータ伝送パケットの生成、及びパケットの制御について説明する。図８は、符号量制御に関わる伝送部２２０内の構成例を示すブロック図である。図８においては、被写体画像を伝送する際のデータの流れを示しており、その説明に寄与しない他の構成の記載を省略している。 Next, generation of a data transmission packet in consideration of the priority generated as described above and packet control will be described. FIG. 8 is a block diagram showing a configuration example in the transmission unit 220 related to code amount control. FIG. 8 shows the flow of data when transmitting a subject image, and omits the description of other configurations that do not contribute to the description.

データ圧縮・伸張部２２１は、画像符号化部８１１及びメッセージ生成部８１２を有する。画像符号化部８１１は、画像処理部２３０から被写体画像及び分離パラメータを受け取り、分離パラメータに従って被写体画像を解像度成分に分離して符号化する。すなわち、画像符号化部８１１は、分離パラメータとして示されるウェーブレット変換回数分、被写体画像に対するウェーブレット変換を行って解像度成分に分離する。 The data compression/decompression unit 221 has an image coding unit 811 and a message generation unit 812. The image encoding unit 811 receives the subject image and the separation parameter from the image processing unit 230, separates the subject image into resolution components according to the separation parameter, and encodes the resolution component. That is, the image encoding unit 811 performs wavelet transform on the subject image for the number of wavelet transforms indicated as the separation parameter, and separates the image into resolution components.

The message generation unit 812 receives a resolution component together with its identification information and priority, generates a message, and outputs the message to the image/audio transmission processing unit 224. Here, the identification information specifies which camera captured the image, which subject image it belongs to, and which component of that subject image the resolution component is; it is used, for example, when decoding the image data. The message generation unit 812 stores the resolution component in the data area of the message, and stores the priority and the identification information of the resolution component in the header area.

画像・音声伝送処理部２２４は、メッセージ制御部８２１、メッセージ保持領域８２２、及びパケット生成部８２３を有する。メッセージ制御部８２１は、メッセージ生成部８１２により生成された優先度付き符号データをメッセージとして取り扱い、優先度を考慮してメッセージ保持領域８２２に保持されたメッセージを管理する。また、メッセージ制御部８２１は、メッセージをパケット生成部８２３に出力する。パケット生成部８２３は、入力されたメッセージを所定サイズ単位に分解してパケットを生成する。パケット生成部８２３により生成されたパケットはネットワークアダプタ２１０に出力される。 The image/sound transmission processing unit 224 includes a message control unit 821, a message holding area 822, and a packet generation unit 823. The message control unit 821 handles the coded data with priority generated by the message generation unit 812 as a message, and manages the message held in the message holding area 822 in consideration of the priority. The message control unit 821 also outputs the message to the packet generation unit 823. The packet generation unit 823 decomposes the input message into predetermined size units to generate packets. The packet generated by the packet generator 823 is output to the network adapter 210.

図９は、データ圧縮・伸張部２２１の処理例を示すフローチャートである。ステップＳ９０１では、データ圧縮・伸張部２２１の画像符号化部８１１が、画像処理部２３０から被写体画像及び分離パラメータを受け取る。ステップＳ９０２では、データ圧縮・伸張部２２１の画像符号化部８１１が、分離パラメータとして受け取ったウェーブレット変換回数に従い、被写体画像を解像度成分に分離して符号化する。ここでの符号化はＪＰＥＧ２０００を用いる。ＪＰＥＧ２０００はウェーブレット変換によって、画像データを解像度成分に分離することができる。符号化方式は、ＪＰＥＧ２０００のように画像データを解像度成分に分離可能なものであれば他の方式を利用してもよい。 FIG. 9 is a flowchart showing a processing example of the data compression/decompression unit 221. In step S901, the image encoding unit 811 of the data compression/decompression unit 221 receives the subject image and the separation parameter from the image processing unit 230. In step S902, the image encoding unit 811 of the data compression/decompression unit 221 separates and encodes the subject image into resolution components according to the number of wavelet transforms received as the separation parameter. JPEG2000 is used for encoding here. JPEG2000 can separate image data into resolution components by wavelet transformation. As the encoding method, another method may be used as long as the image data can be separated into resolution components such as JPEG2000.
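How repeated wavelet transforms split an image into resolution components can be illustrated with the sketch below. It deliberately substitutes the Haar wavelet for the wavelet filters of JPEG2000 and performs no entropy coding, so it shows only the separation structure, not the codec used in the embodiment. The names `haar_step` and `separate` are hypothetical, and image dimensions are assumed to be divisible by 2 at every level.

```python
def haar_step(img):
    """One 2D Haar level: returns (approximation, (dh, dv, dd))."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image dimensions must be even")
    ll, dh, dv, dd = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for r in range(0, h, 2):
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            i, j = r // 2, c // 2
            ll[i][j] = (a + b + d + e) / 4  # half-resolution image
            dh[i][j] = (a - b + d - e) / 4  # difference between columns
            dv[i][j] = (a + b - d - e) / 4  # difference between rows
            dd[i][j] = (a - b - d + e) / 4  # diagonal difference
    return ll, (dh, dv, dd)

def separate(img, levels):
    """Apply `levels` transforms; return components from coarsest to finest.

    Element 0 is the lowest-resolution image; each later element holds the
    detail bands needed to double the resolution once more.
    """
    details = []
    ll = img
    for _ in range(levels):
        ll, d = haar_step(ll)
        details.append(d)
    return [ll] + details[::-1]
```

With a separation parameter of 2, `separate(img, 2)` yields three components: a quarter-size image and two sets of detail bands, so dropping the last component still leaves a half-resolution image that can be reconstructed.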

ステップＳ９０３では、データ圧縮・伸張部２２１のメッセージ生成部８１２が、画像処理部２３０から優先度を受け取る。ステップＳ９０４では、データ圧縮・伸張部２２１のメッセージ生成部８１２が、ステップＳ９０２において生成した解像度成分に、対応する優先度を付加してメッセージを生成する。メッセージ生成部８１２は、データ領域に解像度成分を格納し、ヘッダ領域に優先度と解像度成分の特定情報を格納して、メッセージを生成する。ステップＳ９０５では、データ圧縮・伸張部２２１のメッセージ生成部８１２が、生成したメッセージを出力する。この図９に示した処理は、フレームから切り出されたすべての被写体画像に対して実行される。 In step S903, the message generation unit 812 of the data compression/decompression unit 221 receives the priority from the image processing unit 230. In step S904, the message generation unit 812 of the data compression/decompression unit 221 adds a corresponding priority to the resolution component generated in step S902 to generate a message. The message generation unit 812 stores the resolution component in the data area, stores the priority and the identification information of the resolution component in the header area, and generates the message. In step S905, the message generation unit 812 of the data compression/decompression unit 221 outputs the generated message. The process shown in FIG. 9 is executed for all the subject images cut out from the frame.
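The message layout described above, with the priority and identification information in the header area and the coded resolution component in the data area, might be modeled as in the following sketch. The class and field names (`ComponentId`, `Message`, `make_messages`) are hypothetical; the text does not specify field widths or an on-wire encoding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComponentId:
    """Identification information: which camera, subject image, and component."""
    camera: int
    subject: int
    component: int

@dataclass(frozen=True)
class Message:
    priority: float     # header area
    ident: ComponentId  # header area
    data: bytes         # data area: one coded resolution component

def make_messages(camera, subject, components, priorities):
    """S904: attach the corresponding priority to each resolution component."""
    if len(components) != len(priorities):
        raise ValueError("one priority is required per resolution component")
    return [
        Message(p, ComponentId(camera, subject, i), c)
        for i, (c, p) in enumerate(zip(components, priorities))
    ]
```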

FIG. 10 is a flowchart showing a processing example of the message control unit 821. In step S1001, a message <m> is input from the data compression/decompression unit 221 to the message control unit 821. In step S1002, the message control unit 821 checks the free space in the message holding area 822 and determines whether the message <m> can be held there. When it determines that the message holding area 822 has room for the message <m> (Yes), the message control unit 821 stores the message <m> in the message holding area 822 in step S1003 and ends the processing.

一方、ステップＳ１００２において、メッセージ保持領域８２２にメッセージ＜ｍ＞を保持するための空き領域がないとメッセージ制御部８２１が判断した場合（Ｎｏ）、ステップＳ１００４へ進む。ステップＳ１００４では、メッセージ制御部８２１が、メッセージ保持領域８２２に格納されているメッセージ及びメッセージ＜ｍ＞の内で、メッセージ＜ｍ＞が最も優先度が高いメッセージであるか否かを判断する。すなわち、メッセージ制御部８２１が、メッセージ保持領域８２２に格納されているメッセージの中で最大の優先度とメッセージ＜ｍ＞の優先度とを比較する。 On the other hand, in step S1002, when the message control unit 821 determines that the message holding area 822 does not have a free area for holding the message <m> (No), the process proceeds to step S1004. In step S1004, the message control unit 821 determines whether the message <m> has the highest priority among the messages and the message <m> stored in the message holding area 822. That is, the message control unit 821 compares the highest priority among the messages stored in the message holding area 822 with the priority of the message <m>.

When it is determined in step S1004 that the message <m> has the highest priority (Yes), the message control unit 821 outputs the message <m> to the packet generation unit 823 in step S1005 and ends the processing. On the other hand, when the message control unit 821 determines in step S1004 that the message <m> does not have the highest priority (No), that is, when a message stored in the message holding area 822 has the highest priority, the process proceeds to step S1006. In step S1006, the message control unit 821 outputs the message with the highest priority in the message holding area 822 to the packet generation unit 823, and the process returns to step S1002.

In this way, the message control unit 821 selects and outputs high-priority messages while storing the sequentially input messages in the message holding area 822 as appropriate. If the message holding area 822 is too small, the desired selection may not be obtained depending on the order in which messages arrive, so the area is assumed to be sufficiently large.
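The flow of FIG. 10 can be sketched as below. The sketch makes several assumptions that the text leaves open: the holding area is modeled as a byte budget, a message is a (priority, data) pair, a tie with the highest held priority counts as "highest", and output goes through a caller-supplied `emit` callback. The class name `MessageController` is hypothetical.

```python
class MessageController:
    """Bounded holding area that releases the highest-priority message when full."""

    def __init__(self, capacity, emit):
        self.capacity = capacity  # size of the holding area in bytes
        self.emit = emit          # called as emit(priority, data)
        self.held = []            # list of (priority, data)
        self.used = 0

    def on_message(self, priority, data):
        while True:
            # S1002: is there room for the incoming message?
            if self.used + len(data) <= self.capacity:
                self.held.append((priority, data))  # S1003
                self.used += len(data)
                return
            # S1004: does the incoming message have the highest priority?
            if not self.held or priority >= max(p for p, _ in self.held):
                self.emit(priority, data)           # S1005
                return
            # S1006: release the highest-priority held message, then retry.
            best = max(range(len(self.held)), key=lambda i: self.held[i][0])
            p, d = self.held.pop(best)
            self.used -= len(d)
            self.emit(p, d)
```

Because each pass through S1006 removes one held message, the loop always terminates: either room appears for the incoming message or the holding area empties and the message is output directly.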

図１１は、パケット生成部８２３の処理例を示すフローチャートである。ステップＳ１１０１では、メッセージ制御部８２１からパケット生成部８２３に、メッセージ＜ｓ＞が入力される。ステップＳ１１０２では、パケット生成部８２３が、入力されたメッセージ＜ｓ＞を現在生成中のパケットに挿入できるか否かを判断する。パケットは所定のサイズが予め規定されており、パケット生成部８２３は、メッセージ＜ｓ＞を入力した場合にそのサイズを超えるかどうかを確認する。 FIG. 11 is a flowchart showing a processing example of the packet generation unit 823. In step S1101, the message <s> is input from the message controller 821 to the packet generator 823. In step S1102, the packet generation unit 823 determines whether the input message <s> can be inserted into the packet currently being generated. The packet has a predetermined size defined in advance, and the packet generation unit 823 confirms, when the message <s> is input, whether or not the size is exceeded.

When it is determined in step S1102 that the message <s> can be inserted into the packet currently being generated (Yes), the packet generation unit 823 inserts the message <s> into the packet in step S1103 and ends the processing. On the other hand, when the packet generation unit 823 determines in step S1102 that the message <s> cannot be inserted into the packet currently being generated (No), the process proceeds to step S1104. In step S1104, the packet generation unit 823 completes the packet currently being generated, writes the highest priority among the messages in the packet into the header area of the packet, and outputs the packet to the network adapter 210. After that, in step S1105, the packet generation unit 823 generates a new packet.
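A minimal sketch of the flow of FIG. 11 follows. It assumes that no single message exceeds the packet size, that the message which did not fit is placed in the new packet created in step S1105 (the flowchart description does not say so explicitly), and that completed packets are handed to a caller-supplied `send` callback. The class name `PacketGenerator` and the `flush` method are hypothetical.

```python
class PacketGenerator:
    """Packs messages into fixed-size packets tagged with their top priority."""

    def __init__(self, packet_size, send):
        self.packet_size = packet_size
        self.send = send   # called as send(header_priority, payload)
        self.current = []  # messages in the packet being built
        self.used = 0

    def on_message(self, priority, data):
        if len(data) > self.packet_size:
            raise ValueError("message larger than a packet (not handled here)")
        # S1102: does the message fit in the packet being generated?
        if self.used + len(data) > self.packet_size:
            self.flush()   # S1104, followed by a fresh packet (S1105)
        self.current.append((priority, data))  # S1103
        self.used += len(data)

    def flush(self):
        """S1104: close the packet, write its highest priority, output it."""
        if not self.current:
            return
        header = max(p for p, _ in self.current)
        self.send(header, b"".join(d for _, d in self.current))
        self.current = []
        self.used = 0
```

Writing the maximum priority into the packet header lets a downstream stage compare whole packets without parsing the messages inside them.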

以上で、メッセージ制御部８２１及びパケット生成部８２３の処理の説明を終える。前述した図１０及び図１１に示した処理は、フレームから取得されたすべての被写体画像に対して実行される。ただし、本実施形態においては、予め１フレームあたりのデータ量の上限値ＭＡＸが設定されている。１フレームあたりのデータ量の上限値ＭＡＸと１パケットあたりのサイズＡから、１フレームあたりに出力可能なパケット数Ｒが、次の式により求められる。 This is the end of the description of the processing of the message control unit 821 and the packet generation unit 823. The above-described processing shown in FIGS. 10 and 11 is executed for all the subject images acquired from the frame. However, in the present embodiment, the upper limit value MAX of the data amount per frame is set in advance. From the maximum value MAX of the amount of data per frame and the size A per packet, the number R of packets that can be output per frame is calculated by the following formula.

Therefore, when packets have been output R times in step S1104 shown in FIG. 11, the processing of the current frame ends. If the number of output packets has not reached R even after all subject images have been processed, the message control unit 821 outputs the messages stored in the message holding area 822 to the packet generation unit 823 in descending order of priority. This is repeated until the number of output packets reaches R or all messages have been output. Through the above processing, resolution components with higher priority are transmitted preferentially. The message holding area 822 is initialized for each frame.
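The per-frame budget can be sketched as follows. The formula for R is not reproduced in this text; the sketch assumes R is the integer quotient of MAX by A, which is one reading consistent with the definitions of MAX and A given above. The function names `packets_per_frame` and `drain` are hypothetical, messages are modeled as (priority, data) pairs, and each message is assumed to fit in one packet.

```python
def packets_per_frame(max_bytes_per_frame, packet_size):
    """Assumed form of R: whole packets that fit in the per-frame limit."""
    return max_bytes_per_frame // packet_size

def drain(held, packet_size, sent_so_far, R):
    """Pack leftover messages, highest priority first, within the budget R.

    Returns the list of packets (each a list of messages) that may still
    be output for the current frame.
    """
    remaining = R - sent_so_far
    if remaining <= 0:
        return []
    packets, current, used = [], [], 0
    for prio, data in sorted(held, key=lambda m: m[0], reverse=True):
        if used + len(data) > packet_size and current:
            packets.append(current)
            if len(packets) >= remaining:
                return packets  # budget reached; the rest is dropped
            current, used = [], 0
        current.append((prio, data))
        used += len(data)
    if current:
        packets.append(current)
    return packets
```

Messages left over when the budget is reached are never packed, which corresponds to discarding the lowest-priority resolution components of the frame.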

According to the first embodiment, the image data of the foreground area (subject image) is separated into resolution components, and when the data amount exceeds the transmission band, coded data is discarded in accordance with a priority determined by the resolution. This prevents unintended loss of image information and stabilizes the image quality of the virtual viewpoint content regardless of the position of the foreground subject. Therefore, even when a plurality of subjects at different distances from the camera are included, degradation of the image quality of distant subjects can be suppressed and the data amount of the image can be controlled appropriately.

(Second embodiment)
Next, a second embodiment of the present invention will be described. In the first embodiment, code amount control is performed independently for each camera. However, when the number and distance of the subjects cut out differ greatly from camera to camera, the code amount each camera needs in order to guarantee quality also differs, so it is desirable to control the code amount across the whole system. For example, the more subjects are cut out, the larger the required code amount. Therefore, in the second embodiment, the camera adapter 115 monitors the saturation of the transmission band and the code amount (data amount) is controlled for the system as a whole, which further improves image quality. In the following, only the points in which the image processing system of the second embodiment differs from the first embodiment are described.

FIG. 12 is a block diagram showing a configuration example of the transmission unit 220 in the second embodiment. In FIG. 12, blocks corresponding to those shown in FIG. 8 are given the same reference numerals, and duplicate description is omitted. As shown in FIG. 12, in the transmission unit 220 of the second embodiment, the image/audio transmission processing unit 224 includes a packet control unit 824 and a packet holding area 825 in addition to the message control unit 821, the message holding area 822, and the packet generation unit 823. With this configuration, code amount control that reduces image quality degradation across the whole system is performed by weighing the priorities of packets transmitted from upstream in the daisy chain against those of the packets the unit itself generates.

In the second embodiment, the packet generation unit 823 outputs the generated packets to the packet control unit 824. The packet control unit 824 receives own-viewpoint packets from the packet generation unit 823 and packets on the network from the network adapter 210, discards packets according to priority, and outputs packets to the network adapter 210. The packet holding area 825 is an area for holding, as appropriate, the packets received by the packet control unit 824.

FIGS. 13(A) and 13(B) are flowcharts showing processing examples of the packet control unit 824. FIG. 13(A) shows an example of processing that discards packets according to their priority. In step S1301, a packet <p> is input to the packet control unit 824. In step S1302, the packet control unit 824 checks the free space in the packet holding area 825 and determines whether the packet <p> can be held there. When it determines that the packet holding area 825 has room for the packet <p> (Yes), the packet control unit 824 stores the packet <p> in the packet holding area 825 in step S1303 and ends the processing.

On the other hand, when the packet control unit 824 determines in step S1302 that the packet holding area 825 has no free space for the packet <p> (No), the process proceeds to step S1304. In step S1304, the packet control unit 824 determines whether the packet <p> has the lowest priority among the packet <p> and the packets stored in the packet holding area 825. That is, the packet control unit 824 compares the lowest priority among the packets stored in the packet holding area 825 with the priority of the packet <p>.

When it is determined in step S1304 that the packet <p> has the lowest priority (Yes), the packet control unit 824 discards the packet <p> in step S1305 and ends the processing. On the other hand, when the packet control unit 824 determines in step S1304 that the packet <p> does not have the lowest priority (No), that is, when a packet stored in the packet holding area 825 has the lowest priority, the process proceeds to step S1306. In step S1306, the packet control unit 824 discards the packet with the lowest priority in the packet holding area 825, and the process returns to step S1302.

図１３（Ｂ）には、ネットワークを流れるパケットと自視点のデータから生成したパケットをパケット保持領域８２５に一端格納し、優先度に応じてネットワークにパケットを伝送する処理の例を示している。ステップＳ１３１１では、パケット制御部８２４が、ネットワークアダプタ２１０へのパケットの出力可否を確認し、出力が可能になるまで待機する。出力が可能になったと判断すると、ステップＳ１３１２で、パケット制御部８２４が、パケット保持領域８２５の中で最も優先度が高いパケットをネットワークアダプタ２１０に出力する。 FIG. 13B shows an example of a process of temporarily storing a packet flowing through the network and a packet generated from the data of its own viewpoint in the packet holding area 825 and transmitting the packet to the network according to the priority. In step S1311, the packet control unit 824 confirms whether or not a packet can be output to the network adapter 210, and waits until output becomes possible. When it is determined that the output is possible, the packet control unit 824 outputs the packet with the highest priority in the packet holding area 825 to the network adapter 210 in step S1312.
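The two flows of FIG. 13 can be sketched together as below. The sketch assumes that the packet holding area is counted in packets rather than bytes, that a packet is a (priority, payload) pair, and that an incoming packet tying with the lowest held priority is the one discarded. The class name `PacketController` and the `dropped` list (kept only so the behavior can be inspected) are hypothetical.

```python
class PacketController:
    """Bounded packet store: drops the lowest priority, sends the highest."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least one packet")
        self.capacity = capacity
        self.held = []     # list of (priority, payload)
        self.dropped = []  # discarded packets, recorded for inspection

    def on_packet(self, priority, payload):
        """FIG. 13(A): store the packet, discarding low priorities if full."""
        while True:
            if len(self.held) < self.capacity:        # S1302
                self.held.append((priority, payload))  # S1303
                return
            lowest = min(range(len(self.held)), key=lambda i: self.held[i][0])
            if priority <= self.held[lowest][0]:      # S1304
                self.dropped.append((priority, payload))  # S1305
                return
            self.dropped.append(self.held.pop(lowest))    # S1306

    def next_to_send(self):
        """FIG. 13(B), S1312: take the highest-priority held packet, if any."""
        if not self.held:
            return None
        best = max(range(len(self.held)), key=lambda i: self.held[i][0])
        return self.held.pop(best)
```

In use, `on_packet` would be called for packets arriving both from the local packet generation unit and from upstream, and `next_to_send` whenever the network adapter is ready to accept output.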

In the present embodiment, an example has been shown in which the function of controlling packet discard, realized by the packet control unit 824 and the packet holding area 825, is provided as part of the image/audio transmission processing unit 224. However, not every image/audio transmission processing unit 224 needs to have the packet discard control function; only some of them may have it. The packet discard control function may also be implemented as a device independent of the image/audio transmission processing unit 224 and installed on the network on its own. In any of the forms described here, the functions of the packet control unit 824 and the packet holding area 825 are the same.

In the present embodiment, the separation of the subject image into resolution components is realized by the DCT instead of the wavelet transform. First, the subject image is divided into 8×8 blocks, and the DCT is applied block by block. The resulting DCT coefficients are also obtained as an 8×8 block, in which coefficients closer to the upper left represent lower-frequency information. For example, if the upper-left 4×4 block is taken out and inverse-transformed, an image with half the resolution in each of the vertical and horizontal directions is obtained. FIG. 14 shows the resolution components of the DCT coefficients when the DCT is applied in 8×8 block units. Coefficients bearing the same number form one resolution component. When the block size is K×K and the number identifying a resolution component is i (i<K), the priority P of the resolution component is calculated as follows.

This is an index expressing the ratio of the number of pixels at the resolution represented by the i-th resolution component to the number of pixels at the original resolution. In the present embodiment, the JPEG algorithm is adapted to obtain the resolution components described above. The JPEG image coding algorithm consists broadly of three steps: block division, DCT, and coding of the transform coefficients. Here, the coding of the transform coefficients is modified. In JPEG, the transform coefficients are quantized in zigzag scan order and Huffman coded. In the present embodiment, the transform coefficients are separated into resolution components as shown in FIG. 14, and are quantized and coded in units of resolution components. The coded data obtained in this way is treated as a resolution component. However, the method of obtaining resolution components is not limited to this: the blocks may be of any size, and another entropy coding algorithm such as arithmetic coding may be used instead of Huffman coding for the transform coefficients. Also, although the DCT has been shown in the present embodiment as the method of separating resolution components, any transform that can similarly separate an image into frequency components may be used; for example, the Hadamard transform may be used.
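The grouping of DCT coefficients into resolution components might look like the sketch below. FIG. 14 is not reproduced in this text, so the numbering used here is an assumption: the coefficient at (row, col) is assigned to component max(row, col), which gives L-shaped bands and is consistent with the statement that the upper-left 4×4 block (components 0 to 3) yields a half-resolution image. The function names `component_map` and `split_block` are hypothetical, and the DCT itself, quantization, and entropy coding are omitted.

```python
def component_map(K=8):
    """Assumed numbering: coefficient (r, c) belongs to component max(r, c)."""
    return [[max(r, c) for c in range(K)] for r in range(K)]

def split_block(coeffs):
    """Group the coefficients of one K x K block by resolution component.

    Returns a list of K lists; element i holds the 2*i + 1 coefficients
    that raise the reconstructable block size from i x i to (i+1) x (i+1).
    """
    K = len(coeffs)
    if any(len(row) != K for row in coeffs):
        raise ValueError("block must be square")
    comps = [[] for _ in range(K)]
    for r in range(K):
        for c in range(K):
            comps[max(r, c)].append(coeffs[r][c])
    return comps
```

Under this numbering, component 0 is the DC coefficient alone, and discarding the highest-numbered components first removes the finest detail while leaving a lower-resolution block intact.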

According to the second embodiment, the same effects as in the first embodiment are obtained, and in addition, controlling the code amount across the entire network absorbs the imbalance in the code amount required by the individual cameras and improves the quality of the image information obtained by the system as a whole.

(Other embodiments)
The hardware configuration of each device constituting the present embodiment will be described. In the embodiments described above, an example was given in which the camera adapter 115 is implemented with hardware such as an FPGA and/or an ASIC, and this hardware executes each of the processes described above. The same applies to the various devices in the sensor system 110, the front-end server 121, the database 122, the back-end server 123, and the controller 130. However, at least one of these devices may execute the processing of the present embodiment by software processing using, for example, a CPU, a GPU, or a DSP.

図１５は、図２に示した機能構成をソフトウェア処理によって実現するための、カメラアダプタ１１５のハードウェア構成を示すブロック図である。なお、フロントエンドサーバ１２１、データベース１２２、バックエンドサーバ１２３、制御ステーション１３１、仮想カメラ操作ＵＩ１３２、及びエンドユーザ端末１５０等の装置も、図１５のハードウェア構成となりうる。カメラアダプタ１１５は、ＣＰＵ１５０１、ＲＯＭ１５０２、ＲＡＭ１５０３、補助記憶装置１５０４、表示部１５０５、操作部１５０６、通信部１５０７、及びバス１５０８を有する。 FIG. 15 is a block diagram showing a hardware configuration of the camera adapter 115 for realizing the functional configuration shown in FIG. 2 by software processing. Devices such as the front-end server 121, the database 122, the back-end server 123, the control station 131, the virtual camera operation UI 132, and the end user terminal 150 can also have the hardware configuration of FIG. The camera adapter 115 has a CPU 1501, a ROM 1502, a RAM 1503, an auxiliary storage device 1504, a display unit 1505, an operation unit 1506, a communication unit 1507, and a bus 1508.

ＣＰＵ１５０１は、ＲＯＭ１５０２やＲＡＭ１５０３に格納されているコンピュータプログラムやデータを用いてカメラアダプタ１１５の全体を制御する。ＲＯＭ１５０２は、変更を必要としないプログラムやパラメータを格納する。ＲＡＭ１５０３は、補助記憶装置１５０４から供給されるプログラムやデータ、及び通信部１５０７を介して外部から供給されるデータ等を一時記憶する。補助記憶装置１５０４は、例えばハードディスクドライブ等で構成され、静止画や動画等のコンテンツデータを記憶する。 The CPU 1501 controls the entire camera adapter 115 using computer programs and data stored in the ROM 1502 and the RAM 1503. The ROM 1502 stores programs and parameters that do not need to be changed. The RAM 1503 temporarily stores programs and data supplied from the auxiliary storage device 1504, data externally supplied via the communication unit 1507, and the like. The auxiliary storage device 1504 is composed of, for example, a hard disk drive or the like, and stores content data such as still images and moving images.

表示部１５０５は、例えば液晶ディスプレイ等で構成され、ユーザがカメラアダプタ１１５を操作するためのＧＵＩ（ＧｒａｐｈｉｃａｌＵｓｅｒＩｎｔｅｒｆａｃｅ）等を表示する。操作部１５０６は、例えばキーボードやマウス等で構成され、ユーザによる操作を受けて各種の指示をＣＰＵ１５０１に入力する。通信部１５０７は、カメラ１１２やフロントエンドサーバ１２１等の外部の装置と通信を行う。例えば、カメラアダプタ１１５が外部の装置と有線で接続される場合には、ＬＡＮケーブル等が通信部１５０７に接続される。なお、カメラアダプタ１１５が外部の装置と無線通信する機能を有する場合、通信部１５０７はアンテナを備える。バス１５０８は、カメラアダプタ１１５の各部を繋いで情報を伝達する。 The display unit 1505 is composed of, for example, a liquid crystal display or the like, and displays a GUI (Graphical User Interface) for a user to operate the camera adapter 115. The operation unit 1506 is composed of, for example, a keyboard, a mouse, etc., and inputs various instructions to the CPU 1501 in response to an operation by the user. The communication unit 1507 communicates with external devices such as the camera 112 and the front end server 121. For example, when the camera adapter 115 is connected to an external device by wire, a LAN cable or the like is connected to the communication unit 1507. Note that when the camera adapter 115 has a function of wirelessly communicating with an external device, the communication unit 1507 includes an antenna. The bus 1508 connects the respective units of the camera adapter 115 and transmits information.

Note that, for example, part of the processing of the camera adapter 115 may be performed by an FPGA and another part may be realized by software processing using a CPU. Also, although the display unit 1505 and the operation unit 1506 are inside the camera adapter 115 in the present embodiment, the camera adapter 115 need not include at least one of the display unit 1505 and the operation unit 1506. Alternatively, at least one of the display unit 1505 and the operation unit 1506 may exist as a separate device outside the camera adapter 115, with the CPU 1501 operating as a display control unit that controls the display unit 1505 and as an operation control unit that controls the operation unit 1506.

また、前述の実施形態は、画像処理システム１００が競技場やコンサートホール等の施設に設置される場合の例を中心に説明した。施設の他の例としては、例えば、遊園地、公園、競馬場、競輪場、カジノ、プール、スケートリンク、スキー場、ライブハウス等がある。また、各種施設で行われるイベントは、屋内で行われるものであっても屋外で行われるものであってもよい。また、本実施形態における施設は、一時的に（期間限定で）建設される施設も含む。 Further, the above-described embodiment has been described focusing on an example in which the image processing system 100 is installed in a facility such as a stadium or a concert hall. Other examples of facilities include, for example, amusement parks, parks, racetracks, bicycle racetracks, casinos, pools, skating rinks, ski areas, live houses, and the like. Further, the event held at various facilities may be held indoors or outdoors. Further, the facility in the present embodiment also includes a facility which is temporarily (for a limited time) constructed.

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

Each of the above-described embodiments is merely one concrete example of carrying out the present invention, and the technical scope of the present invention must not be interpreted restrictively on their basis. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

100: Image processing system
110: Sensor system
112: Camera
114: External sensor
115: Camera adapter
120: Image computing server
121: Front-end server
122: Database
123: Back-end server
130: Controller
131: Control station
132: Virtual camera operation UI
150: End user terminal
210: Network adapter
211: Data transmission/reception unit
220: Transmission unit
221: Data compression/decompression unit
224: Image/audio transmission processing unit
230: Image processing unit
231: Foreground/background separation unit
232: Priority generation unit
233: Calibration control unit
240: External device control unit
311: Foreground separation unit
321: Camera parameter reception unit
322: Resolution acquisition unit
323: Separation parameter calculation unit
324: Priority calculation unit
811: Image coding unit
812: Message generation unit
821: Message control unit
822: Message holding area
823: Packet generation unit
824: Packet control unit
825: Packet holding area

Claims

1. An image processing device comprising:
image acquisition means for obtaining, from an input image, a subject image relating to a subject;
resolution acquisition means for acquiring the resolution of the subject image;
processing means for separating the subject image into one or more resolution components; and
priority calculation means for calculating a priority for each separated resolution component,
wherein, when the resolution of the subject represented by the resolution component is a second resolution lower than a first resolution, the priority calculation means calculates a priority higher than the priority calculated for the first resolution.

2. The image processing device according to claim 1, wherein the priority calculation means calculates the priority for the resolution component based on the resolution of the subject image and a parameter related to the separation of the subject image in the processing means.

3. The image processing device according to claim 1, wherein the priority calculation means calculates a higher priority as the distance to the subject increases.

4. The image processing device according to claim 1, wherein the priority calculation means calculates a higher priority as the resolution of the subject image decreases, and
the resolution acquisition means sets a smaller value for the resolution of the subject image as the distance to the subject increases.

5. The image processing device according to any one of claims 1 to 3, further comprising parameter receiving means for acquiring a focal length and a sensor pitch for the subject image,
wherein the priority calculation means calculates a higher priority as the resolution of the subject image decreases, and
the resolution acquisition means sets a larger value for the resolution of the subject image as the focal length is larger or the sensor pitch is smaller.
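The relationships recited in claims 3 to 5 can be sketched as follows. This is a minimal illustration that assumes a pinhole-camera model, in which one sensor pixel covers `distance * sensor_pitch / focal_length` on the subject, so the resolution in pixels per unit length is the reciprocal of that footprint. The function names, the millimetre units, and the logarithmic priority mapping are all hypothetical; the claims only require that the priority rise as the resolution falls.

```python
import math


def subject_resolution(focal_length_mm: float,
                       sensor_pitch_mm: float,
                       distance_mm: float) -> float:
    """Pixels per millimetre on the subject under a pinhole-camera model.

    Grows with the focal length, shrinks with the sensor pitch and with
    the distance to the subject.
    """
    if min(focal_length_mm, sensor_pitch_mm, distance_mm) <= 0:
        raise ValueError("all inputs must be positive")
    return focal_length_mm / (sensor_pitch_mm * distance_mm)


def priority(resolution_px_per_mm: float,
             reference_px_per_mm: float = 1.0) -> float:
    """Hypothetical priority: strictly increasing as the resolution decreases.

    The reference resolution is an assumed normalisation constant, not a
    value taken from the source.
    """
    if resolution_px_per_mm <= 0 or reference_px_per_mm <= 0:
        raise ValueError("resolutions must be positive")
    return math.log2(reference_px_per_mm / resolution_px_per_mm)


if __name__ == "__main__":
    near = subject_resolution(50.0, 0.005, 10_000.0)  # subject 10 m away
    far = subject_resolution(50.0, 0.005, 40_000.0)   # subject 40 m away
    print(near, far, priority(near), priority(far))
```

With this mapping, a subject four times farther away has one quarter of the resolution and therefore receives the higher priority, which is the ordering the claims describe.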

6. The image processing device according to claim 1, further comprising parameter calculation means for setting a parameter related to the separation of the subject image in the processing means such that, as the resolution of the subject image increases, the subject image can be separated into components of lower resolution.

7. The image processing device according to claim 6, wherein the parameter calculation means determines the parameter based on a ratio between the resolution of the subject image and a required resolution for the subject in the image.
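One way to picture the ratio-based parameter of claims 6 and 7 is shown below. The sketch assumes a dyadic decomposition in which each separation level halves the resolution, and it takes the separation parameter to be the number of such levels. Both assumptions, along with the function names and the choice of rounding down so that the coarsest component still meets the required resolution, are illustrative and are not stated in the claims.

```python
import math
from typing import List


def separation_levels(subject_resolution: float,
                      required_resolution: float) -> int:
    """Number of halving steps derived from the resolution ratio.

    Assumed rule: floor(log2(ratio)), clamped at zero, so a subject image
    that is already at or below the required resolution is not separated.
    """
    if subject_resolution <= 0 or required_resolution <= 0:
        raise ValueError("resolutions must be positive")
    ratio = subject_resolution / required_resolution
    return max(0, math.floor(math.log2(ratio)))


def component_resolutions(subject_resolution: float, levels: int) -> List[float]:
    """Resolution represented by each component, coarsest first."""
    return [subject_resolution / 2 ** k for k in range(levels, -1, -1)]


if __name__ == "__main__":
    levels = separation_levels(8.0, 1.0)
    print(levels, component_resolutions(8.0, levels))
```

Under these assumptions a higher-resolution subject image yields more levels, so it can be split down to proportionally lower resolutions, while a distant, low-resolution subject is left with few or no levels to drop.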

8. The image processing device according to claim 1, further comprising encoding means for encoding the resolution component.

9. The image processing device according to claim 1, further comprising:
message generation means for generating a message by adding, to the resolution component, the priority corresponding to that resolution component;
packet generation means for storing a plurality of the generated messages and generating a packet; and
message control means for outputting the messages generated by the message generation means to the packet generation means in accordance with the added priority.

10. The image processing device according to claim 9, wherein the packet generation means stores, in the packet, the highest priority among the messages stored in the packet as the priority of the packet,
the image processing device further comprising packet control means for discarding the packet having the lowest priority when an input packet cannot be held in a holding area.
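The message and packet handling of claims 9 and 10 can be sketched as follows. This is a minimal illustration under several assumptions: priorities are integers where a larger value means a higher priority, messages are handed to the packet generator in descending priority order, and the holding area has a fixed packet capacity. The class names, `build_packets`, and the rule that the incoming packet itself may be the one discarded are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    priority: int   # larger value = higher priority (assumed convention)
    payload: bytes  # encoded resolution component


@dataclass
class Packet:
    messages: List[Message]
    priority: int = field(init=False)

    def __post_init__(self) -> None:
        # The packet carries the highest priority among its messages.
        self.priority = max(m.priority for m in self.messages)


def build_packets(messages: List[Message], per_packet: int) -> List[Packet]:
    """Output messages in priority order and group them into packets."""
    ordered = sorted(messages, key=lambda m: m.priority, reverse=True)
    return [Packet(ordered[i:i + per_packet])
            for i in range(0, len(ordered), per_packet)]


class PacketHoldingArea:
    """Fixed-capacity buffer that sheds the lowest-priority packet when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.packets: List[Packet] = []

    def push(self, packet: Packet) -> Optional[Packet]:
        """Hold the packet; return the discarded packet, or None if none."""
        self.packets.append(packet)
        if len(self.packets) <= self.capacity:
            return None
        # Ties resolve to the packet that has been held longest.
        victim = min(range(len(self.packets)),
                     key=lambda i: self.packets[i].priority)
        return self.packets.pop(victim)


if __name__ == "__main__":
    msgs = [Message(p, b"") for p in (1, 5, 3, 2)]
    area = PacketHoldingArea(capacity=1)
    for pkt in build_packets(msgs, per_packet=2):
        dropped = area.push(pkt)
        print(pkt.priority, None if dropped is None else dropped.priority)
```

Grouping messages of similar priority into the same packet keeps the packet-level priority representative of its contents, so discarding a low-priority packet mostly removes components that matter least to image quality.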

11. The image processing device according to claim 1, wherein the resolution acquisition means sets the number of pixels per unit area of the subject as the resolution of the subject image.

12. The image processing device according to claim 1, wherein the resolution acquisition means sets the number of pixels per unit length of the subject as the resolution of the subject image.

13. An image processing system comprising:
a plurality of image processing devices according to any one of claims 1 to 12; and
an image generation device that generates a virtual viewpoint image based on image data acquired from the plurality of image processing devices.

14. An image processing method comprising:
an image acquisition step of obtaining, from an input image, a subject image relating to a subject;
a resolution acquisition step of acquiring the resolution of the subject image;
a processing step of separating the subject image into one or more resolution components; and
a priority calculation step of calculating a priority for each separated resolution component,
wherein, in the priority calculation step, when the resolution of the subject represented by the resolution component is a second resolution lower than a first resolution, a priority higher than the priority calculated for the first resolution is calculated.

15. A program that causes a computer to execute:
an image acquisition step of obtaining, from an input image, a subject image relating to a subject;
a resolution acquisition step of acquiring the resolution of the subject image;
a processing step of separating the subject image into one or more resolution components; and
a priority calculation step of calculating a priority for each separated resolution component,
wherein, in the priority calculation step, when the resolution of the subject represented by the resolution component is a second resolution lower than a first resolution, a priority higher than the priority calculated for the first resolution is calculated.