JP2020004053A

JP2020004053A - Generation apparatus, generation method, and program

Info

Publication number: JP2020004053A
Application number: JP2018122424A
Authority: JP
Inventors: 知頼岩尾; Tomoyori IWAO
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-06-27
Filing date: 2018-06-27
Publication date: 2020-01-09
Anticipated expiration: 2038-06-27
Also published as: JP7278720B2

Abstract

PROBLEM TO BE SOLVED: To acquire high-precision three-dimensional shape data regardless of the pattern on a subject surface.

SOLUTION: A generation apparatus includes: first acquisition means for acquiring multiple captured images obtained by imaging a subject surface from multiple directions; second acquisition means for acquiring information indicating the position and shape of a pattern on the subject surface; determination means for determining three-dimensional position information for each area of the subject surface, on the basis of the captured images acquired by the first acquisition means and the information indicating the position and shape of the pattern acquired by the second acquisition means; and generation means for generating three-dimensional shape data corresponding to the subject surface on the basis of the three-dimensional position information of each area determined by the determination means.

SELECTED DRAWING: Figure 5

Description

The present invention relates to a generation apparatus, a generation method, and a program for generating three-dimensional shape data.

For various applications such as terrain analysis and street viewing, there is demand for analyzing images captured by cameras to acquire three-dimensional position information (three-dimensional shape data) of terrain. In Patent Literature 1, three-dimensional shape data is acquired from aerial photographs of terrain taken from a plurality of directions, using a stereo matching method.

International Publication No. WO08/152740

However, because a stereo matching method such as that of Patent Literature 1 relies on pixel-by-pixel matching between images, it may fail to acquire accurate three-dimensional shape data depending on the pattern on the subject surface. For example, for a subject surface bearing a pattern of nearly uniform color that extends in one direction, such as a line drawn on a playing field, the matching accuracy does not improve, and the three-dimensional shape data of the subject surface cannot be acquired accurately.

An object of the present invention is to acquire high-precision three-dimensional shape data regardless of the pattern on a subject surface.

A generation apparatus according to the present invention includes: first acquisition means for acquiring a plurality of captured images obtained by imaging a subject surface from a plurality of directions; second acquisition means for acquiring information indicating the position and shape of a pattern on the subject surface; determination means for determining three-dimensional position information for each of a plurality of regions on the subject surface, based on the plurality of captured images acquired by the first acquisition means and the information indicating the position and shape of the pattern acquired by the second acquisition means; and generation means for generating three-dimensional shape data corresponding to the subject surface, based on the three-dimensional position information for each region determined by the determination means.

According to the present invention, high-precision three-dimensional shape data can be acquired regardless of the pattern on a subject surface.

FIG. 1 is a diagram showing an example of the configuration of a generation apparatus that generates three-dimensional shape data of a subject surface.
FIG. 2 is a diagram showing an example of the cameras constituting a camera group.
FIG. 3 is a diagram for explaining the concept of Embodiment 1.
FIG. 4 is a schematic diagram of the shape of a line drawn on a field.
FIG. 5 is a block diagram showing an example of the logical configuration of the generation apparatus according to Embodiment 1.
FIG. 6 is a flowchart showing the flow of processing of the generation apparatus according to Embodiment 1.
FIG. 7 is a schematic diagram showing how the appearance of a line in a projection image differs with the height of the projection plane.
FIG. 8 is a diagram for explaining camera reliability.
FIG. 9 is a schematic diagram showing composite images obtained by combining projection images for each projection plane height.
FIG. 10 is a block diagram showing an example of the logical configuration of a generation apparatus according to Embodiment 2.
FIG. 11 is a flowchart showing the flow of processing of the generation apparatus according to Embodiment 2.
FIG. 12 is a diagram showing a configuration example of an image processing system according to Embodiment 3.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

The following embodiments are described using the generation of three-dimensional shape data of a stadium field (ground) as an example, but the application of the present invention is not limited to this. The present invention can also be applied to acquiring three-dimensional information on general terrain and on the shapes of roads, walls, paintings, and murals.

In the present embodiment, three-dimensional shape data is data representing the three-dimensional shape of a substantially planar subject surface. It is expressed, for example, as a point cloud in which each point has x, y, and z position information in a three-dimensional world coordinate space that uniquely describes the imaging space to be captured. The three-dimensional shape data is not limited to a point cloud and may be expressed in other data formats, for example as polygon mesh data composed of faces of simple convex polygons such as triangles or quadrilaterals (called polygons), or as voxels.

An image in the present embodiment is image data, and need not be a visually recognizable image generated for display on a display device.

[Embodiment 1]
FIG. 1 is a diagram showing an example of the configuration of a generation apparatus 100 that generates three-dimensional shape data in the present embodiment. The generation apparatus 100 includes a CPU (Central Processing Unit) 101, a main memory 102, a storage unit 103, an input unit 104, a display unit 105, and an external I/F unit 106, which are connected to one another via a bus 107. The CPU 101 is an arithmetic processing unit that performs overall control of the generation apparatus 100, and carries out various processes by executing programs stored in the storage unit 103 and elsewhere. The main memory 102 temporarily stores data and parameters used in the various processes and provides a work area for the CPU 101. The storage unit 103 is a large-capacity storage device that stores programs and the data needed for GUI (Graphical User Interface) display; a non-volatile memory such as a hard disk or a silicon disk is used, for example.

The input unit 104 is a device such as a keyboard, mouse, electronic pen, or touch panel, and accepts operation input from a user. The display unit 105 consists of a liquid crystal panel or the like and displays, for example, analysis results in a GUI. The external I/F unit 106 is connected via a LAN 108 to each camera constituting a camera group 109, and transmits and receives video data and control signal data. The bus 107 connects the above units and transfers data.

The generation apparatus 100 is connected to the camera group 109 via the LAN 108. Based on control signals from the generation apparatus 100, the camera group 109 starts and stops capturing, changes camera settings (shutter speed, aperture, and so on), and transfers captured video data.

The generation apparatus 100 includes various components other than those described above, but they are not the focus of the present embodiment and their description is omitted.

FIG. 2 is a diagram showing an example arrangement of the cameras constituting the camera group 109. Here, ten cameras are installed in a stadium, but the number and positions of the cameras are not limited to this. A player 202 and a ball are present on a field 201 where a game is played, and ten cameras 203a to 203j are arranged around the field 201. For each of the cameras 203a to 203j constituting the camera group 109, an appropriate camera orientation, focal length, exposure control parameters, and so on are set so that the entire field 201, or a region of interest of the field 201, falls within the angle of view.

FIG. 3 is a diagram showing an overview of the present embodiment. First, as in STEP 1, the field is divided into a plurality of regions based on the two-dimensional positions and shapes of the lines drawn on the field 201. FIG. 4 shows the shape of a line drawn on the field. Here, a line refers to a pattern that extends in a certain direction and has a finite width in the direction perpendicular to it. The length and width of the lines are specified by the rules of the sport. In the present embodiment, the height of the field is acquired for each region based on a specific pattern such as a line, so the field 201 is virtually divided into a plurality of divided regions such that each contains a line. The division may be performed on captured images obtained by the cameras. FIG. 3(a) shows an example of division into six divided regions (S1 to S6). The divided regions may be set virtually on the captured images or on the projection images described later.

Information on the line that is the specific pattern (for example, its position on the field and shape information such as its length in the extending direction and its width in the direction perpendicular to the extending direction) is input to the generation apparatus 100 in advance via the input unit 104. Alternatively, the generation apparatus 100 may determine this information from the captured images or from the projection images output in the subsequent STEP 2. The present embodiment describes the case where the line information is input in advance and acquired by the generation apparatus 100.

Next, as in STEP 2, projection images on projection planes at different distances (heights) from the field, which is the subject surface, are generated based on the images captured by the cameras. FIG. 3(b) shows an example in which divided region S1 is captured by all of cameras 1 to 10. In this case, for region S1, the captured image acquired by camera 1 is projected onto a plurality of projection planes at different heights from the field to generate a plurality of projection images. For example, if projection images are generated on projection planes from -15 cm to +15 cm in 5 cm steps, taking the height of the center point of the field as 0 cm, seven projection images are generated from the captured image acquired by camera 1. Generating such projection images from all ten cameras 1 to 10 that capture divided region S1 yields a total of 70 (= 7 × 10) projection images for divided region S1 across the distances from the subject surface. Projection images are generated in the same way for the other divided regions. However, if a divided region is not captured by, say, camera 3, projection images are generated from the captured images of the cameras other than camera 3.
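
The counting in the example above can be checked with a short sketch. It assumes the 5 cm step and the -15 cm to +15 cm range given in the text; the function names and the (camera, height) key representation are hypothetical and not part of the publication.

```python
# Sketch: enumerate projection-plane heights and count the projection images
# of one divided region. All names here are illustrative assumptions.

def plane_heights_cm(lowest=-15, highest=15, step=5):
    """Heights of the projection planes, relative to the field centre (0 cm)."""
    return list(range(lowest, highest + 1, step))


def projection_image_keys(camera_ids, heights):
    """One (camera, height) key per projection image of a region."""
    return [(cam, h) for cam in camera_ids for h in heights]


heights = plane_heights_cm()
cameras_seeing_s1 = list(range(1, 11))  # cameras 1 to 10 all capture region S1
keys_s1 = projection_image_keys(cameras_seeing_s1, heights)

# A region that camera 3 does not capture simply omits that camera.
cameras_without_3 = [c for c in cameras_seeing_s1 if c != 3]
keys_other = projection_image_keys(cameras_without_3, heights)

print(heights, len(keys_s1), len(keys_other))
```

With these inputs there are seven heights, 70 projection images for S1, and 63 for a region that camera 3 does not see.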

The height of a projection plane may be measured from any point on the field. For example, with the center point of the field as a reference point, planes parallel to the plane containing that reference point may be set as projection planes. Specifically, the reference point is assigned the three-dimensional coordinates (0, 0, 0), the region of the subject surface near the reference point is taken as a reference plane that is the xy plane, and projection planes are set at varying distances in the z direction from that reference plane.

In STEP 2, changing the height from the field changes the position of the line within the projection image, as in the projection images of camera 1 in FIG. 3(b). This is because camera 1 captures divided region S1 from a direction perpendicular to the extending direction of the line. Because of the correspondence between pixel positions in the captured image and positions on the projection plane, when the height of the projection plane changes, the position of the line in each projection image shifts in the direction perpendicular to the extending direction.

In contrast, in the projection images of camera 10 in FIG. 3(b), the position of the line appears unchanged. This is because camera 10 captures divided region S1 from a direction parallel to the extending direction of the line. In this case, because of the correspondence between pixel positions in the captured image and positions on the projection plane, a change of projection plane shifts the line along its own extending direction, so its position appears unchanged.
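
The difference between camera 1 and camera 10 follows from simple ray geometry, which the following sketch illustrates. It assumes an idealized pinhole camera, a line lying along the x axis at true height 0, and made-up camera positions in metres; none of these values come from the publication.

```python
# Sketch: where does the viewing ray of a point on the line hit a projection
# plane at a given height? Camera positions are illustrative assumptions.

def intersect_plane(camera, ground_point, height):
    """Intersect the ray camera -> ground_point with the plane z = height."""
    cx, cy, cz = camera
    gx, gy, gz = ground_point
    t = (height - cz) / (gz - cz)  # ray parameter; assumes gz != cz
    return (cx + t * (gx - cx), cy + t * (gy - cy), height)


point_on_line = (0.0, 0.0, 0.0)   # the line runs along the x axis (y = 0)
camera_1 = (0.0, -50.0, 10.0)     # views the line from the side
camera_10 = (-50.0, 0.0, 10.0)    # views along the line

p1 = intersect_plane(camera_1, point_on_line, 0.15)
p10 = intersect_plane(camera_10, point_on_line, 0.15)

print(p1)   # shifted in y, that is, perpendicular to the line
print(p10)  # shifted in x, that is, along the line
```

For the side-on camera the projected point moves about 0.75 m away from the line when the plane is raised by 15 cm, so the line visibly shifts. For the camera looking along the line the same shift is directed along the line, which leaves the line's apparent position unchanged.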

The order of STEP 1 and STEP 2 may be reversed. Specifically, projection images of the cameras 203a to 203j may be generated for each height from the field, and the projection images for each height may then be divided into regions.

Next, as in STEP 3, the projection images are combined for each set region and for each height from the field to generate composite images. As shown in FIGS. 3(b) and 3(c), in region S1 at a height of, for example, +15 cm, the ten projection images 301Aa to 301Aj generated from the captured images of cameras 1 to 10 are combined to generate a composite image 302A. The other composite images (for example, composite images 302D and 302G) are generated in the same way. Here, the "A" in 301Aa denotes the height: projection images with the same "A" have the same height from the field. The "a" denotes the camera: projection images with the same "a" are based on captured images acquired by the same camera.

Next, as in STEP 4, the composite image in which the line is sharpest is determined from among the composite images 302A to 302G for the respective heights from the field in divided region S1. The height from the field corresponding to the determined composite image is then determined to be the height from the field in divided region S1. Performing this STEP 4 processing for each set region determines the height from the field in each divided region. The height in each divided region may also be determined based on an evaluation value other than image sharpness.
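
A minimal sketch of the STEP 4 selection follows. The text does not define the sharpness measure at this point, so the sum of squared differences between neighbouring pixels is used purely as an assumed stand-in; the function names and the tiny one-row images are hypothetical.

```python
# Sketch: choose the projection-plane height whose composite image is sharpest.
# The sharpness measure (gradient energy) is an assumption, not the
# publication's definition.

def sharpness(image):
    """Sum of squared differences between horizontally and vertically adjacent pixels."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += (image[r][c + 1] - image[r][c]) ** 2
            if r + 1 < rows:
                total += (image[r + 1][c] - image[r][c]) ** 2
    return total


def select_height(composites):
    """composites maps height (cm) -> composite image; return the sharpest height."""
    return max(composites, key=lambda h: sharpness(composites[h]))


# Toy cross-sections of a line: misaligned projections blur the composite.
composites_s1 = {
    -5: [[0.25, 0.25, 0.5, 0.5, 0.25, 0.25]],
    0: [[0.0, 0.0, 1.0, 1.0, 0.0, 0.0]],
    5: [[0.0, 0.5, 0.5, 0.5, 0.5, 0.0]],
}
best_height_s1 = select_height(composites_s1)
print(best_height_s1)
```

In this toy data the composite at 0 cm has the crispest edges, so 0 cm is selected as the height of the region.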

Finally, as in STEP 5, three-dimensional shape data of the field is generated based on the determined height from the field in each divided region. The processing performed by the generation apparatus 100 is described in detail below.

FIG. 5 is a block diagram showing the functional configuration of the generation apparatus 100. In the present embodiment, the generation apparatus 100 generates three-dimensional shape data of a stadium field. The generation apparatus 100 includes an image acquisition unit 501, a camera parameter acquisition unit 502, a projection unit 503, a region setting unit 504, a reliability calculation unit 505, a combining unit 506, a distance determination unit 507, a generation unit 508, and a specific pattern acquisition unit 509.

The image acquisition unit 501 acquires a plurality of captured images captured by the camera group 109. The camera group 109 consists of the ten cameras 203a to 203j shown in FIG. 2, and the image acquisition unit 501 acquires a captured image from each of them. The image acquisition unit 501 outputs the captured images to the camera parameter acquisition unit 502 and the projection unit 503.

The camera parameter acquisition unit 502 performs camera calibration using the captured images output from the image acquisition unit 501 and acquires camera parameters, which include the extrinsic parameters, intrinsic parameters, and distortion parameters of each camera. The extrinsic parameters represent the position and orientation of the camera, such as a rotation matrix and a position vector. The intrinsic parameters are specific to the camera, such as the focal length and the image center. The camera parameter acquisition unit 502 outputs the camera parameters to the projection unit 503 and the reliability calculation unit 505.
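
To show how such parameters relate a point in space to a pixel, here is a minimal pinhole-projection sketch. The convention used (camera coordinates equal to the rotation applied to the world point minus the camera position), the omission of lens distortion, and all names and numeric values are assumptions for illustration; the publication does not specify them.

```python
# Sketch: project a world point to pixel coordinates with extrinsic parameters
# (rotation matrix, position vector) and intrinsic parameters (focal lengths,
# image centre). Convention and values are assumed; distortion is ignored.

def project_point(world_point, rotation, camera_position, fx, fy, cx, cy):
    """Return the (u, v) pixel position of world_point for a pinhole camera."""
    d = [world_point[i] - camera_position[i] for i in range(3)]
    xc, yc, zc = (sum(rotation[r][i] * d[i] for i in range(3)) for r in range(3))
    if zc <= 0:
        raise ValueError("point is not in front of the camera")
    return (fx * xc / zc + cx, fy * yc / zc + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point(
    world_point=(1.0, 2.0, 0.0),
    rotation=identity,
    camera_position=(0.0, 0.0, -10.0),
    fx=1000.0, fy=1000.0, cx=960.0, cy=540.0,
)
print(pixel)
```

Inverting this mapping for a chosen plane height is what lets each pixel of a captured image be placed on a projection plane.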

The projection unit 503 generates projection images based on the plurality of captured images output from the image acquisition unit 501, the camera parameters output from the camera parameter acquisition unit 502, and information indicating the set regions output from the region setting unit 504 described later. Each captured image is projected onto projection planes at different distances from the subject surface, and a projection image is generated for each set region and each distance from the subject surface. The projection unit 503 outputs the projection images for each set region and each distance from the subject surface to the combining unit 506.

The specific pattern acquisition unit 509 acquires information on the specific pattern from outside. This information includes the position of the specific pattern on the subject surface, information indicating the shape of the specific pattern, and information indicating the color difference between the specific pattern and other regions of the subject surface. When the specific pattern is a line drawn on a field, the information indicating its shape consists of the extending direction and length of the line and its width in the direction perpendicular to the extending direction. The specific pattern acquisition unit 509 outputs the information on the specific pattern to the region setting unit 504 and the distance determination unit 507.

Based on the information on the specific pattern output from the specific pattern acquisition unit 509, the region setting unit 504 virtually sets, on the subject surface, a plurality of regions used to determine the three-dimensional position information of the subject surface. Specifically, the region setting unit 504 sets the regions virtually such that each of the plurality of set regions contains at least a part of the specific pattern. Although the regions are set on the projection images in this example, they may instead be set on the captured images. The region setting unit 504 outputs, to the projection unit 503 and the reliability calculation unit 505, information indicating the plurality of set regions used to determine the distance from the reference point of the subject surface.

The reliability calculation unit 505 calculates a camera reliability for each region set by the region setting unit 504 and for each camera (each projection image). The camera reliability is calculated based on the camera parameters output from the camera parameter acquisition unit 502, the information indicating the plurality of set regions output from the region setting unit 504, and the information indicating the position and shape of the specific pattern output from the specific pattern acquisition unit 509. The camera reliability is used when the combining unit 506, described later, combines the projection images. The reliability calculation unit 505 outputs the camera reliability to the combining unit 506.

Based on the camera reliability output from the reliability calculation unit 505, the combining unit 506 combines, for each set region, the projection images on projection planes at the same distance from the subject surface to generate a composite image. The camera reliability is used as the weight of each projection image when combining by weighted averaging. Weighted averaging here means calculating each pixel value as the weighted average of the values of the corresponding pixels in the plurality of projection images. The combining unit 506 outputs the composite images for each set region and each distance from the subject surface to the distance determination unit 507.
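
The weighted averaging described above can be sketched as follows. Images are represented as nested lists of pixel values, and the weights stand in for the camera reliabilities; the function name and the example values are hypothetical.

```python
# Sketch: combine projection images of one region and one plane height by
# per-pixel weighted averaging. The weights play the role of camera
# reliabilities; their values here are made up for illustration.

def weighted_average(images, weights):
    """Per-pixel weighted mean of equally sized images (lists of rows)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [
            sum(w * img[r][c] for img, w in zip(images, weights)) / total
            for c in range(cols)
        ]
        for r in range(rows)
    ]


projection_a = [[0, 10]]   # from a camera with high reliability
projection_b = [[10, 30]]  # from a camera with low reliability
composite = weighted_average([projection_a, projection_b], weights=[3, 1])
print(composite)
```

Dividing by the sum of the weights keeps the composite in the same value range as the inputs, so composites built from different numbers of cameras remain comparable.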

The distance determination unit 507 determines the distance from the subject surface based on the composite images for each set region and each distance from the subject surface output from the combining unit 506, and on the information indicating the position and shape of the specific pattern output from the specific pattern acquisition unit 509. Specifically, the distance determination unit 507 calculates an evaluation value of the specific pattern in the composite image for each set region and each distance from the subject surface, and determines the distance from the subject surface based on that evaluation value. For example, the distance determination unit 507 can use sharpness as the evaluation value. In this case, among the composite images of a given set region for the respective distances from the subject surface, the distance corresponding to the composite image in which the specific pattern is sharpest is determined to be the distance from the subject surface in that set region. The distance determination unit 507 outputs, to the generation unit 508, the distance from the reference point of the subject surface determined for each set region. The distance determination unit 507 thereby determines the three-dimensional position information of each set region: the xy coordinates of a set region are the coordinates given by the region setting unit 504, and the distance from the reference point of the subject surface determined by the distance determination unit 507 corresponds to the z coordinate.

The generation unit 508 generates three-dimensional shape data of the subject surface based on the distance from the reference point of the subject surface determined for each set region and output from the distance determination unit 507, that is, based on the three-dimensional position information determined for each set region. The generation unit 508 outputs the three-dimensional shape data of the subject surface.
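
One simple way to turn per-region heights into a point cloud is sketched below. It assumes each set region is an axis-aligned rectangle in the xy plane that is sampled on a regular grid, with every sample taking the z value determined for its region. The rectangle bounds, sampling step, and names are illustrative assumptions; the publication does not prescribe this construction.

```python
# Sketch: build a point cloud from per-region heights. Regions are assumed to
# be axis-aligned rectangles (x_min, x_max, y_min, y_max) in metres.

def region_points(x_min, x_max, y_min, y_max, z, step):
    """Grid samples of one rectangular region, all at height z."""
    nx = int(round((x_max - x_min) / step)) + 1
    ny = int(round((y_max - y_min) / step)) + 1
    return [
        (x_min + i * step, y_min + j * step, z)
        for j in range(ny)
        for i in range(nx)
    ]


def field_point_cloud(regions, heights, step):
    """Concatenate the samples of all regions into one list of (x, y, z) points."""
    cloud = []
    for name, bounds in regions.items():
        cloud.extend(region_points(*bounds, heights[name], step))
    return cloud


regions = {"S1": (0.0, 2.0, 0.0, 1.0), "S2": (3.0, 5.0, 0.0, 1.0)}
heights = {"S1": 0.05, "S2": -0.05}  # z per region, e.g. from the sharpness search
cloud = field_point_cloud(regions, heights, step=1.0)
print(len(cloud), cloud[0])
```

A piecewise-constant surface like this is the most direct reading of "one height per region"; interpolating between region heights or meshing the points would be refinements beyond what this sketch assumes.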

Next, the processing performed by the generation apparatus 100 is described in detail with reference to the flowchart shown in FIG. 6. This series of processes is realized by the CPU 101 reading a predetermined program from the storage unit 103, loading it into the main memory 102, and executing it.

In S601, camera parameters are acquired by calibration processing. First, the image acquisition unit 501 sends a capture instruction to the camera group 109 via the LAN 108. The captured images are acquired by the image acquisition unit 501. As shown in FIG. 2, the camera group 109 consists of a plurality of cameras 203a to 203j with different capturing directions. The camera parameter acquisition unit 502 calculates the parameters of each camera of the camera group 109 from the images acquired by the image acquisition unit 501. The camera parameters are calculated by camera calibration processing that takes as input a plurality of images captured from different camera positions. An example of a simple camera calibration procedure follows.

First, a planar pattern such as a square grid is photographed from multiple viewpoints. Second, feature points are detected in the captured images and their coordinates are obtained in the image coordinate system. For a square grid, the feature points are the intersections of the straight lines. Third, initial values of the camera's internal parameters are calculated from the feature point coordinates. The internal parameters are parameters such as the focal length and the optical center, called the principal point. The initial values of the internal parameters do not have to be calculated from feature points in the images; the design values of the camera may be used instead. Fourth, the internal parameters, external parameters, and distortion coefficients of the camera are calculated by a non-linear optimization process called bundle adjustment. The external parameters are parameters representing the position of the camera, its line-of-sight direction, and the rotation angle about the line-of-sight direction. The distortion coefficients represent the radial distortion of the image caused by differences in the refractive index of the lens and the circumferential (tangential) distortion caused by the lens and the image plane not being parallel. Many other camera calibration methods exist, but the details are omitted because calibration is not the main focus of the present embodiment.

Ｓ６０２において、画像取得部５０１がカメラ群１０９に対してフィールドを撮影するように撮影指示を送る。カメラ群１０９を構成する撮影方向が異なる複数のカメラ２０３ａ〜２０３ｊが被写体面を撮影して取得した撮影画像を画像取得部５０１が受け取る。 In step S602, the image acquisition unit 501 sends a shooting instruction to the camera group 109 to shoot a field. The image acquisition unit 501 receives images captured by the plurality of cameras 203a to 203j that form the camera group 109 and have different shooting directions by shooting the subject surface.

Ｓ６０３において、特定パターン取得部５０９は、特定パターンに関する情報を取得する。具体的には、特定パターン取得部５０９は、特定パターンであるラインの形状やそのラインの被写体面上の位置を含む情報を、Ｓ６０２で取得した撮影画像に基づき、抽出する。 In S603, the specific pattern acquisition unit 509 acquires information on the specific pattern. Specifically, the specific pattern acquisition unit 509 extracts information including the shape of the line as the specific pattern and the position of the line on the subject surface based on the captured image acquired in S602.

Ｓ６０４において、Ｓ６０３で取得された特定パターンに関する情報に基づいて、領域設定部５０４が被写体面について、距離を決定するための複数の所定の領域を設定する。具体的には、領域設定部５０４は、各設定領域に特定パターンの少なくとも一部が含まれるように領域を設定する。また、領域設定部５０４は、特定パターンである１つのラインを均等に分割するように領域を分割してもよい。分割する場合は、投影面の高さが変化した際に、ラインが一つの領域に含まれるように領域幅を決定するようにするのが好ましい。複数の設定領域により被写体面がすべて覆われるように設定領域が設定されればよい。つまり、より密に高さを算出するために、設定領域が重なるように設定してもよい。また、設定領域は、被写体面を重複なく分割するように設定されてもよい。設定された領域は、互いに同じ大きさや同じ形状でなくてもよく、特定パターンであるラインを含むのであれば領域の大きさや形状が異なっていてもよい。 In step S604, based on the information on the specific pattern acquired in step S603, the region setting unit 504 sets a plurality of predetermined regions for determining a distance on the subject surface. Specifically, the area setting unit 504 sets the area so that each set area includes at least a part of the specific pattern. Further, the area setting unit 504 may divide the area so as to equally divide one line as the specific pattern. In the case of division, when the height of the projection plane changes, it is preferable to determine the area width so that the line is included in one area. The setting area may be set so that the object surface is entirely covered by the plurality of setting areas. That is, in order to calculate the height more densely, the setting areas may be set so as to overlap. Further, the setting area may be set so as to divide the object plane without overlapping. The set areas do not have to be the same size or the same shape as each other, and may have different sizes and shapes as long as they include a line that is a specific pattern.

In S605, based on the camera parameters acquired in S601, the projection unit 503 generates projection images for each of the set areas set in S604 by projecting the captured images onto a plurality of projection planes at different distances from the subject surface. The projection unit 503 may first project the captured images onto the different projection planes and then extract, from each resulting projection image, the part corresponding to each set area. Alternatively, the projection unit 503 may first extract the image of each set area from the captured images and then project it onto the projection planes to generate the projection image corresponding to that set area.

When camera calibration is performed, the field 201 in FIG. 2 is used as the reference plane, whose height is approximately 0 m. The long-axis direction of the field is set as the x-axis, the short-axis direction as the y-axis, and the vertical direction as the z-axis, with the origin at the center of the field. Each projection plane is a plane parallel to the field, which is the subject surface. The horizontal range to be projected onto the projection plane is determined from the information indicating the position and shape of the lines so that it covers the entire field. For example, based on the line shape in FIG. 4, projection is performed over a range of 80 m by 120 m. The range may also be enlarged by a margin of a few percent to allow for differences between the actual field and the nominal line shape.

So that the height of the entire field can be calculated, projection is performed onto a plurality of projection planes of different heights. The range of heights is determined in accordance with the standard governing the field gradient of the stadium. For example, suppose the field standard limits the gradient from the center of the field to its edge to 0.3%. If the distance from the origin to the edge of the field is 40 m, the allowable height variation is 0.003 × 40 m = 12 cm. The range of projection heights is therefore set to cover this, for example −15 cm to +15 cm. Within this range, the height step can be chosen freely. A finer step, that is, a larger number of projection planes, yields more accurate three-dimensional shape data.
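The arithmetic in this paragraph can be sketched as follows. This is only an illustration: the function name, the 3 cm safety margin, and the 0.5 cm step are assumptions chosen to reproduce the −15 cm to +15 cm example, not values prescribed by the embodiment.

```python
def plane_heights_cm(max_gradient, half_length_m, margin_cm, step_cm):
    """Return candidate projection-plane heights in centimetres."""
    # Largest height deviation permitted by the gradient rule, in cm.
    max_dev_cm = max_gradient * half_length_m * 100.0
    # Sweep limit: permitted deviation plus an assumed safety margin.
    limit_cm = max_dev_cm + margin_cm
    n = int(round(limit_cm / step_cm))
    # Symmetric sweep around the reference plane (height 0).
    return [i * step_cm for i in range(-n, n + 1)]


# 0.3 % gradient over 40 m gives 12 cm; a 3 cm margin widens it to +/-15 cm.
heights = plane_heights_cm(max_gradient=0.003, half_length_m=40.0,
                           margin_cm=3.0, step_cm=0.5)
```

Halving `step_cm` roughly doubles the number of projection planes, which is the accuracy versus cost trade-off mentioned above.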

投影画像を生成する際に、まずカメラの内部パラメータと歪みパラメータに合わせて各カメラの撮影画像の歪み補正を行う。画像の歪み補正に用いるパラメータはＳ６０１で算出した内部パラメータ、及び歪曲パラメータである。 When generating a projection image, first, distortion correction of an image captured by each camera is performed in accordance with internal parameters and distortion parameters of the camera. The parameters used for the image distortion correction are the internal parameters calculated in S601 and the distortion parameters.

Next, a transformation matrix between the coordinates of the projection image and the coordinates of the captured image is calculated. Let V be the transformation matrix from the world coordinate system, in which the projection planes exist, to the camera coordinate system. In the camera coordinate system, starting from the origin of that coordinate system, the x-axis and y-axis are the horizontal and vertical directions of the image, respectively, and the z-axis is the line-of-sight direction of the camera. Further, let P be the transformation matrix from the camera coordinate system to the screen coordinate system. P projects a subject surface that has three-dimensional coordinates in the camera coordinate system onto a two-dimensional plane. That is, Expression (1) projects the homogeneous coordinates (x, y, z, w) of a point X on the projection image onto the homogeneous coordinates (x', y', z', w') of a point U on the captured image.
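The image of Expression (1) is not reproduced in this text. A form consistent with the definitions of V and P given here, offered as a reconstruction rather than as the published expression, is:

```latex
U = P\,V\,X, \qquad
X = (x,\ y,\ z,\ w)^{\top}, \quad
U = (x',\ y',\ z',\ w')^{\top}
\tag{1}
```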

ここで、並進変換を加えるため、座標ｗ及びｗ’を追加し、４次元座標とした。この式（１）を用いて、各カメラの撮像画像をそれぞれ、異なる高さｚの投影面に対して、投影して、投影画像を生成する。具体的には、撮像画像の各座標の画素値を、撮像画像の座標それぞれに対応する投影画像の座標の画素値とすることで投影画像が生成される。 Here, coordinates w and w 'were added to perform the translational transformation, and the coordinates were set to four-dimensional coordinates. Using this equation (1), the images captured by the cameras are respectively projected onto projection planes having different heights z to generate projection images. Specifically, a projected image is generated by setting the pixel value of each coordinate of the captured image to the pixel value of the coordinate of the projected image corresponding to each coordinate of the captured image.
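A minimal sketch of this projection step is shown below, assuming 4×4 matrices V and P applied as U = P V X and nearest-neighbour sampling. The helper names, the toy camera, and the 5×5 test image are hypothetical; lens distortion is assumed to have been corrected beforehand.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project_point(P, V, x, y, z):
    """Map a world point to pixel coordinates via U = P V X."""
    X = [x, y, z, 1.0]               # homogeneous world coordinates (w = 1)
    u = mat_vec(P, mat_vec(V, X))    # world -> camera -> screen
    return u[0] / u[3], u[1] / u[3]  # divide by w' to get pixel coordinates


def project_image(image, P, V, xs, ys, h):
    """Sample a captured image on the plane z = h (nearest neighbour)."""
    out = []
    for y in ys:
        row = []
        for x in xs:
            px, py = project_point(P, V, x, y, h)
            c, r = int(round(px)), int(round(py))
            inside = 0 <= r < len(image) and 0 <= c < len(image[0])
            row.append(image[r][c] if inside else None)  # None: not captured
        out.append(row)
    return out


# Hypothetical toy camera: 1 unit above the origin, looking straight down.
V = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 1], [0, 0, 0, 1]]
# Hypothetical intrinsics: focal length 1 px, principal point (2, 2).
P = [[1, 0, 2, 0], [0, 1, 2, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
image = [[10 * r + c for c in range(5)] for r in range(5)]
row = project_image(image, P, V, xs=[-1, 0, 1, 5], ys=[0], h=0.0)
```

Calling `project_image` once per candidate height h yields the stack of projection images for one camera. Grid points marked `None` fall outside the captured image, which is the information later used to decide whether a camera covers a set area.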

Ｓ６０６において、信頼度算出部５０５は、Ｓ６０４で設定された設定領域毎に、Ｓ６０１で取得されたカメラパラメータと、Ｓ６０３で取得された特定パターンから、カメラ信頼度を算出する。ここでカメラ信頼度とは、設定領域毎に被写体面の距離を決定する際に、各カメラから取得された撮影画像が、どの程度距離の決定に有用かを示す指標となる。簡単な例を図７に示す。 In S606, the reliability calculation unit 505 calculates the camera reliability from the camera parameters acquired in S601 and the specific pattern acquired in S603 for each of the setting areas set in S604. Here, the camera reliability is an index indicating how useful a captured image obtained from each camera is in determining the distance when determining the distance of the object plane for each set area. A simple example is shown in FIG.

FIG. 7 is a schematic diagram of a field being photographed by four cameras (701 to 704). Here, the height of a rectangular area 705 set at the center of the field is to be determined. When the height of the rectangular area 705 changes, the position of the line changes greatly among projection images generated from the images captured by the cameras 701 and 703, whose line-of-sight vectors (optical axes) are perpendicular to the extending direction of the line. In contrast, when projection images are generated at different projection-plane heights from the images captured by the cameras 702 and 704, whose line-of-sight vectors (optical axes) are parallel to the extending direction of the line, the position of the line in the projection images hardly changes.

本実施形態では、後述するように、設定領域毎に及びフィールド面からの高さ毎に、各カメラの投影画像を合成して合成画像を生成する。その合成画像において、ラインのずれやボケ度合いといった評価値を算出するため、合成画像にそのような特徴が表れやすい画像を選択的に用いる、もしくは重みづけをしてから合成するなどの処理を行うことが望ましい。その画像の選択や合成の際の重みの算出のために、カメラ信頼度を定義する。 In the present embodiment, as described later, a composite image is generated by synthesizing the projection images of the cameras for each set area and for each height from the field plane. In order to calculate an evaluation value such as a line shift or a degree of blur in the composite image, processing such as selectively using an image in which such characteristics are likely to appear in the composite image, or performing weighting and then compositing is performed. It is desirable. The camera reliability is defined for calculating the weight at the time of selecting and combining the images.

Accordingly, as shown in FIG. 8, when a camera 802 is installed facing a line 801 and the position of the camera relative to the center of the line is defined by a horizontal angle φ and an elevation angle θ, the camera reliability ω is given, for example, by Expression (2).

This expresses that a camera whose line-of-sight vector is closer to perpendicular to the direction of the line, and whose elevation angle θ is smaller, has a higher camera reliability. Thus, when the height of the rectangular area 705 is determined, the cameras 701 and 703 shown in FIG. 7 have high camera reliability. As is clear from Expression (2), the camera reliability of each camera differs from one set area to another, so it is calculated for each set area and each camera. The camera reliability can be regarded as indicating how useful the captured image or projection image of each camera is for determining the distance from the subject surface, that is, the three-dimensional position information of the subject surface. In other words, it is an index, defined per set area, of how useful each captured image or projection image is when the distance from the reference point of the subject surface is determined.

As Expression (2) shows, when the specific pattern is a line, the camera reliability increases as the angle (90° − φ) between the extending direction of the line and the direction obtained by projecting the camera's optical axis onto the plane containing the line increases. The camera reliability also increases as the elevation angle θ decreases.

The method of determining the camera reliability is not limited to the above. How clearly a camera captures the line depends on the physical distance d from the camera to the line, the focal length f, the number of pixels p, and so on, so the camera reliability may be increased in accordance with those parameters. The camera reliability ω may also be calculated by combining these criteria, for example by increasing the reliability of a camera that captures the line clearly and also has a small horizontal angle and a small elevation angle. The camera reliability ω for this combination is given by Expression (3), where α and β are weight parameters.

Threshold processing may also be applied, for example setting any camera reliability that is below a predetermined threshold to 0.
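Expressions (2) and (3) are not reproduced in this text. Purely as an illustration, the sketch below uses ω = cos φ · cos θ, a hypothetical form chosen only because it has the two properties stated above: ω grows as the optical axis approaches the perpendicular to the line (small φ) and as the elevation angle θ decreases. The thresholding helper follows the rule just described; both function names are assumptions.

```python
import math


def camera_reliability(phi_deg, theta_deg):
    """Assumed angle-based reliability: cos(phi) * cos(theta), clipped at 0."""
    phi = math.radians(phi_deg)
    theta = math.radians(theta_deg)
    return max(0.0, math.cos(phi)) * max(0.0, math.cos(theta))


def apply_threshold(omega, threshold):
    """Set reliabilities below the threshold to 0."""
    return omega if omega >= threshold else 0.0


# A camera facing the line squarely beats one looking along it.
across = camera_reliability(phi_deg=0.0, theta_deg=30.0)
along = camera_reliability(phi_deg=80.0, theta_deg=30.0)
```

Any other function with the same monotonic behaviour would serve equally well here. The published expressions may differ, and Expression (3) additionally weights in the image clarity terms (d, f, p) with α and β.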

In S607, based on the camera reliability calculated in S606, the combining unit 506 combines, for each set area, the projection images acquired in S605 that belong to the same projection plane. Because each camera covers a different area, it is first determined whether each camera has captured the set area. Specifically, if every pixel of a set area in its projection image has a pixel value projected from the captured image, the set area is regarded as having been captured by that camera. This determination may instead be made by the area setting unit 504 in S604 or by the projection unit 503 in S605.

次に、設定領域毎に、撮影できている複数のカメラのそれぞれの投影画像を用いて、投影面毎に投影画像を合成する。具体的には、Ｓ６０６で算出した、設定領域毎のカメラ信頼度ωに基づいた重み付きの平均化処理を行い、合成画像を生成する。すなわち、投影画像をｒｇｂ画像として、投影面の高さｈにおける各設定領域Ｂｊの合成画像は式（４）で表される。なお、カメラ番号をｋとする。 Next, the projection images are synthesized for each projection plane using the projection images of the plurality of cameras that can be photographed for each set area. Specifically, a weighted averaging process based on the camera reliability ω for each set area calculated in S606 is performed to generate a composite image. That is, with the projection image as an rgb image, a composite image of each setting area Bj at the height h of the projection plane is expressed by Expression (4). Note that the camera number is k.

As noted in the description of S606, the weighted average image does not have to be generated using every camera reliability. For example, the composite image may be generated by averaging while excluding the projection images of cameras whose reliability is at or below a predetermined threshold, or by using only the projection images of cameras whose reliability is at or above a predetermined threshold. Alternatively, the composite image may be generated without using the camera reliability at all, by taking the simple mean or the median of the pixel values of corresponding pixels in the projection images.
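The weighted combination can be sketched as below. Expression (4) is not reproduced in this text, so the normalization by the sum of the weights is an assumption, and a single grayscale channel is used instead of rgb to keep the example short. The function name is hypothetical.

```python
def composite(images, weights):
    """Weighted per-pixel average of same-sized grayscale images."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one camera needs a positive weight")
    h, w = len(images[0]), len(images[0][0])
    return [[sum(wt * img[r][c] for img, wt in zip(images, weights)) / total
             for c in range(w)]
            for r in range(h)]


# Two 1x2 projection images of the same set area and projection plane.
blended = composite([[[0, 10]], [[10, 30]]], weights=[1.0, 3.0])
```

Setting a camera's weight to 0 drops its projection image from the average, which is one way to realize the threshold variant described above.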

In S608, based on the composite images generated in S607 for each set area and each projection plane, the distance determination unit 507 determines the distance from the reference point of the subject surface for each set area, that is, the three-dimensional position information of each set area. To determine this distance, an evaluation value of the specific pattern is evaluated for each set area of the composite image. Sharpness is used as the evaluation value in the following description, but the evaluation value is not limited to sharpness.

ある設定領域における、被写体面の基準点からの距離（高さ）毎の合成画像の例を図９に示す。具体的には、図９では、被写体面に対して−１５ｃｍ〜＋１５ｃｍまで、０．５ｃｍ刻みで高さを変えて投影面を設定し、それぞれの高さで合成画像が生成された例を示している。図９に示すように、この合成画像のうち、適切な高さ（０ｃｍ）における合成画像では、ラインが鮮明に見える。これは、合成に使用された複数の投影画像それぞれに含まれるラインの位置がほぼ同じ位置で重なるからである。位置が重なるということは、実際のラインの高さがその投影面にあることを意味する。一方、それ以外の高さにおける合成画像では、合成に使用された複数の投影画像それぞれに含まれるラインの位置はずれるため、ラインがぼやけてしまっている。 FIG. 9 shows an example of a composite image for each distance (height) from the reference point on the subject plane in a certain setting area. Specifically, FIG. 9 shows an example in which the projection plane is set by changing the height in steps of 0.5 cm from −15 cm to +15 cm with respect to the subject plane, and a composite image is generated at each height. ing. As shown in FIG. 9, in the composite image at an appropriate height (0 cm) among the composite images, the lines look clear. This is because the positions of the lines included in each of the plurality of projection images used for the composition overlap at substantially the same position. Overlapping positions mean that the actual line height is at its projection plane. On the other hand, in the composite image at other heights, the positions of the lines included in each of the plurality of projection images used for the composition are shifted, so that the lines are blurred.

このため、ラインの鮮鋭度を評価することで、ラインの実際の高さを決定することができる。画像の鮮鋭度を評価するため、例えばラプラシアンフィルタＬ等のフィルタを使用する。高さｈにおける、ある設定領域Ｂｊの合成画像の鮮鋭度Ｓｊ，ｈは式（５）で表される。 Therefore, the actual height of the line can be determined by evaluating the sharpness of the line. In order to evaluate the sharpness of the image, for example, a filter such as a Laplacian filter L is used. The sharpness Sj, h of the composite image of a certain setting area Bj at the height h is represented by Expression (5).

The filter used to evaluate the sharpness of the image is not limited to the Laplacian filter; a first-derivative filter, a Prewitt filter, a Sobel filter, or the like may be used. Alternatively, the difference between a composite image to which a smoothing filter has been applied and the original, unfiltered composite image may be calculated, and the height of the composite image for which this difference is largest may be determined as the appropriate height of the set area.
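A sharpness score of this kind can be sketched as follows. Expression (5) is not reproduced in this text, so summing the absolute Laplacian response over the interior pixels of the set area is an assumption, as are the 4-neighbour kernel and the function name.

```python
# 4-neighbour Laplacian kernel (one common choice; an assumption here).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def sharpness(img):
    """Sum of absolute Laplacian responses over interior pixels."""
    h, w = len(img), len(img[0])
    total = 0.0
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            resp = sum(LAPLACIAN[i][j] * img[r + i - 1][c + j - 1]
                       for i in range(3) for j in range(3))
            total += abs(resp)
    return total


# Composite where all cameras agree: one crisp vertical line.
aligned = [[0, 0, 1, 0, 0] for _ in range(5)]
# Composite of two cameras whose lines landed one pixel apart on each side.
misaligned = [[0, 0.5, 0, 0.5, 0] for _ in range(5)]
```

In this toy case the aligned composite scores higher than the misaligned one, which is the behaviour the height search relies on.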

鮮鋭度を評価した後、高さを決定する方法もいくつか存在する。例えば、鮮鋭度が最大となるような高さＨｊを、設定領域Ｂｊの適切な高さとして決定する（式（６）参照）。 After evaluating the sharpness, there are several ways to determine the height. For example, the height Hj at which the sharpness is maximized is determined as an appropriate height of the setting area Bj (see Expression (6)).

As an indicator of whether the appropriate height has been calculated correctly, it may be determined, for example, whether the sharpness changes smoothly as the height changes. Specifically, Expression (7) is used to determine whether the sharpness changes smoothly around the height at which it is maximal, where a denotes the height step at which the projection images are generated. If Expression (7) is satisfied, that height is judged to be the appropriate height of the set area Bj. If it is not satisfied, the same determination is made for the height corresponding to the composite image with the next highest sharpness.

また、最大の鮮鋭度に近い鮮鋭度が複数算出されたときに、それらの鮮鋭度に対応する投影面の高さを平均して、設定領域の適切な高さとして算出してもよい。 Further, when a plurality of sharpness levels close to the maximum sharpness level are calculated, the heights of the projection planes corresponding to the sharpness levels may be averaged and calculated as an appropriate height of the setting area.
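The selection rules of Expression (6) and of the averaging variant can be sketched together. The relative tolerance used to decide which scores count as "close to the maximum" is an assumed parameter, the scores are assumed to be non-negative, and the smoothness test of Expression (7) is not modelled because its form is not reproduced in this text.

```python
def best_height(heights, scores, rel_tol=0.0):
    """Pick the height with maximal score, averaging near-maximal ties.

    rel_tol = 0 reduces to a plain argmax over the heights.
    """
    s_max = max(scores)
    near = [h for h, s in zip(heights, scores) if s >= s_max * (1.0 - rel_tol)]
    return sum(near) / len(near)


plain = best_height([-1.0, 0.0, 1.0], [3.0, 9.0, 4.0])
averaged = best_height([-1.0, 0.0, 1.0, 2.0], [1.0, 10.0, 9.8, 2.0],
                       rel_tol=0.05)
```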

さらに、隣接する設定領域との高さの連続性を拘束条件としてもよい。被写体面の高さは滑らかに変化するため、隣接する設定領域との高さの差は小さくなるはずである。この拘束条件と上記の鮮鋭度を組み合わせて、領域Ｂｊにおける高さｈｊの尤度Ｍｊ，ｈは、式（８）で算出できる。ここで、α、βは重みパラメータである。そして、この尤度が最も大きくなる高さを設定領域の最適な高さと決定するようにしてもよい。 Furthermore, the continuity of the height with the adjacent setting area may be used as the constraint condition. Since the height of the object plane changes smoothly, the difference in height between adjacent setting areas should be small. By combining this constraint condition and the sharpness described above, the likelihood Mj, h of the height hj in the region Bj can be calculated by Expression (8). Here, α and β are weight parameters. Then, the height at which the likelihood becomes maximum may be determined as the optimum height of the setting area.

式（８）の右辺第１項は、被写体面からの距離の連続性を示す、隣り合う２つの設定領域の被写体面からの距離の変化を示す指標である。 The first term on the right side of Expression (8) is an index indicating the continuity of the distance from the object plane and indicating the change in the distance between two adjacent setting areas from the object plane.
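Expression (8) is not reproduced in this text. The sketch below assumes one simple form, M = α · (−|h − h_neighbor|) + β · S, in which the first term penalizes a height jump relative to an adjacent set area and the second term is the sharpness. The function names and this exact form are hypothetical.

```python
def likelihood(h, h_neighbor, sharp, alpha, beta):
    """Assumed likelihood: continuity term plus sharpness term."""
    continuity = -abs(h - h_neighbor)   # 0 when heights match, negative otherwise
    return alpha * continuity + beta * sharp


def pick_height(heights, sharpness_by_height, h_neighbor, alpha, beta):
    """Return the candidate height with the largest assumed likelihood."""
    return max(heights,
               key=lambda h: likelihood(h, h_neighbor,
                                        sharpness_by_height[h], alpha, beta))


scores = {0.0: 10.0, 5.0: 10.5}
with_constraint = pick_height([0.0, 5.0], scores, h_neighbor=0.0,
                              alpha=1.0, beta=1.0)
without_constraint = pick_height([0.0, 5.0], scores, h_neighbor=0.0,
                                 alpha=0.0, beta=1.0)
```

With α = 0 the rule falls back to the sharpness-only choice, while a positive α lets a slightly less sharp but more continuous height win.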

In this way, the appropriate height of each set area is determined using the composite images. For set areas set by the area setting unit 504 that include portions overlapping one another, the distance determination unit 507 first determines, by the method described above, the distance from the reference point of the subject surface at the vertices of each set area. The distance determination unit 507 may then determine the median or the mean of these distances as the distance from the reference point of the subject surface at the vertices of the overlapping portion.

In S609, the generation unit 508 generates three-dimensional shape data of the subject surface according to the height of each set area determined in S608. Specifically, a vertex is assumed to exist at the geometric center of each set area, and its coordinates are changed to match the height determined in S608. Areas whose height was calculated using only cameras with low reliability may be handled specially, for example by not using them as vertices when the three-dimensional shape data is generated. For areas that are not on a line, vertex coordinates can be generated from the vertex coordinates of the line portions. For example, the height vz of a vertex v in an area that is not on a line can be calculated as a distance-weighted average over the vertices v' in a neighborhood Ω of that vertex (see Expressions (9), (10), and (11)).
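Expressions (9) to (11) are not reproduced in this text. One common way to realize a distance-weighted average is inverse-distance weighting, sketched below. The 1/d weight, the circular neighborhood of the given radius, and the function name are assumptions, not the published formulas.

```python
import math


def interpolate_height(vx, vy, line_vertices, radius, eps=1e-9):
    """Inverse-distance weighted height from line vertices within radius.

    line_vertices: iterable of (x, y, z) tuples measured on the lines.
    Returns None when no line vertex lies inside the neighbourhood.
    """
    num = den = 0.0
    for x, y, z in line_vertices:
        d = math.hypot(x - vx, y - vy)
        if d <= radius:
            w = 1.0 / (d + eps)   # closer vertices count more
            num += w * z
            den += w
    return num / den if den > 0 else None


line_pts = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.1)]
mid = interpolate_height(0.0, 0.0, line_pts, radius=5.0)
near_second = interpolate_height(0.5, 0.0, line_pts, radius=5.0)
```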

３次元形状データの表現方法は、算出した頂点だけを用いて、点群として形状を表現してもよい。この場合、撮像空間を一意に示す世界座標空間における３次元空間のｘ、ｙ、ｚの位置情報を持った点群で表現される。また、３次元形状データは、設定領域の幾何学的な中心位置を結ぶ面を生成して、複数の面の集合としてポリゴンメッシュデータとして表現されてもよい。また、３次元形状データは、ボクセルで表現されてもよい。 In the method of expressing the three-dimensional shape data, the shape may be expressed as a point group using only the calculated vertices. In this case, it is expressed by a point group having x, y, and z position information in a three-dimensional space in a world coordinate space that uniquely indicates an imaging space. In addition, the three-dimensional shape data may generate a surface connecting the geometric center positions of the setting area, and may be expressed as polygon mesh data as a set of a plurality of surfaces. Further, the three-dimensional shape data may be represented by voxels.

以上のように、本実施形態では、領域設定部５０４において、特定パターン（模様）の少なくとも一部を含むように、３次元位置情報を決定する領域を設定し、特定パターン（模様）を利用して被写体面の３次元位置情報を決定している。そのため、精度よく３次元形状データを生成することができる。 As described above, in the present embodiment, the area setting unit 504 sets the area for determining the three-dimensional position information so as to include at least a part of the specific pattern (pattern), and uses the specific pattern (pattern). Thus, three-dimensional position information of the object plane is determined. Therefore, three-dimensional shape data can be generated with high accuracy.

本実施形態における模様は、フィールドに描かれたラインを例に説明したが、これに限られない。例えば模様は、図形、標識、絵画などを含んでもよい。また、模様は、人工的な作られた模様でもいいし、自然にできた模様でもよい。また、模様は、被写体面において、模様とは異なる他の領域の色とは、異なる色であることが望ましい。 Although the pattern in the present embodiment has been described by taking the line drawn in the field as an example, the pattern is not limited to this. For example, the pattern may include a figure, a sign, a painting, and the like. In addition, the pattern may be an artificially made pattern or a naturally made pattern. Further, it is desirable that the pattern has a different color from the color of another area different from the pattern on the subject surface.

[Embodiment 2]
In Embodiment 1, the height of each set area is determined after the projection images have been combined. The present embodiment describes a form in which the height of each set area is determined without combining the projection images. FIG. 10 is a block diagram showing the functional configuration of a generation device 1000 according to the present embodiment. FIG. 11 is a flowchart of the processing performed by the generation device 1000. In FIGS. 10 and 11, components that are the same as in Embodiment 1 are denoted by the same reference numerals.

The generation device 1000 includes an image acquisition unit 501, a camera parameter acquisition unit 502, a projection unit 503, an area setting unit 504, a reliability calculation unit 1001, a combining unit 506, a distance determination unit 1002, a generation unit 508, and a specific pattern acquisition unit 509. The reliability calculation unit 505 of Embodiment 1 outputs the camera reliability to the combining unit 506. The reliability calculation unit 1001 of the present embodiment differs only in that it outputs the camera reliability to the distance determination unit 1002.

図１１において、カメラ信頼度を算出するまでの処理（Ｓ６０１〜Ｓ６０６）及び、３次元形状データを生成する処理（Ｓ６０９）は、実施形態１と同様であるため、説明を省略する。以下ではＳ１１０１の処理について具体的に説明する。 In FIG. 11, the processing (S601 to S606) until the camera reliability is calculated and the processing (S609) for generating the three-dimensional shape data are the same as those in the first embodiment, and thus description thereof will be omitted. Hereinafter, the process of S1101 will be specifically described.

In S1101, the distance determination unit 1002 determines the appropriate distance from the camera reliability calculated in S606 and from the projection images acquired in S605 for each set area and each distance from the subject surface. First, as described in Embodiment 1, because each camera covers a different area, it is determined whether each camera has captured the set area.

次に、設定領域に存在するラインを検出する。ラインは、芝生や地面の上に、所定の規格で明瞭に描かれているため、色検出や輝度が大きい領域を抽出する処理によって容易に抽出可能である。投影面の高さを変更した場合、カメラ信頼度が高いカメラに対応する投影画像間では、ラインの位置が大きく変化する。しかし、適切な投影面の高さに投影した場合、どのカメラに対応する投影画像であってもラインの位置もほぼ一致する。すなわち高さｈについて、カメラｉの投影画像の任意の領域Ｂｉ，ｊ，ｈのラインの存在領域Ｗｉ，ｊの積集合が最大になるような投影面の高さＨを、設定領域の最適な高さと決定することができる（式（１２）参照）。なお、設定領域の適切な高さを決定する方法は、この方法に限られない。例えば、ラインの存在領域の和集合が最小となるような高さを算出してもよい（式（１３）参照）。なお、ラインの存在領域とは、投影画像内のラインが描画される領域のことである。 Next, a line existing in the set area is detected. Since the line is clearly drawn on the lawn or the ground in a predetermined standard, the line can be easily extracted by color detection or a process of extracting a region having a large luminance. When the height of the projection plane is changed, the position of the line greatly changes between projection images corresponding to cameras with high camera reliability. However, when the image is projected at an appropriate height of the projection plane, the positions of the lines are almost the same regardless of the projection image corresponding to any camera. That is, for the height h, the height H of the projection plane that maximizes the intersection of the existing areas Wi, j of the lines of the arbitrary areas Bi, j, h of the projection image of the camera i is determined as the optimal setting area. The height can be determined (see equation (12)). Note that a method for determining an appropriate height of the setting area is not limited to this method. For example, the height may be calculated so that the union of the existing areas of the lines is minimized (see Expression (13)). Note that the region where the line exists is the region where the line in the projected image is drawn.
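The two criteria can be sketched with sets of line pixels. Expressions (12) and (13) are not reproduced in this text, so measuring the intersection and union by pixel count is an assumption, and the function names and the dictionary layout (height mapped to one pixel set per camera) are hypothetical.

```python
def height_by_intersection(line_masks):
    """Height whose per-camera line regions overlap the most."""
    def overlap(h):
        return len(set.intersection(*line_masks[h]))
    return max(line_masks, key=overlap)


def height_by_union(line_masks):
    """Height whose per-camera line regions spread the least."""
    def spread(h):
        return len(set.union(*line_masks[h]))
    return min(line_masks, key=spread)


# Line pixels (row, col) seen by two cameras at three candidate heights.
masks = {
    -1.0: [{(0, 0), (0, 1)}, {(0, 2), (0, 3)}],   # lines far apart
    0.0: [{(0, 1), (0, 2)}, {(0, 1), (0, 2)}],    # lines coincide
    1.0: [{(0, 1), (0, 2)}, {(0, 2), (0, 3)}],    # lines partly overlap
}
```

In practice the per-camera sets would come from the color or luminance based line extraction described above, restricted to cameras that cover the set area and, optionally, to those with high camera reliability.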

When the continuity of adjacent set regions is used as a constraint, the continuity of the line regions in the projection images with high camera reliability may be taken into account. For example, when the regions are divided so that they partially overlap, a constraint may be added that adopts the height at which the intersection of the adjacent line regions is largest.
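One possible reading of this continuity constraint is sketched below. It assumes that the neighboring region's line pixels are already fixed and that both regions share pixels in their overlapping part; the function name and data layout are hypothetical and only illustrate the idea of maximizing the intersection of adjacent line regions.

```python
def best_height_with_neighbor(candidates, neighbor_region):
    """Return the height whose line region overlaps the neighbor's line region most.

    candidates maps a height h to the set of (x, y) line pixels of this
    region at h; neighbor_region is the line-pixel set already chosen for
    the adjacent, partially overlapping region (assumed inputs).
    """
    return max(candidates, key=lambda h: len(candidates[h] & neighbor_region))
```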

[Embodiment 3]
The following describes an image processing system that generates a virtual viewpoint image according to the present embodiment. The three-dimensional shape data of the field generated in the embodiments described above is used when the virtual viewpoint image is generated.

A system that captures images and collects sound with a plurality of cameras and microphones installed in a facility such as a stadium or a concert hall is described with reference to the system configuration diagram of FIG. 12. The image processing system 1200 includes sensor systems 1210a to 1210j, an image computing server 1300, a controller 1400, a switching hub 1280, and an end-user terminal 1290.

The controller 1400 includes a control station 1410 and a virtual camera operation UI 1430. The control station 1410 manages the operating state of, and controls parameter settings for, each block of the image processing system 1200 through networks 1410a to 1410c, 1391, 1280a, 1280b, and 1270a to 1270i. The networks may be GbE (Gigabit Ethernet) or 10GbE conforming to the IEEE Ethernet (registered trademark; notation omitted hereinafter) standards, or may be configured by combining an InfiniBand interconnect, industrial Ethernet, and the like. The networks are not limited to these and may be of other types.

First, the operation of transmitting the ten sets of images and audio of the sensor systems 1210a to 1210j from the sensor system 1210j to the image computing server 1300 is described. In the image processing system 1200 of the present embodiment, the sensor systems 1210a to 1210j are connected in a daisy chain.

In the present embodiment, unless otherwise noted, the ten sets of sensor systems 1210a to 1210j are referred to as the sensor system 1210 without distinction. Likewise, the devices in each sensor system 1210 are referred to without distinction as the microphone 1211, the camera 1212, the camera platform 1213, the external sensor 1214, and the camera adapter 1220 unless otherwise noted. Although the number of sensor systems is given as six sets, this is merely an example and the number is not limited to it. Each of the cameras 1212a to 1212j of the imaging system is placed at a position other than one symmetric to another camera.

Further, the sensor systems 1210 need not have the same configuration; for example, each may consist of devices of different models. In the present embodiment, unless otherwise noted, the term "image" covers both moving images and still images. That is, the image processing system 1200 of the present embodiment can process both still images and moving images. The present embodiment mainly describes an example in which the virtual viewpoint content provided by the image processing system 1200 includes a virtual viewpoint image and virtual listening point sound, but the content is not limited to this. For example, the virtual viewpoint content need not include audio. As another example, the audio included in the virtual viewpoint content may be sound collected by the microphone closest to the virtual viewpoint. Although descriptions of audio are partly omitted in the present embodiment for simplicity, images and audio are basically processed together.

The sensor systems 1210a to 1210j each have one camera, 1212a to 1212j respectively. That is, the image processing system 1200 has a plurality of cameras 1212 for capturing a subject from a plurality of directions. Although the cameras 1212 are described with the same reference numeral, their performance and models may differ. The sensor systems 1210 are connected to one another in a daisy chain. As image data grows in volume with higher resolutions such as 4K and 8K and higher frame rates, this connection form has the effect of reducing the number of connection cables and saving wiring labor.

The connection form is not limited to this. A star network configuration may be used in which each of the sensor systems 1210a to 1210j is connected to the switching hub 1280 and data is exchanged between the sensor systems 1210 via the switching hub 1280.

Although FIG. 12 shows a configuration in which all of the sensor systems 1210a to 1210j are cascaded to form a single daisy chain, the configuration is not limited to this. For example, the sensor systems 1210 may be divided into several groups, with the sensor systems 1210 daisy-chained within each group. The camera adapter 1220 at the end of each group may then be connected to the switching hub to input images to the image computing server 1300. Such a configuration is particularly effective in a stadium. For example, a stadium may have multiple floors with sensor systems 1210 deployed on each floor. In that case, input to the image computing server 1300 can be made per floor or per half circuit of the stadium, which simplifies installation and makes the system more flexible even where wiring all the sensor systems 1210 in one daisy chain is difficult.

Control of image processing in the image computing server 1300 is switched depending on whether one camera adapter 1220, or two or more, are daisy-chained and input images to the image computing server 1300. That is, control is switched depending on whether the sensor systems 1210 are divided into a plurality of groups. When only one camera adapter 1220 inputs images, the image of the entire circumference of the stadium is generated while images are transmitted over the daisy chain, so the timing at which the image data for the entire circumference becomes available at the image computing server 1300 is synchronized. That is, synchronization is achieved if the sensor systems 1210 are not divided into groups.

When a plurality of camera adapters 1220 input images, however, the delay from capture to input into the image computing server 1300 may differ for each lane (path) of the daisy chain. That is, when the sensor systems 1210 are divided into groups, the timing at which the image data for the entire circumference is input to the image computing server 1300 may not be synchronized. The image computing server 1300 therefore needs to perform the subsequent image processing while checking that the image data has been gathered, using synchronization control that waits until the image data for the entire circumference is available.
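The synchronization control described above can be pictured with the following sketch. It is only an illustration under assumptions: frames are identified by a frame number, each daisy-chain lane delivers one payload per frame, and the class and method names are hypothetical rather than part of the described system.

```python
from collections import defaultdict


class FrameCollector:
    """Buffers per-lane image data until every lane has delivered a frame."""

    def __init__(self, lanes):
        self.lanes = set(lanes)
        # frame number -> {lane: data received so far}
        self.pending = defaultdict(dict)

    def add(self, frame_no, lane, data):
        """Store one lane's data; return the complete frame once all lanes arrived, else None."""
        self.pending[frame_no][lane] = data
        if set(self.pending[frame_no]) == self.lanes:
            return self.pending.pop(frame_no)
        return None
```

Subsequent image processing would start only when `add` returns a complete frame, so lanes with different delays do not cause frames captured at different times to be mixed.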

In the present embodiment, the sensor system 1210a includes a microphone 1211a, a camera 1212a, a camera platform 1213a, an external sensor 1214a, and a camera adapter 1220a. The configuration is not limited to this; the sensor system 1210a need only include at least one camera adapter 1220a and either one camera 1212a or one microphone 1211a. For example, the sensor system 1210a may consist of one camera adapter 1220a and a plurality of cameras 1212a, or of one camera 1212a and a plurality of camera adapters 1220a. That is, the cameras 1212 and the camera adapters 1220 in the image processing system 1200 correspond N to M (N and M both being integers of 1 or more). The sensor system 1210 may also include devices other than the microphone 1211a, the camera 1212a, the camera platform 1213a, and the camera adapter 1220a. Furthermore, the front-end server 1330 may have at least part of the functions of the camera adapter 1220. The sensor systems 1210b to 1210j have the same configuration as the sensor system 1210a in the present embodiment, so their description is omitted. They are not limited to the same configuration as the sensor system 1210a, and each sensor system 1210 may be configured differently.

The audio collected by the microphone 1211a and the image captured by the camera 1212a undergo various processing in the camera adapter 1220a and are then transmitted through the daisy chain 1270a to the camera adapter 1220b of the sensor system 1210b. Likewise, the sensor system 1210b transmits its collected audio and captured image to the sensor system 1210c together with the image and audio acquired from the sensor system 1210a.

The camera adapter 1220 performs processing such as foreground/background separation, generation of foreground three-dimensional shape data information, and dynamic calibration on the image data captured by the camera 1212 and the image data received from other camera adapters 1220. Based on the foreground/background separation of the captured image, the camera adapter 1220 generates a silhouette image of a dynamic object. Based on a plurality of silhouette images received from other camera adapters 1220, it also generates three-dimensional shape data corresponding to the dynamic object by a method such as the visual hull (volume intersection) method. The image computing server 1300, described later, integrates the plurality of three-dimensional shape data. Alternatively, instead of the camera adapters 1220 generating the three-dimensional shape data corresponding to dynamic objects, the image computing server 1300 may generate the three-dimensional shape data for a plurality of dynamic objects collectively. The three-dimensional shape data referred to here corresponds to dynamic objects, unlike the three-dimensional shape data generated in Embodiments 1 and 2.
A dynamic object is an object that moves (whose absolute position can change) when captured from the same direction over time, that is, a moving body. Examples of dynamic objects are a person or the ball in a ball game.
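The silhouette generation and volume intersection steps can be sketched as follows. This is a simplified illustration, not the processing of the camera adapter 1220: the silhouette is obtained by plain background subtraction on grayscale values with an assumed threshold, each camera is represented by a caller-supplied projection function, and all names are hypothetical.

```python
def silhouette(image, background, threshold=30):
    """Mark pixels whose value differs from the background by more than threshold."""
    return [[abs(p - b) > threshold for p, b in zip(row, bg_row)]
            for row, bg_row in zip(image, background)]


def visual_hull(voxels, cameras):
    """Keep the voxels whose projection falls inside every camera's silhouette.

    cameras is a list of (project, mask) pairs, where project maps a voxel
    (x, y, z) to a pixel (u, v) and mask[v][u] is True for foreground.
    """
    def inside(mask, u, v):
        return 0 <= v < len(mask) and 0 <= u < len(mask[0]) and mask[v][u]

    return [p for p in voxels
            if all(inside(mask, *project(p)) for project, mask in cameras)]
```

A voxel survives only if no camera sees background at its projected position, which is the intersection of the silhouette cones that the volume intersection method relies on.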

By continuing the operation described above, the images and audio acquired by the sensor systems 1210a to 1210j are transferred from the sensor system 1210j to the switching hub 1280 over the network 1280b and then transmitted to the image computing server 1300.

In the present embodiment, the cameras 1212a to 1212j and the camera adapters 1220a to 1220j are separate, but they may be integrated in a single housing. In that case, the microphones 1211a to 1211j may be built into the integrated camera 1212 or connected externally to the camera 1212.

Next, the configuration and operation of the image computing server 1300 are described. The image computing server 1300 of the present embodiment processes the data acquired from the sensor system 1210j. The image computing server 1300 includes a front-end server 1330, a database 1350 (hereinafter also referred to as the DB), a back-end server 1370, and a time server 1390. The three-dimensional shape data corresponding to the field, which is the subject surface, generated in Embodiments 1 and 2 is stored in the DB 1350 in advance.

The time server 1390 has a function of distributing the time and a synchronization signal, and distributes them to the sensor systems 1210a to 1210j via the switching hub 1280. On receiving the time and the synchronization signal, the camera adapters 1220a to 1220j genlock the cameras 1212a to 1212j based on them to synchronize image frames. That is, the time server 1390 synchronizes the capture timing of the cameras 1212. Because the image processing system 1200 can thus generate a virtual viewpoint image from a plurality of images captured at the same timing, degradation of the virtual viewpoint image caused by capture timing offsets can be suppressed. Although the time server 1390 manages the time synchronization of the cameras 1212 in the present embodiment, this is not limiting; each camera 1212 or each camera adapter 1220 may perform the time synchronization processing independently.

The front-end server 1330 reconstructs the segmented transmission packets from the images and audio acquired from the sensor system 1210j, converts the data format, and then writes the data to the DB 1350 according to the camera identifier, data type, and frame number.
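The reconstruct-then-store flow can be illustrated with the sketch below. It rests on assumptions not stated in the description: each packet is a (sequence number, bytes) pair, the database is stood in for by a dictionary keyed by (camera identifier, data type, frame number), and the data format conversion step is omitted. The function names are hypothetical.

```python
def reassemble(packets):
    """Join the segmented packets of one item in sequence-number order."""
    return b"".join(chunk for _, chunk in sorted(packets))


def store(db, camera_id, data_type, frame_no, packets):
    """Write the reassembled payload under a (camera, type, frame) key."""
    db[(camera_id, data_type, frame_no)] = reassemble(packets)
```

Keying the record by camera identifier, data type, and frame number lets a later reader fetch all data for one frame without knowing the order in which packets arrived.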

Next, the back-end server 1370 accepts a viewpoint designation from the virtual camera operation UI 1430, reads the corresponding data, such as image and audio data, from the DB 1350 based on the accepted viewpoint, and performs rendering to generate a virtual viewpoint image. The data read out also includes the three-dimensional shape data corresponding to the stadium and the three-dimensional shape data corresponding to the field.

The configuration of the image computing server 1300 is not limited to this. For example, at least two of the front-end server 1330, the database 1350, and the back-end server 1370 may be integrated. A plurality of at least one of the front-end server 1330, the database 1350, and the back-end server 1370 may be included. Devices other than those above may be included at any position in the image computing server 1300. Furthermore, the end-user terminal 1290 or the virtual camera operation UI 1430 may have at least part of the functions of the image computing server 1300.

The rendered image is transmitted from the back-end server 1370 to the end-user terminal 1290, and the user operating the end-user terminal 1290 can view images and listen to audio according to the designated viewpoint. That is, the back-end server 1370 generates virtual viewpoint content based on the images captured by the cameras 1212 (multi-viewpoint images) and viewpoint information. More specifically, the back-end server 1370 generates the virtual viewpoint content based on, for example, image data of a predetermined region extracted by the camera adapters 1220 from the images captured by the cameras 1212 and on the viewpoint designated by a user operation. The back-end server 1370 then provides the generated virtual viewpoint content to the end-user terminal 1290. In the present embodiment, the virtual viewpoint content is generated by the image computing server 1300, and the description focuses on the case where it is generated by the back-end server 1370. This is not limiting, however; the virtual viewpoint content may be generated by a device in the image computing server 1300 other than the back-end server 1370, or by the controller 1400 or the end-user terminal 1290.

The virtual viewpoint content in the present embodiment is content that includes a virtual viewpoint image, that is, an image that would be obtained if the subject were captured from a virtual viewpoint. In other words, the virtual viewpoint image represents the view from the designated viewpoint. The virtual viewpoint may be designated by the user or designated automatically based on, for example, the result of image analysis. The virtual viewpoint image thus includes an arbitrary viewpoint image (free viewpoint image) corresponding to a viewpoint freely designated by the user. It also includes an image corresponding to a viewpoint the user designates from a plurality of candidates and an image corresponding to a viewpoint the apparatus designates automatically.

The present embodiment mainly describes an example in which the virtual viewpoint content includes audio data, but audio data need not be included. The back-end server 1370 may compress and encode the virtual viewpoint image according to a coding scheme such as H.264 or HEVC and then transmit it to the end-user terminal 1290 using the MPEG-DASH protocol. The virtual viewpoint image may also be transmitted to the end-user terminal 1290 uncompressed. The former, with compression encoding, assumes a smartphone or tablet as the end-user terminal 1290, whereas the latter assumes a display capable of showing uncompressed images. That is, the image format can be switched according to the type of the end-user terminal 1290. The image transmission protocol is not limited to MPEG-DASH; for example, HLS (HTTP Live Streaming) or another transmission method may be used.
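The format switching by terminal type could look like the sketch below. The terminal type labels, the returned dictionary layout, and the choice of H.264 over HEVC for mobile terminals are assumptions made for illustration only.

```python
def choose_delivery(terminal_type):
    """Pick a delivery format from the terminal type (labels are illustrative)."""
    if terminal_type in ("smartphone", "tablet"):
        # Mobile terminals: compressed video streamed over MPEG-DASH.
        return {"compressed": True, "codec": "H.264", "protocol": "MPEG-DASH"}
    # Displays that can show uncompressed images: send without compression.
    return {"compressed": False, "codec": None, "protocol": None}
```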

As described above, the image processing system 1200 has three functional domains: a video collection domain, a data storage domain, and a video generation domain. The video collection domain includes the sensor systems 1210a to 1210j. The data storage domain includes the database 1350, the front-end server 1330, and the back-end server 1370. The video generation domain includes the virtual camera operation UI 1430 and the end-user terminal 1290. The configuration is not limited to this; for example, the virtual camera operation UI 1430 could acquire images directly from the sensor systems 1210a to 1210j. In the present embodiment, however, the data storage function is placed in between rather than acquiring images directly from the sensor systems 1210a to 1210j. Specifically, the front-end server 1330 converts the image data and audio data generated by the sensor systems 1210a to 1210j, together with their meta information, into the common schema and data types of the database 1350. As a result, even if the cameras 1212 of the sensor systems 1210a to 1210j are replaced with cameras of another model, the front-end server 1330 can absorb the differences and register the data in the database 1350. This reduces the risk that the virtual camera operation UI 1430 fails to operate properly when the cameras 1212 are replaced with another model.

The virtual camera operation UI 1430 is configured to access the database 1350 through the back-end server 1370 rather than directly. The back-end server 1370 performs the common processing related to image generation, and the virtual camera operation UI 1430 handles the application-specific parts related to the operation UI. Development of the virtual camera operation UI 1430 can therefore concentrate on the UI operation device and on the functional requirements of the UI for manipulating the virtual viewpoint image to be generated. The back-end server 1370 can also add or remove common processing related to image generation in response to requests from the virtual camera operation UI 1430, which allows it to respond flexibly to those requests.

As described above, in the image processing system 1200, the back-end server 1370 generates a virtual viewpoint image from image data based on capture by the plurality of cameras 1212, which capture the subject from a plurality of directions. The image processing system 1200 of the present embodiment is not limited to the physical configuration described above and may be configured logically.

<Other Embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the embodiments described above is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Reference Signs List
100 generation apparatus
501 image acquisition unit
504 region setting unit
507 determination unit
508 generation unit
509 specific pattern acquisition unit

Claims

A generation apparatus comprising:
first acquisition means for acquiring a plurality of captured images obtained by imaging a subject surface from a plurality of directions;
second acquisition means for acquiring information indicating a position and a shape of a pattern on the subject surface;
determination means for determining three-dimensional position information for each of a plurality of regions on the subject surface, based on the plurality of captured images acquired by the first acquisition means and the information indicating the position and shape of the pattern acquired by the second acquisition means; and
generation means for generating three-dimensional shape data corresponding to the subject surface based on the three-dimensional position information for each region determined by the determination means.

The generation apparatus according to claim 1, further comprising setting means for setting a plurality of regions on the subject surface based on the information indicating the position and shape of the pattern acquired by the second acquisition means,
wherein the determination means determines three-dimensional position information for each of the plurality of regions on the subject surface set by the setting means.

The generation apparatus according to claim 2, wherein the setting means sets the plurality of regions such that each of the plurality of regions includes at least a part of the pattern.

The generation apparatus according to any one of claims 1 to 3, further comprising projection means for generating, based on the plurality of captured images acquired by the first acquisition means, projection images on projection planes at different distances from the subject surface, for each region and each distance from the subject surface,
wherein the determination means determines the three-dimensional position information for each region based on the projection images generated by the projection means for each region and each distance from the subject surface.

The generation apparatus according to claim 1, further comprising synthesizing means for synthesizing, for each region and each distance from the subject surface, a plurality of projection images at the same distance from the subject surface to generate a composite image,
wherein the determination means determines the three-dimensional position information for each region based on the composite images generated by the synthesizing means for each region and each distance from the subject surface.

The generation apparatus according to claim 5, wherein the determination means determines, as the three-dimensional position information, the three-dimensional position information corresponding to the composite image in which the pattern is sharp among the composite images generated by the synthesizing means for each region and each distance from the subject surface.

The generation apparatus according to claim 5, further comprising:
third acquisition means for acquiring a parameter of an imaging apparatus; and
calculation means for calculating an index for determining the three-dimensional position information of the plurality of regions, based on the information indicating the position and shape of the pattern acquired by the second acquisition means and the parameter acquired by the third acquisition means,
wherein the synthesizing means synthesizes the plurality of projection images at the same distance from the subject surface based on the index calculated by the calculation means.

The generation apparatus according to claim 7, wherein, when synthesizing the plurality of projection images having the same distance from the subject surface, the synthesizing unit performs weighted averaging on the pixel values of corresponding pixels in the projection images, using the index calculated by the calculation unit as a weight.
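
For illustration only, the weighted averaging in the preceding claim can be sketched as follows. The sketch assumes one scalar weight per projection image; the name `weighted_composite` and the sample values are hypothetical, and how the index itself is computed is outside the scope of the sketch.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def weighted_composite(projections: List[Image], weights: List[float]) -> Image:
    """Average corresponding pixels of the projection images.

    weights[i] is the index assigned to projections[i] (assumed here to be
    one non-negative scalar per image).
    """
    if len(projections) != len(weights):
        raise ValueError("one weight is required per projection image")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    height, width = len(projections[0]), len(projections[0][0])
    return [
        [
            sum(w * img[y][x] for img, w in zip(projections, weights)) / total
            for x in range(width)
        ]
        for y in range(height)
    ]


# Two hypothetical projection images of the same region at the same distance.
# The first camera is trusted three times as much as the second.
composite = weighted_composite(
    [[[0.0, 1.0]], [[1.0, 1.0]]],
    [3.0, 1.0],
)
```

Here the first pixel becomes (3 × 0.0 + 1 × 1.0) / 4 = 0.25, while the second pixel, on which both images agree, stays at 1.0.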

The generation apparatus according to claim 4, wherein the determination unit determines the three-dimensional position information for each of the plurality of regions on the subject surface based on the regions in which the pattern is present in the projection images generated for each of the regions and for each distance from the subject surface.

The generation apparatus according to claim 9, wherein the determination unit determines the three-dimensional position information for each of the plurality of regions on the subject surface based on an intersection of the regions in which the pattern is present in the projection images generated for each of the regions and for each distance from the subject surface.

The generation apparatus according to claim 9, wherein the determination unit determines the three-dimensional position information for each of the plurality of regions on the subject surface based on a union of the regions in which the pattern is present in the projection images generated for each of the regions and for each distance from the subject surface.
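
For illustration only, the use of an intersection or a union of pattern regions in the two preceding claims can be sketched as follows. The sketch assumes one plausible reading: at the correct distance the pattern regions from different cameras coincide, so the intersection is largest and the union is smallest. The selection rules, the function names, and the sample masks are hypothetical and are not taken from the specification.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a pixel where the pattern is present


def intersection_area(masks: List[Set[Pixel]]) -> int:
    """Number of pixels marked as pattern in every projection image."""
    return len(set.intersection(*masks))


def union_area(masks: List[Set[Pixel]]) -> int:
    """Number of pixels marked as pattern in at least one projection image."""
    return len(set.union(*masks))


# Hypothetical pattern masks of one region, two cameras, two candidate distances.
masks_by_distance: Dict[float, List[Set[Pixel]]] = {
    0.0: [{(0, 0), (0, 1)}, {(0, 0), (0, 1)}],  # masks coincide
    0.1: [{(0, 0), (0, 1)}, {(0, 1), (0, 2)}],  # masks shifted by one pixel
}

# Assumed rule 1: prefer the distance with the largest intersection.
by_intersection = max(
    masks_by_distance, key=lambda d: intersection_area(masks_by_distance[d])
)
# Assumed rule 2: prefer the distance with the smallest union.
by_union = min(masks_by_distance, key=lambda d: union_area(masks_by_distance[d]))
```

Both rules pick the distance 0.0 for this sample data, because the two masks overlap completely only at that distance.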

The generation apparatus according to any one of claims 4 to 11, wherein the determination unit determines the three-dimensional position information for each of the plurality of regions on the subject surface based on continuity of the distance from the subject surface between two adjacent regions among the plurality of regions.

The generation apparatus according to claim 12, wherein the continuity of the distance from the subject surface is represented by an index indicating a change in the distance from the subject surface between the two adjacent regions among the plurality of regions.
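
For illustration only, one way to take the continuity in the two preceding claims into account can be sketched as follows. The sketch assumes that the regions form a chain, that the index of change is the absolute difference in distance between adjacent regions, and that this index is added as a penalty to a per-region matching cost and minimized by dynamic programming. The function name `smooth_distances`, the `smoothness` parameter, and the sample costs are hypothetical and are not taken from the specification.

```python
from typing import List


def smooth_distances(
    costs: List[List[float]], distances: List[float], smoothness: float
) -> List[float]:
    """Pick one distance per region, trading matching cost against continuity.

    costs[i][k] is the (assumed) cost of assigning distances[k] to region i;
    lower is better. Adjacent regions pay smoothness * |d_i - d_{i+1}|.
    """
    m = len(distances)
    best = [list(costs[0])]        # best[i][k]: cheapest cost ending in k
    back: List[List[int]] = []     # back-pointers for path recovery
    for i in range(1, len(costs)):
        row, ptr = [], []
        for k in range(m):
            candidates = [
                best[-1][j] + smoothness * abs(distances[k] - distances[j])
                for j in range(m)
            ]
            j_min = min(range(m), key=candidates.__getitem__)
            row.append(candidates[j_min] + costs[i][k])
            ptr.append(j_min)
        best.append(row)
        back.append(ptr)
    # Trace the cheapest assignment back from the last region.
    k = min(range(m), key=best[-1].__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [distances[k] for k in path]


# Hypothetical costs for three adjacent regions and three candidate distances.
# The middle region is ambiguous and, taken alone, slightly prefers 0.2.
distances = [0.0, 0.1, 0.2]
costs = [[0.0, 1.0, 1.0], [0.5, 0.6, 0.4], [0.0, 1.0, 1.0]]
independent = smooth_distances(costs, distances, 0.0)
continuous = smooth_distances(costs, distances, 5.0)
```

Without the penalty the middle region jumps to 0.2; with it, all three regions settle on 0.0, which is the behaviour the continuity condition is meant to encourage.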

The generation apparatus according to any one of claims 1 to 13, wherein the pattern is a line drawn on a field in which a game is played.

The generation apparatus according to claim 1, wherein the pattern has a color different from that of other regions on the subject surface.

A generation method comprising:
a first acquisition step of acquiring a plurality of captured images obtained by imaging a subject surface from a plurality of directions;
a second acquisition step of acquiring information indicating a position and a shape of a pattern on the subject surface;
a determination step of determining three-dimensional position information for each of a plurality of regions on the subject surface, based on the plurality of captured images acquired in the first acquisition step and the information indicating the position and shape of the pattern acquired in the second acquisition step; and
a generation step of generating three-dimensional shape data corresponding to the subject surface based on the three-dimensional position information for each of the regions determined in the determination step.

The generation method according to claim 16, further comprising a setting step of setting a plurality of regions on the subject surface based on the information indicating the position and shape of the pattern acquired in the second acquisition step,
wherein the determination step determines the three-dimensional position information for each of the plurality of regions on the subject surface set in the setting step.

The generation method according to claim 17, wherein the setting step sets the plurality of regions such that each of the plurality of regions includes at least a part of the pattern.

The generation method according to any one of claims 16 to 18, further comprising a projection step of generating, for each of the regions and for each of a plurality of distances from the subject surface, projection images on a projection plane at that distance from the subject surface, based on the plurality of captured images acquired in the first acquisition step,
wherein the determination step determines the three-dimensional position information for each of the regions based on the projection images generated in the projection step for each of the regions and for each distance from the subject surface.

A program for causing a computer to function as each unit of the generation apparatus according to claim 1.