JP6641313B2

JP6641313B2 - Region extraction device and program

Info

Publication number: JP6641313B2
Application number: JP2017015185A
Authority: JP
Inventors: 軍陳; 敬介野中; 浩嗣三功; 内藤　整
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-01-31
Filing date: 2017-01-31
Publication date: 2020-02-05
Anticipated expiration: 2037-01-31
Also published as: JP2018125642A

Description

The present invention relates to a region extraction device and program that can extract the spatial region of each individual object even when objects occlude one another in multi-view camera images capturing a plurality of objects. It is suited to making visual hull generation, and free-viewpoint video generation based on it, faster and more accurate.

Model-based free-viewpoint video generation relies on generating a three-dimensional shape model, in voxel space, of the object captured in multi-view camera video (the visual hull, hereinafter abbreviated "VH" where appropriate). To generate free-viewpoint video quickly and accurately, the VH itself must be generated quickly and accurately.

The techniques of Non-Patent Literature 1 and 2 address faster and more accurate VH generation.

Non-Patent Literature 1 aims at real-time VH generation. As a first approach, it precomputes the initial rectangular-parallelepiped region (bounding box) in voxel space used for VH generation and stores it in a lookup table. As a second approach, it downsamples the images so that the center point of each voxel projects approximately onto a single image pixel. FIG. 8 is a schematic diagram of this initial region (bounding box).

Non-Patent Literature 2 proposes a robust and accurate VH computation consisting of two steps: estimating the initial rectangular-parallelepiped region (bounding box), and detecting intersection points in the visual-volume intersection method.

Non-Patent Literature 1: Alexander Ladikos, Selim Benhimane, and Nassir Navab. "Efficient Visual Hull Computation for Real-Time 3D Reconstruction using CUDA", Proceedings of 2008 Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-8, 2008.
Non-Patent Literature 2: Peng Song, Xiaojun Wu, and Michael Yu Wang. "A robust and Accurate Method for Visual Hull Computation", International Conference on Information and Automation, pp. 784-789, 2009.

As described above, setting the initial region appropriately, for example with respect to voxel density, is indispensable for fast and accurate VH generation.

However, as shown schematically in FIG. 9, none of the conventional techniques can obtain a separate initial region (bounding box) for each object when a plurality of objects are captured in the multi-view camera images. FIG. 9 shows an example with two objects: as in [2], the conventional methods can obtain only a single initial region even though two objects are present. This is caused, among other things, by occlusion between the objects arising in the camera images when multiple regions are present, a situation the conventional methods cannot handle. As shown in [1], it is desirable to extract an initial region for each of the two objects.

In view of the above problems of the conventional techniques, an object of the present invention is to provide a region extraction device and program that can extract the spatial region of each individual object even when objects occlude one another in multi-view camera images capturing a plurality of objects.

To achieve the above object, the present invention is a region extraction device comprising: a first extraction unit that extracts, from each camera image of multi-view camera images capturing a plurality of objects, one or more image regions each containing an object; a correspondence acquisition unit that determines which of the plurality of objects each of the extracted image regions in each camera image corresponds to, together with the correspondence of objects between camera images; and a second extraction unit that extracts, based on the extracted image regions in the camera images linked by the determined correspondence, a spatial region containing the corresponding object. The invention is also a program that causes a computer to function as the region extraction device.

According to the present invention, the spatial region of each individual object can be extracted appropriately even when a plurality of objects are captured in the multi-view camera images.

FIG. 1 is a functional block diagram of a region extraction device according to one embodiment.
FIG. 2 is a diagram showing schematic examples for explaining the processing of each unit of the region extraction device.
FIG. 3 is a diagram showing schematic examples for explaining the processing of each unit of the region extraction device.
FIG. 4 is a flowchart of the operation of the correspondence acquisition unit according to one embodiment.
FIG. 5 is a diagram for explaining the processing of the second extraction unit.
FIG. 6 is a diagram for explaining the processing of the second extraction unit.
FIG. 7 is a table showing the effects of the present invention in comparison with a conventional method.
FIG. 8 is a schematic diagram of the initial region (bounding box) in VH generation.
FIG. 9 is a schematic diagram for explaining the problem with the conventional methods.

FIG. 1 is a functional block diagram of a region extraction device 10 according to one embodiment. The region extraction device 10 includes a first extraction unit 1, a correspondence acquisition unit 2, a second extraction unit 3, and an additional processing unit 4. The first extraction unit 1 further includes a contour extraction unit 11, a representative point determination unit 12, an object count determination unit 13, and an image region determination unit 14. The processing of each unit is outlined below.

The first extraction unit 1 receives the images P(1), P(2), ..., P(N) of the camera viewpoints k (k = 1, 2, ..., N) that make up the multi-view camera images. For each image P(k), it outputs one or more regions containing an object as image regions, together with related information (such as the representative point position of each object and the number of objects), to the correspondence acquisition unit 2 and the second extraction unit 3. A plurality of objects are captured in the multi-view camera images. The multi-view camera images can be the frames of a video at each time t, but the description below treats P(k) as the image at some arbitrary time t and omits further reference to time.

The processing of each part of the first extraction unit 1 is outlined as follows. The contour extraction unit 11 extracts one or more object contours from each camera image P(k) of the multi-view camera images and outputs them to the representative point determination unit 12, the object count determination unit 13, and the image region determination unit 14. The representative point determination unit 12 determines a representative point from each extracted contour and outputs it to the correspondence acquisition unit 2. The object count determination unit 13 examines the number of extracted contours in each image P(k), thereby determines the number of objects captured in the multi-view camera images, and outputs it to the second extraction unit 3. The image region determination unit 14 determines the image region of each object as a region containing the extracted contour and outputs it to the correspondence acquisition unit 2 and the second extraction unit 3.

The correspondence acquisition unit 2 identifies which of the objects captured in the multi-view camera images as a whole each of the image regions obtained by the first extraction unit 1 for each image P(k) corresponds to. It does so while determining the correspondence of objects between different camera images P(i) and P(j) (i ≠ j), and outputs the resulting correspondence to the second extraction unit 3.

The second extraction unit 3 determines and outputs, within the voxel space captured by the multi-view camera images, a spatial region containing each object. It bases this on the object correspondence between the images P(k) obtained by the correspondence acquisition unit 2 and on the image region containing each object in each image P(k) obtained by the image region determination unit 14. The determined spatial region can be used as the initial region when generating the VH of the object by the visual-volume intersection method. In particular, according to the present invention, a spatial region of a size optimized for each object can be obtained even when a plurality of objects are captured in the multi-view camera images and, in some camera images P(k), the objects occlude one another so that their regions overlap.

The additional processing unit 4 can perform any additional processing using the spatial region of each object extracted by the second extraction unit 3. As additional processing, it may generate the VH of each object from its spatial region, and may further use the generated VH to generate free-viewpoint video corresponding to the multi-view camera images (video) input to the region extraction device 10. Any existing method may be used for the VH generation and free-viewpoint video generation in the additional processing unit 4.

The processing of each unit in FIG. 1 is described in detail below. FIGS. 2 and 3 show schematic examples of the data obtained by each unit, split into [1] to [3] in FIG. 2 and [4] and [5] in FIG. 3, and are referred to as appropriate in the description below. In FIGS. 2 and 3, the multi-view camera images capture two soccer players as an example of a plurality of objects, and the figures show schematic examples of the data obtained by each unit in FIG. 1 when the multi-view camera images input to the region extraction device 10 consist of eight camera images. FIGS. 2 and 3 list the data obtained for each of the images from camera 1 to camera 8. Note that, for convenience of explanation, only [4] in FIG. 3 arranges the data of cameras 1 to 8 differently from [1] to [3] and [5].

The contour extraction unit 11 extracts one or more object contours from each camera image P(k) of the multi-view camera images and outputs them to the units 12, 13, and 14. Any existing method may be used for the contour extraction. For example, foreground/background segmentation using a Gaussian mixture model may be used, a method that exploits the temporal change of the multi-view camera images as video (for example, optical flow) may be used, or such existing methods may be combined. A contour extracted at one time may also be tracked over subsequent times. After region extraction, noise removal, such as merging a region that is too small into a nearby large region, may additionally be applied to obtain the object contours.

FIG. 2 [1] shows examples of the resulting contour images (silhouette images), obtained as mask images. In cameras 1, 3, 4, 5, 7, and 8, two contours are extracted with the regions of the two players separated in the image. In cameras 2 and 6, only one contour is extracted because the two players overlap and occlusion occurs.
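As an illustration of how a binary silhouette mask can be split into separate contour regions, the following is a minimal sketch that assumes 4-connected flood fill. The function name `label_regions` and the `min_area` parameter are hypothetical, and small regions are simply discarded here rather than merged into a neighbouring region as the text suggests.

```python
from collections import deque

def label_regions(mask, min_area=1):
    """Split a 0/1 mask (list of rows) into 4-connected regions.

    Returns a list of regions, each a list of (x, y) pixels.
    Regions with fewer than min_area pixels are dropped as noise
    (an assumption; the text mentions merging them instead).
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(pixels) >= min_area:
                regions.append(pixels)
    return regions

mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
]
regions = label_regions(mask, min_area=2)  # the isolated pixel is dropped
```

In a view with occlusion, such as cameras 2 and 6 in the example, the overlapping silhouettes would form a single connected region.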

The representative point determination unit 12 determines a representative point for each contour of each image extracted by the contour extraction unit 11 and outputs the position of that representative point to the correspondence acquisition unit 2. Any kind of representative point defined from the contour region may be used. For example, the centroid may be used as the representative point.

Specifically, the centroid can be computed as follows, for example. Let I_k(x, y) be the contour image (silhouette image) of image P(k) from camera k, taking the value 1 at positions (x, y) that belong to the contour and 0 at positions that do not. First, equation (1) gives the zeroth- and first-order moment features, (p, q) = (0, 0), (1, 0), (0, 1), of object o_k^l in image P(k), where l is the index identifying an object defined by a contour region.

Here, in image coordinates (x, y) the x axis runs horizontally and the y axis vertically, and W and L are the width and height of the rectangle enclosing object o_k^l (a rectangle whose sides are parallel to the horizontal and vertical sides of camera image P(k)). This rectangle is computed by the image region determination unit 14 described below, so W and L are available. (This information flow is omitted from FIG. 1.) (x_l, y_l) is a pixel position belonging to object o_k^l. Equation (1) subsamples to every fourth pixel, x = 4m and y = 4n, to speed up the computation, but the subsampling interval may be set arbitrarily, and subsampling may also be omitted altogether.
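The image of equation (1) is not reproduced in this text. Under the definitions above, a standard raw image-moment form with 4-pixel subsampling would look like the following. This is an assumed reconstruction, not the published equation: the symbol M_{pq} and the exact summation range over the enclosing rectangle (written here as Rect of width W and height L) are assumptions.

```latex
M_{pq}(o_k^l) \;=\; \sum_{\substack{(x,\,y)\,\in\,\mathrm{Rect}(o_k^l) \\ x = 4m,\; y = 4n}} x^{p}\, y^{q}\, I_k(x, y),
\qquad (p, q) \in \{(0,0),\,(1,0),\,(0,1)\}
```

With this form, M_{00} counts the sampled contour pixels, and M_{10} and M_{01} sum their x and y coordinates.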

Next, using the moment features obtained from equation (1), the centroid position (x(o_k^l), y(o_k^l)) of object o_k^l in image P(k) is obtained as in equation (2).
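The moment and centroid computation can be sketched as follows. Since the equation images are not available here, the sketch assumes the usual relation, centroid = (M10 / M00, M01 / M00). The function names, the inclusive `rect` convention, and sampling from the rectangle's top-left corner (rather than at exact multiples of the step) are assumptions made for illustration.

```python
def moment(mask, rect, p, q, step=4):
    """Raw moment M_pq of a 0/1 mask, sampled every `step` pixels inside
    rect = (x_min, y_min, x_max, y_max), bounds inclusive."""
    x_min, y_min, x_max, y_max = rect
    total = 0
    for y in range(y_min, y_max + 1, step):
        for x in range(x_min, x_max + 1, step):
            total += (x ** p) * (y ** q) * mask[y][x]
    return total

def centroid(mask, rect, step=4):
    """Centroid (M10 / M00, M01 / M00) of the sampled foreground pixels."""
    m00 = moment(mask, rect, 0, 0, step)
    if m00 == 0:
        raise ValueError("no foreground pixel was sampled")
    return (moment(mask, rect, 1, 0, step) / m00,
            moment(mask, rect, 0, 1, step) / m00)

# A 3x3 block of foreground pixels at x = 2..4, y = 1..3 in a 6x5 image.
mask = [[1 if 2 <= x <= 4 and 1 <= y <= 3 else 0 for x in range(6)]
        for y in range(5)]
rect = (2, 1, 4, 3)
full = centroid(mask, rect, step=1)     # every pixel
sparse = centroid(mask, rect, step=2)   # every second pixel
```

For a symmetric shape like this block, the subsampled centroid coincides with the full one; for irregular silhouettes, subsampling trades some accuracy for speed.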

FIG. 2 [3] shows examples of the centroid positions that the representative point determination unit 12 determined as the representative points of each object, with each position marked by a "+" sign together with its coordinate values.

The object count determination unit 13 determines the number of objects θ_s captured in the multi-view camera images as a whole, as the maximum over the camera images P(k) of the number of contour regions extracted by the contour extraction unit 11, and outputs it to the correspondence acquisition unit 2. Writing θ_k for the number of objects extracted in image P(k) (that is, the number of contour regions, which falls below the true number when occlusion or the like occurs), the object count is given by equation (3). In the example of FIGS. 2 and 3, two players are captured, so θ_s = 2.

The automatic computation of the object count θ_s rests on the following assumption. In a scene where a plurality of objects are captured, occlusion between objects is unavoidable. However, by placing the cameras at different positions when acquiring the multi-view camera images, it is assumed that at least one camera captures all objects separately, without occlusion.
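Under this assumption, the object count reduces to a maximum over the per-view contour counts, θ_s = max over k of θ_k. A minimal sketch follows, using the contour counts of the FIG. 2 example (two contours in cameras 1, 3, 4, 5, 7, and 8, one in cameras 2 and 6). The function and variable names are illustrative.

```python
def object_count(contours_per_view):
    """Estimate the number of objects as the largest per-view contour count."""
    return max(contours_per_view)

# Contour counts theta_k for cameras 1..8 in the FIG. 2 example.
theta_k = [2, 1, 2, 2, 2, 1, 2, 2]
theta_s = object_count(theta_k)

# Views that show every object separately (candidates for the reference view).
full_views = [k for k, c in enumerate(theta_k, start=1) if c == theta_s]
```

If every camera happened to suffer occlusion, the maximum would underestimate the true count, which is why the assumption above matters.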

The image region determination unit 14 outputs, for each image P(k), a region containing object o_k^l whose contour was extracted by the contour extraction unit 11, as the image region R(o_k^l) (region of interest) of object o_k^l, to the correspondence acquisition unit 2 and the second extraction unit 3. The containing image region R(o_k^l) may be produced by dilating the contour with a predetermined method, or as a rectangle enclosing the contour of object o_k^l. (The rectangle information is referenced when computing equation (1) above.) When an enclosing rectangle is used, it should be the smallest rectangle that covers the contour and whose sides are parallel to the vertical and horizontal sides of image P(k). FIG. 2 [2] shows examples of the enclosing rectangles obtained by the image region determination unit 14.
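The smallest axis-aligned enclosing rectangle can be sketched as below. The `(x_min, y_min, x_max, y_max)` tuple layout with inclusive bounds is an assumption chosen for illustration; the width W and height L used for the moment computation follow from it.

```python
def bounding_rect(pixels):
    """Smallest axis-aligned rectangle covering a list of (x, y) pixels,
    returned as (x_min, y_min, x_max, y_max) with inclusive bounds."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))

contour_pixels = [(2, 1), (3, 1), (4, 3), (2, 2)]
rect = bounding_rect(contour_pixels)
width = rect[2] - rect[0] + 1   # W
height = rect[3] - rect[1] + 1  # L
```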

The correspondence acquisition unit 2 takes the information obtained by the first extraction unit 1 as input. For each object o_k^l whose image region R(o_k^l) has been obtained in image P(k), it determines which object o_{k+1}^{l'} in another image P(k+1) it corresponds to, does this across all camera images, and outputs the resulting correspondence to the second extraction unit 3. In other words, at the point where the first extraction unit 1 has produced its output, it is unknown which object in another image P(k+1) an object o_k^l of image P(k) corresponds to (the object index l is not consistent across images), and the correspondence acquisition unit 2 determines this correspondence. The correspondence obtained is also a correspondence between the image regions R(o_k^l) associated with the objects o_k^l.

FIG. 4 is a flowchart of the operation of the correspondence acquisition unit 2 according to one embodiment.

In step S1, the reference positions are initialized, and the process proceeds to step S2. Specifically, an image P(k) that attains the maximum object count θ_s in equation (3) above (that is, an image in which the contours of all objects are separated without occlusion) is selected. The representative points (x(o_k^l), y(o_k^l)) (l = 1, 2, ..., θ_s) that the representative point determination unit 12 obtained for each object o_k^l in that image are set, as is, as the reference positions (S_x(o_k^l), S_y(o_k^l)) of each object o^l in image P(k).

The term "reference position" is used because the position serves, with image P(k) as the reference, to find the correspondence between objects in P(k) and another image P(k+1). Step S1 is called "initialization" because it is where the reference positions are first obtained.

If more than one image P(k) attains the maximum object count θ_s in step S1, one embodiment simply picks any one of them at random for setting the reference positions.

In another embodiment, if more than one image P(k) attains the maximum object count θ_s, the image in which the objects are most widely separated spatially (the image whose object image regions lie farthest apart) may be chosen for setting the reference positions. The spatial separation of the objects in each image P(k) may be computed, for example, as the variance of the representative point positions determined by the representative point determination unit 12. This variance may further be normalized by dividing it by the total area of the object image regions (or contour regions) in each image P(k). Instead of the variance of the representative point positions, the spatial separation may also be computed as the sum of the distances between all pairs of image regions (or contour regions). The image with the largest spatial separation computed by any of these methods may be used for setting the reference positions, or one image may be chosen at random from among those whose spatial separation is at least a predetermined value.
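The variance-based choice of the reference view can be sketched as follows. The sketch assumes "variance of the representative point positions" means the mean squared distance of the points from their mean; the function names and the dictionary layout are hypothetical, and the optional area normalization is omitted.

```python
def separation(points):
    """Variance of 2D points: mean squared distance to their mean."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points) / n

def pick_reference_view(points_per_view):
    """points_per_view maps a view index to its list of representative points.
    Among the views showing the maximum number of objects, return the one
    whose representative points are most spread out."""
    theta_s = max(len(p) for p in points_per_view.values())
    candidates = {k: p for k, p in points_per_view.items() if len(p) == theta_s}
    return max(candidates, key=lambda k: separation(candidates[k]))

views = {
    1: [(100, 200), (140, 210)],  # two objects, close together
    2: [(120, 205)],              # occlusion: one contour only
    3: [(60, 200), (300, 220)],   # two objects, far apart
}
reference_view = pick_reference_view(views)
```

Here view 3 is selected because its two representative points are much farther apart than those of view 1.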

For convenience, the description of the flow in FIG. 4 below assumes that P(1), with k = 1, was used to set the reference positions in step S1 (that is, k = 1 at the time of step S1). The correspondence with image P(2) is first found with reference to image P(1), then the correspondence with image P(3) is found with reference to image P(2), and in the same way the correspondence with image P(k+1) is found with reference to image P(k), repeatedly.

Step S2 is a dummy step that marks the loop structure of the flowchart: it indicates that the iteration described above is repeated over the subsequent steps S3 to S7. It sets the image P(k) to be processed, and the process proceeds to step S3.

In step S3, the position (s'_x(o_{k+1}^l), s'_y(o_{k+1}^l)) in image P(k+1) that corresponds to the reference position (S_x(o_k^l), S_y(o_k^l)) of each object o_k^l in the reference image P(k) is obtained as in equation (4), by transforming the reference position with the homography matrix H_{(k,k+1)} that performs the coordinate transformation (perspective transformation) between camera images P(k) and P(k+1). The process then proceeds to step S4.
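The image of equation (4) is not reproduced in this text. The sketch below assumes the usual homogeneous-coordinate form of a homography: multiply (x, y, 1) by the 3x3 matrix and divide by the third component. The function name and the example matrix are hypothetical.

```python
def apply_homography(H, point):
    """Map an image point (x, y) through a 3x3 homography H (list of rows)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)

# Hypothetical H_(k,k+1): a pure translation by (+10, -5) pixels.
H = [[1, 0, 10],
     [0, 1, -5],
     [0, 0, 1]]
projected = apply_homography(H, (100, 200))

# Scaling every entry of H leaves the mapping unchanged.
H_scaled = [[2 * value for value in row] for row in H]
projected_scaled = apply_homography(H_scaled, (100, 200))
```

How H_(k,k+1) is obtained (for example, from camera calibration) is not covered in this part of the text.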

In step S4, Euler distances are computed as follows between two sets of positions: the corresponding positions (s'_x(o_{k+1}^q), s'_y(o_{k+1}^q)) in image P(k+1), obtained with equation (4) from the reference position of each object o_k^q in the reference image P(k), and the representative positions (x(o_{k+1}^l), y(o_{k+1}^l)) obtained by the representative point determination unit 12 for each object o_{k+1}^l of image P(k+1). There is one distance for each combination, so their number is the product of the number of values q can take and the number of values l can take. The process then proceeds to step S5. Instead of the Euler distance (L1-norm distance), any distance defined in Euclidean space (such as an Ln-norm distance or the L∞-norm distance) may be used.

In step S5, for each object o_k^q in the reference image P(k), the corresponding object o_{k+1}^q in image P(k+1) is determined as the one that minimizes the distance obtained in step S4, as in the following equation. This yields the correspondence between objects in images P(k) and P(k+1), and the process proceeds to step S6.
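Steps S4 and S5 together amount to a nearest-neighbour assignment, which can be sketched as below. The L1 distance follows the text's parenthetical; the function names and the dictionary keyed by object index q are assumptions for illustration.

```python
def l1_distance(a, b):
    """L1-norm distance between two image points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def match_objects(projected_refs, representatives, dist=l1_distance):
    """For each object q (key of projected_refs, whose value is the reference
    position transformed into view k+1), return the index l of the nearest
    representative point in view k+1."""
    return {
        q: min(range(len(representatives)),
               key=lambda l: dist(position, representatives[l]))
        for q, position in projected_refs.items()
    }

projected_refs = {1: (110, 195), 2: (150, 205)}

# View without occlusion: two contours, listed in a different order.
separate = match_objects(projected_refs, [(152, 204), (108, 196)])

# View with occlusion: a single merged contour.
occluded = match_objects(projected_refs, [(128, 200)])
```

Because each object picks its nearest contour independently, two objects can map to the same contour, which is exactly the occlusion case handled in step S6.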

In step S6, the reference position of each object o_{k+1}^q in image P(k+1) is set, and the process proceeds to step S7. If the correspondence for object o_{k+1}^q is one-to-one, that is, if exactly one object in image P(k) was determined in step S5 to correspond to object o_{k+1}^q, the representative position (x(o_{k+1}^q), y(o_{k+1}^q)) obtained by the representative point determination unit 12 for object o_{k+1}^q is set as the reference position as is. If it is not one-to-one, that is, if two or more objects o_k^q in image P(k) were determined in step S5 to correspond to object o_{k+1}^q, the two or more positions (s'_x(o_{k+1}^q), s'_y(o_{k+1}^q)) obtained by transforming the positions of those objects o_k^q onto image P(k+1) with equation (4) are set as the reference positions of the respective objects.
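The reference-position update of step S6 can be sketched as follows, under the same hypothetical data layout as a nearest-neighbour match result (object index q mapped to contour index l). Objects that share a contour keep their transformed positions; the others adopt the contour's representative point.

```python
from collections import Counter

def update_reference_positions(matches, projected_refs, representatives):
    """matches maps object q -> contour index l in view k+1.
    Returns the reference position of each object q in view k+1."""
    hits = Counter(matches.values())
    new_refs = {}
    for q, l in matches.items():
        if hits[l] == 1:
            # One-to-one: use the representative point of the matched contour.
            new_refs[q] = representatives[l]
        else:
            # Several objects share this contour (occlusion): keep each
            # object's own transformed position so they stay distinct.
            new_refs[q] = projected_refs[q]
    return new_refs

projected_refs = {1: (110, 195), 2: (150, 205)}
refs_separate = update_reference_positions(
    {1: 1, 2: 0}, projected_refs, [(152, 204), (108, 196)])
refs_occluded = update_reference_positions(
    {1: 0, 2: 0}, projected_refs, [(128, 200)])
```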

In step S7, it is determined whether the iteration of steps S2 to S6 has been carried out for the camera images P(k) of all viewpoints. If so, the flow ends. When the whole flow has ended, the correspondence acquisition unit 2 has acquired the object correspondence across all camera images P(k), and it outputs this correspondence to the second extraction unit 3. If processing has not yet been completed for all camera images P(k) in step S7, the process returns to step S2, sets the next image P(k+1) as the image to be processed, and repeats steps S2 to S7. As is clear from the above, the reference positions set for image P(k+1) in step S6 are used in the next iteration to obtain the correspondence with the next image P(k+2).
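Putting steps S1 to S7 together, the whole flow of FIG. 4 can be sketched as a single loop. This is a simplified illustration under stated assumptions: views are indexed from 0, view 0 is taken to show every object separately, `homographies[k]` is a hypothetical 3x3 matrix mapping view k to view k+1, and the L1 distance is used. All names are illustrative.

```python
from collections import Counter

def project(H, p):
    """Map point p through homography H using homogeneous coordinates."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def l1(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def chain_correspondences(reps, homographies):
    """reps[k] lists the representative points of view k.
    Returns labels, where labels[k][q] is the index of the contour in
    view k that object q was assigned to."""
    refs = list(reps[0])                              # S1: initialise
    labels = [list(range(len(refs)))]
    for k in range(len(reps) - 1):                    # S2 / S7: loop over views
        projected = [project(homographies[k], r) for r in refs]      # S3
        nxt = reps[k + 1]
        match = [min(range(len(nxt)), key=lambda l: l1(p, nxt[l]))   # S4, S5
                 for p in projected]
        hits = Counter(match)
        refs = [nxt[l] if hits[l] == 1 else p                        # S6
                for p, l in zip(projected, match)]
        labels.append(match)
    return labels

shift = [[1, 0, 2], [0, 1, 0], [0, 0, 1]]  # hypothetical: +2 pixels in x
reps = [
    [(100, 200), (160, 200)],   # view 0: both objects visible
    [(132, 200)],               # view 1: occlusion, one contour
    [(165, 200), (108, 200)],   # view 2: both visible, order swapped
]
labels = chain_correspondences(reps, [shift, shift])
```

In the occluded view both objects point to contour 0, and the identities are still recovered correctly in the following view because the transformed positions were carried forward.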

In FIG. 3, [4] shows an example in which the converted positions were obtained in step S3 above for the object corresponding to "player 1" between camera images 1 and 2, 2 and 3, ..., 7 and 8. The "+" marks indicate the original reference positions before conversion, and the "×" marks indicate the positions obtained by converting those reference positions. In FIG. 3, [5] shows an example in which the flow of FIG. 4 has distinguished the regions of "player 1" from the regions of "player 2" across all of camera images 1 to 8. Note that in camera images 2 and 6, the single region in which occlusion occurs is determined to be the region of "player 1" and, at the same time, the region of "player 2". (However, as described for step S6 of FIG. 4, the representative point of "player 1" and the representative point of "player 2" are obtained as distinct points in camera images 2 and 6 as well.)

The second extraction unit 3 takes as input the image regions R(o_k^q) of each object o_k^q for which the correspondence acquisition unit 2 has obtained the correspondence between the images P(k) (these image regions are obtained from the image region determination unit 14), determines a spatial region Vol(o^q) that contains each object o^q in the voxel space, and provides it as the output of the region extraction device 10. The spatial region Vol(o^q) may be obtained by any existing method that satisfies the condition that, in each image P(k), the spatial region Vol(o^q) fits inside the image region R(o_k^q).

Alternatively, the second extraction unit 3 may obtain the spatial region Vol(o^q) of each object o^q by the following method. FIGS. 5 and 6 are schematic diagrams for explaining this method. In FIG. 5, the vectors (rays) from the camera center (c_k(x), c_k(y), c_k(z)) of camera k toward the four vertices of the rectangle (image region) enclosing an object in image P(k) are shown as l_k^1, l_k^2, l_k^3, l_k^4, in clockwise order starting from l_k^1, which corresponds to the upper-right vertex of the rectangle. The same is shown in image P(k+1) for the rectangle enclosing the object that corresponds to the one in image P(k). As shown in FIG. 5, the rays l_{k+1}^1 and l_{k+1}^2 of camera k+1 span a plane P_(k,k+1) (whose normal vector is denoted (n_1, n_2, n_3)), and the points p^3_(k,k+1)(x,y,z) and p^4_(k,k+1)(x,y,z) are obtained as the intersections of this plane P_(k,k+1) with the rays l_k^3 and l_k^4 of camera k, respectively. With this notation, the intersection points p^3_(k,k+1)(x,y,z) and p^4_(k,k+1)(x,y,z) can be obtained by the following equations (5) and (6), respectively.
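
The geometric construction of FIG. 5 can be sketched as follows. This is only an illustration using the standard ray-plane intersection, which may differ in form from equations (5) and (6): the plane normal is taken as the cross product of the two rays of camera k+1, and the plane is assumed to pass through the camera center of camera k+1. All function and parameter names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def ray_plane_intersection(c_k, l_k, c_k1, l_k1_a, l_k1_b):
    """Intersect the ray c_k + t * l_k of camera k with the plane spanned by
    the rays l_k1_a and l_k1_b of camera k+1 through its center c_k1.

    Returns the intersection point (x, y, z), or None if the ray is
    parallel to the plane.
    """
    n = cross(l_k1_a, l_k1_b)          # normal (n1, n2, n3) of plane P_(k,k+1)
    denom = dot(n, l_k)
    if abs(denom) < 1e-12:
        return None                     # ray parallel to the plane
    diff = tuple(c_k1[i] - c_k[i] for i in range(3))
    t = dot(n, diff) / denom            # ray parameter at the intersection
    return tuple(c_k[i] + t * l_k[i] for i in range(3))
```

Calling the function once with l_k^3 and once with l_k^4 yields the two points corresponding to p^3_(k,k+1) and p^4_(k,k+1).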

Furthermore, by the following equation (7), the spatial region Vol(o^q) of each object o^q can be obtained as a rectangular region in the voxel space, Vol(o^q) = {(x,y,z) | x_min ≤ x ≤ x_max, y_min ≤ y ≤ y_max, z_min ≤ z ≤ z_max}. FIG. 6 schematically shows the rectangular region to be obtained on a slice of the voxel space along the (x, y) plane, with projections from eight camera centers.
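
A rectangular region of the form given above can be sketched as follows. The sketch assumes that the bounds x_min, x_max, and so on are the per-axis minima and maxima over the intersection points collected between successive camera pairs; this form of equation (7) is an assumption, and the function names are hypothetical.

```python
def bounding_box(points):
    """Axis-aligned box ((x_min, x_max), (y_min, y_max), (z_min, z_max))
    enclosing a collection of 3D points (assumed form of equation (7))."""
    xs, ys, zs = zip(*points)
    return (min(xs), max(xs)), (min(ys), max(ys)), (min(zs), max(zs))


def contains(box, point):
    """True if the point (x, y, z) lies inside the box, bounds included."""
    return all(lo <= v <= hi for (lo, hi), v in zip(box, point))
```

Visual hull generation would then only need to test the voxels for which `contains` holds, rather than the whole voxel space.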

As described above, according to the present invention, even when a plurality of objects are captured in multi-view camera images, the volume region of each object can be determined separately. FIG. 7 shows the effect of the present invention in the form of the output of equation (7). The example of FIG. 7 compares the result of applying the present invention when the two players 1 and 2 of FIGS. 3 and 4 are the objects with the result of the conventional method, to which the present invention is not applied (that is, a single volume region enclosing players 1 and 2 together, without distinguishing them). The regions extracted for players 1 and 2 are 1/7 and 1/8, respectively, of the volume obtained by the conventional method.

Supplementary matters of the present invention are described below.

(1) The homography matrix H_(k,k+1) used to obtain the converted positions in step S3 of FIG. 4 may be a predetermined matrix obtained in advance by calibrating each pair of cameras k and k+1. The information needed for the calculations described with reference to FIGS. 5 and 6 may also be acquired at the time of this calibration. Alternatively, without performing the calibration, the matrix may be obtained on the spot by an existing method from point correspondences between images P(k) and P(k+1).
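
As one example of such an existing method, a homography can be estimated from four point correspondences by the direct linear transformation. The sketch below is not taken from the specification: it fixes the bottom-right entry of the matrix to 1 (an assumption that fails when that entry is zero), requires that no three of the points be collinear, and uses hypothetical function names. In practice more correspondences and a robust estimator would normally be used.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_4_points(src, dst):
    """Estimate H (3x3, H[2][2] fixed to 1) such that dst ~ H * src.

    src, dst: lists of four (x, y) points in P(k) and P(k+1) respectively.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, in the 8 unknowns of H.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]
```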

(2) The order of the images P(1), P(2), ..., P(N) used to carry out the flow of FIG. 4 may be fixed in advance according to the image set in step S1 and the positional relationship of the cameras. In FIG. 4, for convenience of explanation, P(1) was set for initializing the reference positions, and the correspondences were obtained in the order P(1)→P(2)→P(3)→P(4)→P(5)→P(6)→P(7)→P(8). Likewise, if for example P(4) is set for initializing the reference positions, the correspondences may be obtained in the order P(4)→P(5)→P(6)→P(7)→P(8)→P(1)→P(2)→P(3), or in the order P(4)→P(3)→P(2)→P(1)→P(8)→P(7)→P(6)→P(5).
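
The two orders given for P(4) can be generated by a small helper such as the following. It assumes that the cameras are numbered 1 to N around the scene so that consecutive indices are neighbours and camera N is adjacent to camera 1; the function name and parameters are hypothetical.

```python
def processing_order(n_images, start, direction=1):
    """Return the 1-based image indices visited when starting from image
    `start` and stepping around the camera ring.

    direction=+1 steps toward increasing indices, -1 toward decreasing ones.
    """
    return [(start - 1 + direction * i) % n_images + 1 for i in range(n_images)]
```

For eight cameras, `processing_order(8, 4)` gives 4, 5, 6, 7, 8, 1, 2, 3 and `processing_order(8, 4, -1)` gives 4, 3, 2, 1, 8, 7, 6, 5, matching the two orders described above.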

(3) The present invention can also be provided as a program that causes a computer to function as the region extraction device 10. The computer may have a well-known hardware configuration including a CPU (central processing unit), memory, and various interfaces, with the CPU executing instructions corresponding to the functions of each unit of the region extraction device 10.

10 … region extraction device, 1 … first extraction unit, 2 … correspondence acquisition unit, 3 … second extraction unit

Claims

A region extraction device comprising:
a first extraction unit that extracts, from each camera image of multi-view camera images in which a plurality of objects are captured, one or more image regions containing the objects;
a correspondence acquisition unit that determines which of the plurality of objects each of the one or more extracted image regions in each camera image corresponds to, together with the correspondence of the objects between camera images; and
a second extraction unit that extracts, based on the extracted image regions in the respective camera images linked by the determined correspondence, a spatial region containing the corresponding object,
wherein the correspondence acquisition unit obtains the correspondence of the objects between camera images by determining the correspondence, between different camera images, of the positions of representative points of the objects in the image regions, and
wherein the correspondence acquisition unit determines the correspondence of the positions of the representative points of the objects in the image regions between different camera images in a predetermined order defined among the camera images, and, when the obtained correspondence indicates that objects overlap within an image region of a first camera image, sets the representative points of those objects in the first camera image by converting the representative positions of the objects in a second camera image, for which the correspondence with the first camera image has been obtained, into the coordinates of the first camera image.

A region extraction device comprising:
a first extraction unit that extracts, from each camera image of multi-view camera images in which a plurality of objects are captured, one or more image regions containing the objects;
a correspondence acquisition unit that determines which of the plurality of objects each of the one or more extracted image regions in each camera image corresponds to, together with the correspondence of the objects between camera images; and
a second extraction unit that extracts, based on the extracted image regions in the respective camera images linked by the determined correspondence, a spatial region containing the corresponding object,
wherein the correspondence acquisition unit obtains the correspondence of the objects between camera images by determining the correspondence, between different camera images, of the positions of representative points of the objects in the image regions, and
wherein the correspondence acquisition unit adopts, as the representative point of an object in an image region, the centroid of the contour region of the object.

A region extraction device comprising:
a first extraction unit that extracts, from each camera image of multi-view camera images in which a plurality of objects are captured, one or more image regions containing the objects;
a correspondence acquisition unit that determines which of the plurality of objects each of the one or more extracted image regions in each camera image corresponds to, together with the correspondence of the objects between camera images; and
a second extraction unit that extracts, based on the extracted image regions in the respective camera images linked by the determined correspondence, a spatial region containing the corresponding object,
wherein the correspondence acquisition unit obtains the correspondence of the objects between camera images by determining the correspondence, between different camera images, of the positions of representative points of the objects in the image regions, and
wherein the correspondence acquisition unit determines the maximum number of image regions extracted by the first extraction unit in any camera image to be the number of the plurality of objects, sets as reference positions the positions of the representative points of the objects in one camera image from which the maximum number was extracted, and obtains the positions in the other camera images that correspond to the reference positions, thereby determining the correspondence of the positions of the representative points of the objects in the image regions between different camera images.

The region extraction device according to claim 3, wherein, when a plurality of camera images give the maximum number, the correspondence acquisition unit selects the one camera image for setting the reference positions from among those camera images in which the image regions extracted by the first extraction unit are determined to be widely separated spatially.

A region extraction device comprising:
a first extraction unit that extracts, from each camera image of multi-view camera images in which a plurality of objects are captured, one or more image regions containing the objects;
a correspondence acquisition unit that determines which of the plurality of objects each of the one or more extracted image regions in each camera image corresponds to, together with the correspondence of the objects between camera images; and
a second extraction unit that extracts, based on the extracted image regions in the respective camera images linked by the determined correspondence, a spatial region containing the corresponding object,
wherein the first extraction unit obtains the image regions as rectangular regions, and
wherein the second extraction unit, based on the rectangular image regions extracted in the respective camera images linked by the determined correspondence, finds the two intersections between (i) two straight lines extending from the camera center of a first camera image to two vertices of the rectangular region of the first camera image and (ii) the plane containing the camera center of a second camera image and two vertices of the rectangular region of the second camera image, and extracts the spatial region containing the corresponding object based on the set of such pairs of intersections found between a series of first and second camera images.

The region extraction device according to any one of claims 1 to 5, wherein the first extraction unit extracts a contour of the object from each camera image and extracts the image region as a region enclosing the contour.

The region extraction device according to any one of claims 1 to 6, wherein the correspondence acquisition unit obtains the correspondence of the objects between camera images by determining the correspondence, between different camera images, of the positions of the representative points of the objects in the image regions.

The region extraction device according to claim 1, 2, 3, 4 or 7, wherein the correspondence acquisition unit determines the correspondence of positions between different camera images by conversion using a homography matrix.

A program for causing a computer to function as the region extraction device according to any one of claims 1 to 8.