JP2020035218A

JP2020035218A - Image processing device, method, and program

Info

Publication number: JP2020035218A
Application number: JP2018161868A
Authority: JP
Inventors: 敬介野中; Keisuke Nonaka
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-08-30
Filing date: 2018-08-30
Publication date: 2020-03-05
Anticipated expiration: 2038-08-30
Also published as: JP7045964B2

Abstract

To provide an image processing device capable of appropriately modeling an object moving at high speed while tolerating deviations, within a certain accuracy, in the time synchronization of a multi-viewpoint video.

SOLUTION: An image processing device 10 comprises a generation section 2 that generates a three-dimensional model of a subject by applying a visual volume intersection method to each image of a multi-viewpoint image. The generation section 2 generates the three-dimensional model after relaxing the intersection determination of the visual volume intersection method for a space where the velocity of the subject is large. The device further comprises an estimation section 1 that estimates a velocity field in each image of the multi-viewpoint image, and the space where the velocity of the subject is large is determined from the velocity field estimated in each image.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus, method, and program capable of appropriately modeling a fast-moving object while tolerating deviations, within a certain accuracy, in the time synchronization of a multi-view video.

Techniques have been proposed for generating video from a free viewpoint not captured by any camera (hereinafter, free-viewpoint video), for sports scenes and the like. Based on video captured by a plurality of cameras, such a technique synthesizes video for a virtual viewpoint where no camera is placed and displays the result on a screen, which enables viewing from various viewpoints.

Among free-viewpoint video synthesis techniques, there is an existing technique that synthesizes high-quality free-viewpoint video by generating a three-dimensional computer graphics (3DCG) model of the subject using a principle called the visual volume intersection method (Non-Patent Document 1). In this method, outline information of the subject obtained from a plurality of cameras is back-projected into three-dimensional space and described as an enormous number of points (point cloud data), which precisely reproduces the outline of the subject. A free-viewpoint video is generated by taking the 3DCG model of the subject generated in advance as input, determining the position of the virtual viewpoint, and rendering on a display. In addition, a technique has been proposed that realizes the visual volume intersection method using a group of virtual planes instead of point cloud data (Patent Document 1).

As described above, several inventions have been proposed that build a 3DCG model of a subject from multiple camera videos and synthesize a virtual video from an arbitrary viewpoint, and many of them are based on the principle of the visual volume intersection method.

Patent Document 1: Japanese Patent Application No. 2017-167472

Non-Patent Document 1: Laurentini, A. "The Visual Hull Concept for Silhouette Based Image Understanding." IEEE PAMI, 16, 2 (1994), 150-162.

The visual volume intersection method enables 3DCG modeling of subjects in a wide variety of scenes, but applying it rests on three preconditions: "(1) in principle, the subject is captured by all cameras", "(2) accurate camera calibration has been performed", and "(3) time synchronization is established among the cameras". Conditions (1) and (2) can be addressed by relatively simple measures such as adjusting camera positions at installation and placing calibration markers.

For the time synchronization of (3), on the other hand, strict synchronization is required depending on the scene. For example, when a human subject moves slowly, the video changes little between adjacent frames, so a synchronization error of a few frames does not degrade quality. However, when modeling a fast-moving object such as a baseball, a one-frame offset corresponds to an offset of several tens of centimeters in real space, so the misaligned intersection of the visual volumes leads to severe quality degradation such as an incorrect shape. Moreover, when the offset is even larger, the region where the visual volumes of all cameras intersect disappears, and the object may not be modeled at all.

Solving this problem by raising the accuracy of time synchronization requires dedicated equipment that receives a synchronization signal such as Genlock, which is generally expensive. Even then, an offset of about one frame can arise from differences in the length of the cables carrying the video signal from the cameras or from the processing order of the server program on the video processing side. The problem is therefore difficult to eliminate completely, and countermeasures that tolerate such offsets are needed on the video processing side. The prior art, however, has not provided such countermeasures.

In view of the above problems of the prior art, an object of the present invention is to provide an image processing apparatus, method, and program capable of appropriately modeling a fast-moving object while tolerating deviations, within a certain accuracy, in the time synchronization of a multi-view video.

To achieve the above object, a first feature of the present invention is an image processing apparatus comprising a generation unit that generates a three-dimensional model of a subject by applying the visual volume intersection method to each image of a multi-view image, wherein the generation unit generates the three-dimensional model after relaxing the intersection determination of the visual volume intersection method for a space where the speed of the subject is high. A second feature of the present invention is an image processing apparatus comprising a generation unit that generates a three-dimensional model of a subject by applying the visual volume intersection method to each image of a multi-view image, further comprising: an estimation unit that estimates a velocity field in each image of the multi-view image; and an extraction unit that extracts, from each image of the multi-view image, a mask distinguishing foreground from background and then processes the mask so as to dilate the foreground according to the estimated velocity field, wherein the generation unit generates the three-dimensional model of the subject by applying the visual volume intersection method to the processed masks. The present invention also provides methods and programs corresponding to the apparatuses according to the first and second features.

According to the first feature of the present invention, relaxing the intersection determination of the visual volume intersection method for a space where the speed of the subject is high makes it possible to appropriately model a fast-moving object even when the time synchronization of the multi-view video deviates within a certain accuracy. According to the second feature of the present invention, for foreground in which the subject is assumed to be moving fast, applying the visual volume intersection method to masks processed by dilating that foreground likewise makes it possible to appropriately model a fast-moving object even when the time synchronization of the multi-view video deviates within a certain accuracy.

FIG. 1 is a functional block diagram of an image processing apparatus according to one embodiment.
A diagram showing a schematic example of processing in the extraction unit.
A diagram showing a schematic example of processing in the extraction unit.

FIG. 1 is a functional block diagram of an image processing apparatus according to one embodiment. As illustrated, the image processing apparatus 10 includes an estimation unit 1, a generation unit 2, a synthesis unit 3, an extraction unit 4, and a calibration unit 5. The generation unit 2 includes a determination unit 21 and an intersection unit 22.

As illustrated, the overall operation of the image processing apparatus 10 is as follows. The multi-view images Pi(t) at each time t (t = 1, 2, ...) of the multi-view video (i = 1, 2, ..., N; that is, i is an index designating, among all N cameras, the camera Ci that captures the image Pi(t)) are read by the estimation unit 1, the synthesis unit 3, the extraction unit 4, and the calibration unit 5. The generation unit 2 generates a three-dimensional model MD(t) of the subject in the multi-view images at time t, and the synthesis unit 3 then synthesizes a free-viewpoint image FR(t) at the virtual viewpoint VP(t) at time t, which is given by user designation or the like. The free-viewpoint video is thereby synthesized as the sequence of free-viewpoint images FR(t) at each time t.

Here, the time t (written t_[i]) of the input multi-view image Pi(t) of each camera Ci is the time t_[i] assigned as a frame number in the time series when the corresponding multi-view video was obtained by a given video capture system or the like, on the assumption that the cameras Ci are synchronized within a predetermined accuracy. Accordingly, between the time t_[1] assigned to image P1(t_[1]) of camera C1 and the time t_[2] assigned to image P2(t_[2]) of camera C2, for example, a time offset can exist, within the predetermined accuracy, with respect to the times at which the images were actually captured. For example, viewing t_[1] and t_[2] not as frame numbers but as actual capture times, there may be an offset of one frame such that "t_[1] = t_[2] + 1" holds. In general, the offset is not necessarily an integer number of frames and can be a fractional number of frames.

According to the present invention, even when the time t as a frame number carries an offset within the predetermined accuracy, a fast-moving subject captured in the multi-view video can be appropriately modeled in three dimensions, and a free-viewpoint video of assured quality can be generated without the offset greatly degrading quality, even for the fast-moving subject.

As explained above, on the premise that offsets within the predetermined accuracy can exist in terms of actual capture time, "time t" is used below as the frame number in the time series commonly assigned to the images Pi(t) of the cameras Ci. When there is no particular need to refer to the time t, for example when describing processing common to every time t, the description omits it. For example, the image Pi(t) of camera Ci at time t is simply called image Pi.

As also shown in FIG. 1, the outline of the processing of each unit of the image processing apparatus 10 and the cooperation among the units (the exchange of data between units) are as follows.

The estimation unit 1 estimates a velocity field Vi for the image Pi of each camera Ci in the multi-view image and outputs the velocity field Vi (or the high-speed region Ri within image Pi determined from Vi) to the determination unit 21. In one embodiment, the estimation unit 1 may additionally output the estimated velocity field Vi (or high-speed region Ri) to the extraction unit 4.

From the velocity field Vi (or high-speed region Ri) of each image Pi obtained from the estimation unit 1, the determination unit 21 determines a relaxation region, that is, a spatial region in which the intersection determination of the visual volume intersection method performed by the downstream intersection unit 22 should be relaxed, and outputs the determined relaxation region to the intersection unit 22.

Using the calibration data (the camera parameters of each camera Ci) obtained from the calibration unit 5, the intersection unit 22 applies the visual volume intersection method to the mask MSi of each image Pi obtained by the extraction unit 4, obtains a three-dimensional model of the subject captured in the multi-view image, and outputs the model to the synthesis unit 3. In doing so, the intersection unit 22 relaxes the intersection determination condition of the visual volume intersection method within the relaxation region obtained from the determination unit 21 when generating the three-dimensional model of the subject.

The generation unit 2, composed of the determination unit 21 and the intersection unit 22, thus generates a three-dimensional model of the subject by applying the visual volume intersection method to the multi-view image input to the image processing apparatus 10 (more precisely, to the masks obtained from the multi-view image via the extraction unit 4). In one embodiment, the generation unit 2 determines, based on the velocity field (or high-speed region) obtained from the estimation unit 1, the region of three-dimensional space in which the subject is considered to be moving fast, and relaxes the intersection determination of the visual volume intersection method in that region, so that a three-dimensional model of assured quality is obtained even when the subject is fast.

In one embodiment, furthermore, the extraction unit 4 takes the velocity field (or high-speed region) obtained from the estimation unit 1 into account and, for a fast subject, produces a mask dilated beyond the ordinary one. The generation unit 2 then applies the visual volume intersection method to the dilated masks, so that a three-dimensional model of assured quality is obtained even when the subject is fast. As described later, an embodiment is also possible in which the determination unit 21 is omitted, that is, the intersection unit 22 applies the visual volume intersection method using only the masks dilated by the extraction unit 4, without using a relaxation region; this likewise yields a three-dimensional model of assured quality even when the subject is fast.

The synthesis unit 3 synthesizes the free-viewpoint image at the virtual viewpoint using the three-dimensional model obtained from the generation unit 2, the virtual viewpoint given by user designation or the like, the calibration data (the camera parameters of each camera Ci) obtained from the calibration unit 5, and the multi-view image input to the image processing apparatus 10.

The extraction unit 4 extracts a mask MSi from the image Pi of each camera Ci in the multi-view image and outputs it to the intersection unit 22. In one embodiment, the extraction unit 4 may additionally use the velocity field (or high-speed region) obtained from the estimation unit 1 to extract the mask MSi with its region dilated at locations that may correspond to a fast subject.

In one embodiment, the calibration unit 5 calculates the camera parameters of each camera as calibration data from the image Pi of each camera Ci in the multi-view image and outputs the calculated calibration data to the intersection unit 22 and the synthesis unit 3.

The outline of the processing of each unit of the image processing apparatus 10 shown in FIG. 1 and the exchange of data between units have been described above. The details of the individual processing of each unit are described below.

<Calibration unit 5>
The calibration unit 5 performs camera calibration using an existing method. It associates characteristic points of the field in the video (image Pi(t)) captured at time t (for example, intersections of the white court lines in a sports video) with points on the field in real space and calculates the camera parameters (extrinsic and intrinsic parameters). For example, when the input multi-view video is a typical sports video, the court dimensions are standardized, so it is easy to calculate which coordinates in real space (the world coordinate system) a point on the image plane corresponds to.

This camera calibration can be performed manually or with any existing automatic calibration method. As a manual method, for example, the user selects intersections of white lines on the screen and associates them with a field model measured in advance, from which the camera parameters can be estimated. If the image is distorted, the intrinsic parameters should be estimated first.

When capture with fixed cameras is assumed, that is, when each camera Ci of the multi-view video is fixed while capturing, this camera calibration needs to be performed only once, at the start of video generation (the first time t = 1). Alternatively, the result of a calibration performed beforehand may be supplied as fixed parameters, and the processing of the calibration unit 5 omitted. When moving cameras are assumed, that is, when each camera Ci of the multi-view video moves while capturing, per-frame processing (processing at each time t) may be performed with any of the existing automatic calibration functions mentioned above (methods that use feature points on the field and the like as markers).

<Estimation unit 1>
So that the downstream intersection unit 22 can ultimately set the parameters of the visual volume intersection method according to the speed of the subject, the estimation unit 1 calculates the speed of objects in the camera image Pi as a velocity field Vi. This can be calculated with optical flow, a known technique that estimates a map of velocity vectors over the image. From the resulting map (velocity field Vi), a high-speed region Ri may further be obtained by thresholding, as a map divided into fast regions (high-speed regions) and the remaining regions (low-speed regions); Ri thus also carries the information of the low-speed regions, as the regions that are not high-speed. The division need not be into two levels; by setting graded thresholds, the map may be divided into any number of levels.
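The thresholding step can be sketched as follows. This is a minimal illustration, assuming the velocity field Vi is already available as an (H, W, 2) array of per-pixel motion vectors (for example from an optical flow routine); the function names `speed_levels` and `high_speed_region` and the threshold values are hypothetical and do not come from the specification.

```python
import numpy as np

def speed_levels(flow, thresholds):
    """Quantize a velocity field into speed levels.

    flow: (H, W, 2) array of per-pixel motion vectors (assumed input format).
    thresholds: ascending speed thresholds in pixels per frame (assumed values).
    Returns an (H, W) integer map: 0 is the slowest level,
    len(thresholds) is the fastest level.
    """
    speed = np.linalg.norm(flow, axis=-1)  # magnitude of each motion vector
    return np.digitize(speed, np.asarray(thresholds, dtype=float))

def high_speed_region(flow, threshold):
    """Two-level case: boolean map Ri that is True where speed >= threshold."""
    return speed_levels(flow, [threshold]) == 1
```

With a single threshold the result is the two-level high-speed / low-speed split; passing several thresholds gives the graded, multi-level division mentioned above.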

Besides optical flow, the estimation unit 1 may use any existing technique that estimates the motion vectors of objects in the image Pi. For example, by applying an object tracking technique, the inter-frame motion vector of a target object may be calculated and the high-speed and low-speed regions divided according to its magnitude.

Note that the magnitude of a motion vector in the image varies with the distance between the camera and the object: a distant object moves little in the image even when it is moving fast. For example, an object may be far from camera Ci and therefore slow in image Pi, while in another image Pj (j ≠ i) the same object is close to camera Cj and is fast in the image as well, reflecting its actual high speed. In such a case it is desirable that, even for the image Pi in which the object is distant and its apparent speed is low, a velocity field be obtained that reflects the actual high speed in three-dimensional space.

Accordingly, in one embodiment, the estimation unit 1 may further process the velocity field obtained as motion vectors in the image alone, derive a velocity field that estimates the velocity field in actual three-dimensional space, and use that as the target of the threshold determination. For example, the image-only velocity field (v_x(x,y), v_y(x,y)) at pixel position (x,y) may be multiplied by the depth map d(x,y), so that values grow with distance, yielding a velocity field (d(x,y)*v_x(x,y), d(x,y)*v_y(x,y)) that is a simplified estimate of the magnitude in three-dimensional space. (Alternatively, the velocity field (v_x(x,y), v_y(x,y)) may be multiplied by f(d(x,y)), the value of a predetermined increasing function f applied to the depth map d(x,y).) The depth map d(x,y) may be obtained dynamically by any existing method (for example, stereo matching after establishing point correspondences between the image Pi of one camera Ci and the image Pj of another camera Cj), or, when each camera Ci captures from a fixed position, it may be given in advance as a fixed map. (On the premise that the object serving as the subject never departs far, in the height direction, from the field plane in the multi-view video, a fixed map can be given as a map representing that field plane.)
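The depth weighting described above can be sketched in a few lines. This is only an illustration under assumed array layouts: the velocity field is taken to be an (H, W, 2) array and the depth map an (H, W) array, and the function name `depth_scaled_flow` is hypothetical.

```python
import numpy as np

def depth_scaled_flow(flow, depth, f=None):
    """Scale an image-plane velocity field by depth.

    flow:  (H, W, 2) array holding (v_x, v_y) per pixel (assumed layout).
    depth: (H, W) depth map d(x, y).
    f:     optional increasing function applied to the depth map first;
           when omitted, the depth itself is used as the weight.
    Returns (d * v_x, d * v_y), or (f(d) * v_x, f(d) * v_y) when f is given.
    """
    weight = depth if f is None else f(depth)
    return flow * weight[..., np.newaxis]  # broadcast weight over both components
```

For example, `depth_scaled_flow(flow, depth, f=np.sqrt)` would dampen the growth with distance compared with the plain product; which f is appropriate is a design choice the text leaves open.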

<Determination unit 21>
Using the high-speed region Ri of each camera Ci obtained by the estimation unit 1, the determination unit 21 calculates the region in which the determination is relaxed when the intersection unit 22 subsequently performs the actual determination of the visual volume intersection method. In one embodiment, as in equations (1) and (2), the region calculated as the intersection of the cone(Ri) (the part common to all cone(Ri)) is taken as the relaxation regions Mn (n = 1, ..., C, where C is the total number of relaxation regions). In equation (1), cone(Ri) is the cone in 3D space formed by the straight lines passing through the camera center of camera Ci and each point on the boundary of the high-speed region Ri in image Pi.
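Equations (1) and (2) themselves do not appear in this text. Based only on the definitions in the paragraph above (an intersection over all cones, followed by a decomposition into regions), one reading consistent with the description would be the following; the exact notation is an assumption.

```latex
M = \bigcap_{i=1}^{N} \mathrm{cone}(R_i) \tag{1}

M = \bigcup_{n=1}^{C} M_n \tag{2}
```

Here each M_n is taken to be one connected component of M, so that the M_n are pairwise disjoint.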

Equation (1) gives the whole relaxation region M; when M consists of a plurality of regions (connected regions; C of them, with C ≥ 2), equation (2) further gives the individual relaxation regions Mn. The whole relaxation region M can be obtained as an intersection in equation (1) by the same technique as the visual volume intersection method in the intersection unit 22. The decomposition into individual connected regions Mn in equation (2) may likewise use any existing method.

The embodiment above uses as M the common region of 3D space determined to be a high-speed region in all cameras Ci (i = 1, 2, ..., N). In another embodiment, instead of equation (1), a region may be taken as M when it is determined that at least a predetermined number k (1 ≤ k < N) of the cones cone(Ri) pass through it (are present in it). In this case, by the same technique as the visual volume intersection method in the intersection unit 22, the distribution over space (X,Y,Z) of the number of cones cone(Ri) passing through (present at) each point is obtained as the multiplicity r(X,Y,Z), and the region where r(X,Y,Z) ≥ k is taken as M, as the alternative to equation (1). (The region where r(X,Y,Z) = N is the region M of equation (1).)
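The multiplicity r(X,Y,Z) can be illustrated with a point-sampled sketch. Everything concrete here is an assumption for illustration: space is sampled at a list of 3D points (for example voxel centers), each camera Ci is represented by a 3x4 projection matrix, a point is counted as lying in cone(Ri) when its projection falls on a True pixel of the boolean mask Ri, and the function name `cone_multiplicity` is hypothetical.

```python
import numpy as np

def cone_multiplicity(points, projections, regions):
    """Count, for each 3D point, how many cones cone(Ri) contain it.

    points:      (V, 3) world coordinates (e.g. voxel centers).
    projections: list of 3x4 camera projection matrices (assumed model).
    regions:     list of (H, W) boolean high-speed masks Ri, one per camera.
    Returns r: (V,) integer array, the multiplicity at each point.
    """
    homog = np.hstack([points, np.ones((len(points), 1))])
    r = np.zeros(len(points), dtype=int)
    for P, R in zip(projections, regions):
        uvw = homog @ P.T
        in_front = uvw[:, 2] > 0                  # ignore points behind the camera
        z = np.where(in_front, uvw[:, 2], 1.0)    # avoid dividing by non-positive depth
        u = np.round(uvw[:, 0] / z).astype(int)   # pixel column
        v = np.round(uvw[:, 1] / z).astype(int)   # pixel row
        h, w = R.shape
        inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(points), dtype=bool)
        hit[inside] = R[v[inside], u[inside]]
        r += hit
    return r
```

Under these assumptions, `cone_multiplicity(points, projections, regions) >= k` selects the sample points belonging to M for a chosen k, and comparing against `len(projections)` instead reproduces the all-camera intersection of equation (1).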

<Extraction unit 4>
The extraction unit 4 obtains the shape of the subject (moving object) in each frame (each image Pi) as a binary mask MSi of 0s and 1s. The resulting binary mask image MSi is input to the intersection unit 22 and used to generate the shape of the 3DCG model of the subject.

Background subtraction, an existing technique, is used to obtain the binary mask. In this technique, video without the subject, or statistics of it such as its mean, is registered in advance as background statistics; the difference between the background statistics and the camera video at the target time is taken and thresholded to extract the subject region.
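A minimal sketch of this background subtraction follows, assuming the per-pixel mean of subject-free frames is used as the background statistic and a single global threshold is applied. The function names and the handling of color images (taking the largest channel difference) are illustrative choices, not details from the specification.

```python
import numpy as np

def background_mean(frames):
    """Per-pixel mean of subject-free frames, used as the background statistic."""
    return np.mean(np.stack(frames).astype(float), axis=0)

def background_mask(frame, bg_mean, threshold):
    """Binary mask: 1 where the frame differs from the background by more than threshold.

    frame, bg_mean: (H, W) grayscale or (H, W, C) color arrays of equal shape.
    threshold: difference threshold in intensity units (assumed value).
    """
    diff = np.abs(frame.astype(float) - bg_mean)
    if diff.ndim == 3:
        diff = diff.max(axis=-1)  # for color input, use the largest channel difference
    return (diff > threshold).astype(np.uint8)
```

In practice the threshold would be tuned to the noise level of the footage; richer background statistics (for example a per-pixel variance) are also possible, as the text allows.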

In one embodiment of the present invention, the extraction of the subject mask by background subtraction may additionally use the velocity field Vi obtained by the estimation unit 1 (and the high-speed region Ri based on it). When conventional background subtraction is applied as is to a fast-moving object, motion blur often causes the object to be extracted smaller than its actual size (or with some positional offset). The region Ri that the estimation unit 1 determined to be high-speed in the image Pi can be used to address this problem. For the explanation, let msi denote the mask extracted from the image Pi by conventional background subtraction.

That is, in one embodiment the mask MSi obtained by the extraction unit 4 may simply be the result of conventional background subtraction (MSi = msi), while in another embodiment msi may be further processed to cope with motion blur and the like.

Specifically, the binary mask msi of an image region determined to be high-speed by the estimation unit 1 is dilated by a_i pixels through morphological processing, so that the under-extracted mask is corrected to cover the actual size of the object; this yields the final binary mask MSi. Among the regions of the binary mask msi of the image Pi (which generally consists of one or more connected regions), those whose distance from the high-speed region Ri is at most a predetermined threshold are the ones to dilate. A region that overlaps the high-speed region Ri (distance zero) is therefore a dilation target.
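The selective dilation described above might be sketched as follows. The sketch assumes 4-connected components, a Chebyshev (chessboard) distance between a component and the high-speed region, and a square structuring element of radius a; none of these choices is fixed by the specification, and the function names are hypothetical.

```python
from collections import deque

def connected_components(mask):
    """4-connected foreground components, each a list of (row, col) pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, component = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components

def refine_mask(msi, fast_region, a, max_dist):
    """Dilate by a pixels only the components lying within max_dist of the region Ri."""
    height, width = len(msi), len(msi[0])
    fast = [(y, x) for y in range(height) for x in range(width) if fast_region[y][x]]
    out = [row[:] for row in msi]
    for component in connected_components(msi):
        near = any(max(abs(y - fy), abs(x - fx)) <= max_dist
                   for y, x in component for fy, fx in fast)
        if not near:
            continue  # slow object: keep the background-subtraction result as is
        for y, x in component:
            for ny in range(max(0, y - a), min(height, y + a + 1)):
                for nx in range(max(0, x - a), min(width, x + a + 1)):
                    out[ny][nx] = 1
    return out

# One slow component at (2, 1) and one fast component at (2, 5).
msi = [[0] * 7 for _ in range(5)]
msi[2][1] = 1
msi[2][5] = 1
ri = [[0] * 7 for _ in range(5)]
ri[2][5] = 1
ms_final = refine_mask(msi, ri, a=1, max_dist=0)
```

Only the component overlapping Ri grows to a 3 x 3 block; the other component is unchanged.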

The number of pixels a_i to dilate is determined by the assumed per-frame velocity of the object, v = p₁ - p₂ = (X₁ - X₂, Y₁ - Y₂, Z₁ - Z₂) (where p₁ and p₂ are the spatial coordinates of the object in two consecutive frames), and by the camera parameters. Specifically, for each image Pi, the on-screen motion vector b_i = (x, y) covered within the image in one frame interval is calculated by Expression (3):
b_i = H_i p₁ - H_i p₂ (3)

Here H_i is the homography matrix from the world coordinate system (X, Y, Z) to the image-plane coordinate system (x, y) of camera Ci, obtained from the camera parameters for camera Ci calculated by the calibration unit 5. a_i is then set to a predetermined value satisfying a_i ≤ |b_i|, that is, a number of pixels no larger than the magnitude of the vector b_i.
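A small worked example of Expression (3) follows. It assumes that H_i can be modeled as a 3 x 4 pinhole projection matrix applied to homogeneous world coordinates and normalized by the third component, and it picks the largest integer satisfying a_i ≤ |b_i|; the matrix values and function names are hypothetical.

```python
import math

def project(h, point):
    """Map a world point (X, Y, Z) to image coordinates (x, y) with a 3x4 matrix."""
    X, Y, Z = point
    u, v, w = (row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in h)
    return (u / w, v / w)

def motion_vector(h, p1, p2):
    """b_i = H_i p1 - H_i p2: image-plane displacement over one frame interval."""
    x1, y1 = project(h, p1)
    x2, y2 = project(h, p2)
    return (x1 - x2, y1 - y2)

def dilation_pixels(h, p1, p2):
    """Largest integer a_i with a_i <= |b_i| (one admissible choice among many)."""
    bx, by = motion_vector(h, p1, p2)
    return math.floor(math.hypot(bx, by))

# Assumed camera: focal length 1000 px, principal point (320, 240), at the origin.
H = [
    [1000.0, 0.0, 320.0, 0.0],
    [0.0, 1000.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
p1 = (0.5, 0.0, 10.0)  # object position in one frame (metres)
p2 = (0.0, 0.0, 10.0)  # object position in the adjacent frame
b = motion_vector(H, p1, p2)
a = dilation_pixels(H, p1, p2)
```

An object 10 m away that moves 0.5 m sideways between frames shifts 50 pixels on this camera, so a_i may be set to at most 50.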

The high speed v assumed for each image Pi (there may be several, including ones differing in direction in 3D space) may be given in advance and Expression (3) evaluated beforehand, so that the number of pixels a_i to dilate is prepared as a constant for each image Pi. The explanation above implicitly assumed that the number of pixels a_i for each image Pi (and the vector b_i giving the corresponding upper bound) applies to dilating a single mask region in that image (a mask region with a high-speed region nearby). If an image Pi contains two or more such mask regions, a_i and b_i may be set for each region and the same procedure applied.

FIGS. 2 and 3 schematically illustrate the processing in the extraction unit 4. FIG. 2 shows that, for an image P from a certain camera at a certain time, conventional background subtraction yields foreground F1 (a baseball pitcher), foreground F2 (the ball thrown by the pitcher), and background B as the region other than F1 and F2. In one embodiment, the foregrounds F1 and F2 may be used as the mask as they are. FIG. 3 shows an example that, for the case of FIG. 2, uses a mask dilated by a_i pixels. In FIG. 3, the vicinity of foreground F2 (the region of the fast-moving ball) is detected as the high-speed region R, so F2 becomes a dilation target and the dilated foreground F20 is obtained. Foreground F1 (the region of the pitcher, who does not move at high speed) has no high-speed region nearby and is therefore not dilated. Thus, in the example of FIG. 3, foreground F1 and the dilated foreground F20 are used as the mask.

<Intersection unit 22>
The intersection unit 22 generates the shape of the 3DCG model. The basic processing flow follows any existing visual volume intersection method (for example, those of Patent Document 1 and Non-Patent Document 1 cited above), and the 3DCG model shape may be obtained, for example, as a set of voxels in 3D space or as a polygon model. In the present invention, in particular, for the region of 3D space determined to be the relaxation region by the determination unit 21, the criterion for deciding the model shape is changed when the intersection unit 22 applies the existing visual volume intersection method.

In general, the visual volume intersection method obtains the shape of the 3DCG model by taking the intersection of the subject shape information (binary mask images in this specification) from all cameras. Consequently, if even one of the camera binary masks has a missing region, the 3DCG model shape is also missing there. Such loss can also be caused by time synchronization errors, and when the subject moves fast the subject itself may disappear.

In contrast, in the present invention, within the relaxation region the product of the binary masks of all cameras (N of them, as stated above) is not taken; instead, a point is determined to belong to the subject model if it is 1 (foreground) in at least m of the cameras. The threshold m may satisfy 1 ≤ m < N, and in one embodiment m = k. As a result, even when some cameras suffer from incorrect time synchronization or mask extraction, relying on the information from the many other cameras prevents the subject from disappearing. Cameras that are particularly important may also be preferentially included among the m cameras.
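The relaxed determination can be sketched as a voting rule over voxels. The sketch assumes a voxel-based visual hull, per-camera masks given as sets of foreground pixels, and hypothetical projection functions; it does not model the optional prioritization of important cameras.

```python
from itertools import product

def carve(voxels, projections, masks, relaxed_voxels, m):
    """Visual hull with an m-of-N vote inside the relaxation region, N-of-N elsewhere."""
    n = len(masks)
    model = set()
    for voxel in voxels:
        votes = sum(1 for project, mask in zip(projections, masks)
                    if project(voxel) in mask)
        required = m if voxel in relaxed_voxels else n
        if votes >= required:
            model.add(voxel)
    return model

# Toy setup (assumption): three orthographic cameras looking along Z, Y and X.
projections = [
    lambda v: (v[0], v[1]),
    lambda v: (v[0], v[2]),
    lambda v: (v[1], v[2]),
]
# A slow object at (0, 0, 0) is seen by all cameras; a fast one at (2, 2, 2)
# is missing from camera 3's mask because of a synchronization error.
masks = [{(0, 0), (2, 2)}, {(0, 0), (2, 2)}, {(0, 0)}]
voxels = list(product(range(3), repeat=3))

strict_model = carve(voxels, projections, masks, relaxed_voxels=set(), m=2)
relaxed_model = carve(voxels, projections, masks, relaxed_voxels={(2, 2, 2)}, m=2)
```

With no relaxation region the fast object vanishes from the model; marking its voxel as relaxed with m = 2 keeps it, while the rest of the space is still carved with the strict N-camera rule.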

<Synthesis unit 3>
The synthesis unit 3 synthesizes the final video from the virtual viewpoint according to the shape of the 3DCG model obtained by the intersection unit 22. The synthesis may follow any existing method. According to the position coordinates of the virtual viewpoint given by user input or the like, the color information of the subject 3D model is determined using the textures of nearby cameras. For the background and other content that is not modeled as a 3DCG subject, a CG model of the stadium or the like produced in advance is used and superimposed with the 3DCG model to obtain the final composite video.

For the part of the 3DCG model obtained by the intersection unit 22 that lies in the relaxation region, the synthesis unit 3 may obtain the composite video using the textures of the images from the m or more cameras whose binary mask is 1 (foreground). Among those camera images, the texture of any image judged unsuitable for synthesis, for example because the camera lies on the opposite side as seen from the position of the virtual viewpoint, may be left unused. Whether to use the texture of each of these camera images may be decided according to the existing method used for synthesis.
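One possible way to pick texture sources for a point in the relaxation region is sketched below. The angle test between the viewing directions of the virtual viewpoint and each camera is an assumption made for illustration; the specification defers this decision to whichever existing synthesis method is used, and the function name and the 90-degree default are hypothetical.

```python
import math

def usable_cameras(point, virtual_pos, camera_positions, foreground_flags,
                   max_angle_deg=90.0):
    """Indices of cameras that see the point as foreground and view it from
    roughly the same side as the virtual viewpoint."""
    def direction(source):
        # Unit vector from a viewpoint toward the 3D point (viewpoint must differ from point).
        delta = [p - s for p, s in zip(point, source)]
        norm = math.sqrt(sum(c * c for c in delta))
        return [c / norm for c in delta]

    view = direction(virtual_pos)
    min_cos = math.cos(math.radians(max_angle_deg))
    chosen = []
    for index, (position, is_foreground) in enumerate(
            zip(camera_positions, foreground_flags)):
        if not is_foreground:
            continue  # this camera's mask is 0 here, so it has no texture to offer
        cos_angle = sum(a * b for a, b in zip(view, direction(position)))
        if cos_angle > min_cos:
            chosen.append(index)
    return chosen

cameras = [(1.0, 0.0, -5.0), (0.0, 0.0, 5.0), (0.0, 1.0, -5.0)]
flags = [True, True, False]  # per-camera mask value at the point's projection
selected = usable_cameras((0.0, 0.0, 0.0), (0.0, 0.0, -5.0), cameras, flags)
```

Here camera 1 is rejected because it looks at the point from the opposite side, and camera 2 because its mask is background at that point.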

As described above, the present invention adaptively changes the determination condition of the visual volume intersection method according to the speed of the subject captured in the multi-viewpoint images, which makes it possible to synthesize fast-moving objects. Supplementary explanations of additional and modified embodiments follow.

(1) If it is known in advance that a given scene is captured in the multi-viewpoint video, and the region of 3D space in which a high-speed object can appear in that scene is also known in advance, the determination unit 21 may output that known region of 3D space to the intersection unit 22 as the relaxation region, without using the estimation of the velocity field Vi (and high-speed region Ri) by the estimation unit 1.

For example, if it is known in advance (for instance from program information attached to the video) that the scene is a baseball pitcher's pitch as in FIGS. 2 and 3, and that only the thrown ball will be a high-speed object, the determination unit 21 may output a fixed relaxation region consisting of a predetermined range through which the thrown ball can pass (a predetermined range between the pitcher and the catcher).

(2) For the masks MSi extracted from the images Pi by the extraction unit 4, a correspondence between images may be given for each foreground region (including regions dilated by a_i pixels), that is, which foreground region of image Pi corresponds to which foreground region of another image Pj, and the intersection unit 22 may apply the visual volume intersection method separately to each set of corresponding foregrounds. Any existing method may be used to establish the correspondence; for example, image recognition based on image features may be applied to each region of each image, or the regions may be tracked after such recognition so that each region keeps its ID (identifier).

(3) As described above, an embodiment in which the determination unit 21 is omitted from the image processing apparatus 10 (an embodiment that does not use relaxation-region information) is also possible. In this case, the estimation unit 1 estimates the velocity field Vi of each image Pi and outputs it to the extraction unit 4, and the extraction unit 4 dilates by a_i pixels the foreground considered to correspond to a high-speed object, based on the velocity field Vi. The intersection unit 22 then applies a conventional visual volume intersection method to the dilated masks MSi from the extraction unit 4, without using relaxation-region information.

(4) The present invention can also be provided as a program that causes a computer to function as the image processing apparatus 10. The computer may have a well-known hardware configuration including a CPU (central processing unit), memory, and various interfaces, with the CPU executing instructions corresponding to the functions of the units of the image processing apparatus 10. The computer may further include a GPU (graphics processing unit) capable of faster parallel processing than the CPU, and the GPU, instead of the CPU, may load and execute the program for all or any part of the functions of the image processing apparatus 10.

10: image processing apparatus, 1: estimation unit, 2: generation unit, 21: determination unit, 22: intersection unit, 3: synthesis unit, 4: extraction unit, 5: calibration unit

Claims

An image processing apparatus comprising a generation unit configured to generate a three-dimensional model of a subject by applying a visual volume intersection method to each image of a multi-viewpoint image,
wherein the generation unit generates the three-dimensional model after relaxing the intersection determination of the visual volume intersection method for a space where the speed of the subject is high.

The image processing apparatus further comprises an estimation unit that estimates a velocity field in each image of the multi-viewpoint image,
wherein the generation unit generates the three-dimensional model after relaxing the intersection determination of the visual volume intersection method for the space where the speed of the subject is high, the space being determined from the velocity field estimated in each image.

The image processing apparatus according to claim 2, wherein the estimation unit further obtains, in each image, a high-speed region where the estimated velocity field is determined to be large, and obtains the space where the speed of the subject is high by applying the visual volume intersection method to the high-speed regions of the images.

The image processing apparatus according to claim 3, wherein the estimation unit obtains the space where the speed of the subject is high by obtaining a three-dimensional model from an intersection determination of the visual volume intersection method based on the high-speed regions of only some of the images of the multi-viewpoint image.

The image processing apparatus according to claim 2, wherein the estimating unit estimates the speed field by calculating an optical flow or calculating a motion vector by tracking an object.

The image processing apparatus according to claim 2, further comprising an extraction unit that extracts, from each image of the multi-viewpoint image, a mask distinguishing foreground from background and then processes the mask so as to dilate the foreground according to the estimated velocity field,
wherein the generation unit generates the three-dimensional model of the subject by applying the visual volume intersection method to the processed mask.

The image processing apparatus according to claim 6, wherein the extraction unit processes the extracted mask so as to dilate a foreground that is close to a region where the estimated velocity field is determined to be large.

The image processing apparatus according to claim 2, wherein the estimation unit obtains a velocity field based on the two-dimensional image information of each image of the multi-viewpoint image and, based on depth information attached to each image, obtains the estimated velocity field by correcting the values of that velocity field to be larger as the depth is farther.

The image processing apparatus according to claim 1, wherein the generation unit relaxes the intersection determination of the visual volume intersection method by obtaining the three-dimensional model from an intersection determination based on the foreground of only some of the images of the multi-viewpoint image.

An image processing apparatus comprising a generation unit configured to generate a three-dimensional model of a subject by applying a visual volume intersection method to each image of a multi-viewpoint image, the image processing apparatus further comprising:
an estimation unit that estimates a velocity field in each image of the multi-viewpoint image; and
an extraction unit that extracts, from each image of the multi-viewpoint image, a mask distinguishing foreground from background and then processes the mask so as to dilate the foreground according to the estimated velocity field,
wherein the generation unit generates the three-dimensional model of the subject by applying the visual volume intersection method to the processed mask.

The image processing apparatus according to claim 1, further comprising a synthesis unit configured to synthesize, based on the generated three-dimensional model, a free viewpoint image of the multi-viewpoint image corresponding to a designated virtual viewpoint.

An image processing method comprising a generation step of generating a three-dimensional model of a subject by applying a visual volume intersection method to each image of a multi-viewpoint image,
wherein, in the generation step, the three-dimensional model is generated after relaxing the intersection determination of the visual volume intersection method for a space where the speed of the subject is high.

An image processing program that causes a computer to function as an image processing apparatus comprising a generation unit that generates a three-dimensional model of a subject by applying a visual volume intersection method to each image of a multi-viewpoint image,
wherein the generation unit generates the three-dimensional model after relaxing the intersection determination of the visual volume intersection method for a space where the speed of the subject is high.

An image processing method comprising a generation step of generating a three-dimensional model of a subject by applying a visual volume intersection method to each image of a multi-viewpoint image, the method further comprising:
an estimation step of estimating a velocity field in each image of the multi-viewpoint image; and
an extraction step of extracting, from each image of the multi-viewpoint image, a mask distinguishing foreground from background and then processing the mask so as to dilate the foreground according to the estimated velocity field,
wherein, in the generation step, the three-dimensional model of the subject is generated by applying the visual volume intersection method to the processed mask.

An image processing program that causes a computer to function as an image processing apparatus comprising a generation unit that generates a three-dimensional model of a subject by applying a visual volume intersection method to each image of a multi-viewpoint image,
the image processing apparatus further comprising:
an estimation unit that estimates a velocity field in each image of the multi-viewpoint image; and
an extraction unit that extracts, from each image of the multi-viewpoint image, a mask distinguishing foreground from background and then processes the mask so as to dilate the foreground according to the estimated velocity field,
wherein the generation unit generates the three-dimensional model of the subject by applying the visual volume intersection method to the processed mask.