JP7159384B2

JP7159384B2 - Image processing device, image processing method, and program

Info

Publication number: JP7159384B2
Application number: JP2021068487A
Authority: JP
Inventors: 希名板倉
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-09-28
Filing date: 2021-04-14
Publication date: 2022-10-24
Anticipated expiration: 2036-09-28
Also published as: JP2021108193A; JP2018055367A; JP6873644B2

Description

The present invention relates to a technique for extracting the region of a foreground subject from a captured image.

Background subtraction is a conventional method for extracting the region of a foreground subject from a captured image of subjects (including foreground subjects and background subjects). In background subtraction, a foreground image in which the region of the foreground subject has been extracted is created based on the pixel-by-pixel difference between the pixel values of a captured image showing both the foreground and background subjects and the pixel values of a background image showing only the background subjects. If an image showing only the background, captured in advance under specific conditions, is used as the background image, the accuracy of extracting the foreground subject region decreases when the background changes, for example because of changes in sunlight over time.
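The background subtraction described above can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists and a fixed threshold on the absolute pixel difference; the function name `background_subtraction` and the threshold value are hypothetical choices for illustration and are not taken from this document.

```python
def background_subtraction(frame, background, threshold):
    """Return a binary mask (1 = foreground) from per-pixel absolute differences.

    frame and background are grayscale images given as lists of rows.
    The fixed threshold is an assumption made for this sketch.
    """
    mask = []
    for frame_row, bg_row in zip(frame, background):
        mask.append([1 if abs(f - b) > threshold else 0
                     for f, b in zip(frame_row, bg_row)])
    return mask


# A 2x3 frame in which the middle column differs strongly from the background.
frame = [[10, 200, 12], [11, 198, 13]]
background = [[10, 12, 12], [12, 11, 12]]
mask = background_subtraction(frame, background, threshold=30)
```

If the background image is out of date (for example, the lighting has changed), the differences grow everywhere and the mask degrades, which is the problem described above.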

To address this problem, Patent Literature 1 discloses a technique that uses a background image created from a plurality of images captured at different times, so that the foreground subject region can be extracted regardless of changes in the background.

Patent Literature 2 discloses a technique that uses a background image created from a plurality of images captured from different viewpoints at the same time, so that the foreground subject region can be extracted regardless of changes in the subjects over time.

Patent Literature 1: Japanese Patent Application Laid-Open No. 2012-104053
Patent Literature 2: Japanese Patent Application Laid-Open No. 2014-230180

However, in Patent Literature 1, when a foreground subject remains stationary, its region is erroneously determined to be a region of a background subject, so the background image cannot be created accurately. As a result, the accuracy of extracting the foreground subject region decreases.

In Patent Literature 2, a background image is created by supplementing information about background subjects that cannot be seen from a single viewpoint with information from other viewpoints. However, the background image cannot be created accurately in, for example, regions where the foreground subjects in the scene are densely packed and overlap one another. As a result, the accuracy of extracting the foreground subject region decreases.

In view of the above problems, an object of the present invention is to extract the foreground subject region with high accuracy regardless of the state of the subjects, such as whether the subjects change (for example, move) over time and how sparsely or densely the foreground subjects are distributed.

The present invention is an image processing apparatus comprising: first acquisition means for acquiring an image of interest that includes a subject and was captured from a viewpoint of interest; second acquisition means for acquiring a plurality of reference images captured from a plurality of viewpoints different from the viewpoint of interest; generation means for converting the plurality of reference images acquired by the second acquisition means to generate a plurality of converted images as viewed from the viewpoint of interest; and determination means for determining an image region of the subject in the image of interest based on an index relating to the difference between the pixel value of a pixel of interest in the image of interest acquired by the first acquisition means and the pixel value of the pixel corresponding to the pixel of interest in each of the plurality of converted images generated by the generation means, wherein the determination means identifies correction target pixels in the image of interest based on the index, generates a corrected image of interest by correcting the identified correction target pixels, and determines the image region of the subject in the image of interest based on the image of interest acquired by the first acquisition means and the corrected image of interest.

According to the present invention, the foreground subject region can be extracted with high accuracy regardless of the state of the subjects, such as whether the subjects change (for example, move) over time and how sparsely or densely the foreground subjects are distributed.

FIG. 1 is a block diagram showing the hardware configuration of the image processing apparatus according to Embodiments 1 to 3.
FIG. 2 is a block diagram showing the functional configuration of the image processing apparatus according to Embodiment 1.
FIG. 3 is a flowchart showing the flow of processing for extracting a foreground region in Embodiment 1.
FIG. 4 is a diagram explaining an outline of the processing for extracting a foreground region in Embodiment 1.
FIG. 5 is a diagram explaining image conversion in Embodiment 1.
FIG. 6 is a diagram explaining the effect of Embodiment 1.
FIG. 7 is a block diagram showing the functional configuration of the image processing apparatus according to Embodiment 2.
FIG. 8 is a flowchart showing the flow of processing for extracting a foreground region in Embodiment 2.
FIG. 9 is a diagram explaining a continuity calculation method in Embodiment 2.
FIG. 10 is a diagram explaining the effect of Embodiment 2.
FIG. 11 is a block diagram showing the functional configuration of the image processing apparatus according to Embodiment 3.
FIG. 12 is a flowchart showing the flow of processing for extracting a foreground region in Embodiment 3.
FIG. 13 is a diagram explaining the effect of Embodiment 3.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in them are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

[Embodiment 1]
In Embodiment 1, an image of the background subjects that contains no image of a foreground subject at the viewpoint of interest (hereinafter, a complete background image) is created from multi-viewpoint images, specifically from images of the background subjects at a plurality of different viewpoints that partially contain images of foreground subjects (hereinafter, incomplete background images). The complete background image is then used to extract the foreground subject region from the image to be processed.

<Outline of the processing for extracting the foreground region>
An overview of the processing for extracting the foreground region in this embodiment is given below with reference to FIG. 4. In this embodiment, background image data 401 at a plurality of different viewpoints is first acquired. Background image data is an image of the background subjects, that is, a so-called background image. The background image data acquired here need not be a complete background image containing no image of a foreground subject (hereinafter, a foreground image), but it is desirably an image captured at a time close to the time at which the image from which the foreground region is to be extracted was captured. The plurality of pieces of background image data 401 is assumed to include an image at the same viewpoint as the viewpoint 402 from which the image subject to foreground extraction was captured. Hereinafter, the image from which the foreground region is to be extracted is called the target image (data), and the viewpoint from which the target image (data) was captured is called the viewpoint of interest.

Next, for each viewpoint, the acquired background image data 401 is converted, with the ground surface as the reference, into an image as viewed from the viewpoint of interest 402, thereby creating background image data 403 at the viewpoint of interest. The number of pieces of background image data 403 created here equals the number of pieces of background image data 401. Hereinafter, the background image data 403 obtained by converting the background image data 401 is called converted background image data 403.

Here, a foreground subject means a subject that, among the subjects included in the captured image, is located close to the imaging device. For example, when the target image data captures a competition scene such as a sporting event, people such as players and referees and equipment such as goals and balls are foreground subjects. Foreground subjects include those that are generally in continuous motion across a plurality of images captured successively in time series. A background subject, on the other hand, means a subject that, among the subjects included in the captured image, is located far from the imaging device and therefore lies behind the foreground subjects. For example, when the target image data captures a competition scene such as a sporting event, a field made of grass or soil, a gymnasium floor, and the like are background subjects. Background subjects are mostly stationary across a plurality of images captured successively in time series.

Such foreground subjects have height above the ground surface, whereas background subjects do not. Therefore, the plurality of pieces of converted background image data 403 is used to detect the image of a subject having height above the ground surface, that is, the image of a foreground subject (foreground image), and the detected foreground image is removed from the incomplete background image to create a complete background image at the viewpoint of interest 402. Specifically, for the plurality of pieces of converted background image data 403, including the image at the viewpoint of interest 402, the degree of matching between pixels of interest is calculated for each pixel, and pixels with a low degree of matching are detected as pixels in the image region of a foreground subject.
As described above, the converted background image data 403 is an image obtained by converting the background image data 401 into an image as viewed from the viewpoint of interest 402 with the ground surface as the reference plane. Consequently, the coordinates of the pixels in regions 405 to 407 of the background image data 401, which correspond to a subject 404 that lies on the ground surface and has no height, are each converted to the coordinates of the pixels in region 408, which is located at the same position in all of the converted background image data 403. In contrast, the coordinates of the pixels in regions 410 to 412 of the background image data 401, which correspond to a subject 409 having height, are converted to the coordinates of the pixels in regions 413 to 415, whose positions differ depending on the viewpoint. Accordingly, in the plurality of pieces of converted background image data 403, pixels with a high degree of matching between pixels of interest are regarded as pixels in the image region of a background subject having no height, and pixels with a low degree of matching are regarded as pixels in the image region of a foreground subject having height. A complete background image is created in this way. Finally, the foreground region is extracted by comparing the created complete background image at the viewpoint of interest 402 with the target image data.

The above is an outline of the processing performed in this embodiment. The target image data is not limited to the above example, and various kinds of image data, such as data captured by a surveillance camera, can be used. Although the case in which the background image data 401 includes an image at the viewpoint of interest has been described here, this embodiment is also applicable when the background image data does not include an image at the viewpoint of interest; the specific processing method is described later.

<Hardware configuration of the image processing apparatus>
The hardware configuration of the image processing apparatus of this embodiment is described below. FIG. 1 is a block diagram showing an example of the hardware configuration of the image processing apparatus of this embodiment. The image processing apparatus 100 of this embodiment includes a CPU 101, a RAM 102, a ROM 103, a secondary storage device 104, an input interface 105, and an output interface 106, and these components are interconnected by a system bus 107. The image processing apparatus 100 is connected to an external storage device 108 via the input interface 105, and to the external storage device 108 and a display device 109 via the output interface 106.

The CPU 101 executes programs stored in the ROM 103 using the RAM 102 as a work memory, and performs overall control of each component of the image processing apparatus 100 via the system bus 107. The various processes described later are executed in this way.

The secondary storage device 104 is a storage device that stores various data handled by the image processing apparatus 100; an HDD is used in this embodiment. The CPU 101 can write data to the secondary storage device 104 and read data stored in the secondary storage device 104 via the system bus 107. Besides an HDD, various storage devices such as an optical disk drive or a flash memory can be used as the secondary storage device 104.

The input interface 105 is a serial bus interface such as USB or IEEE 1394, and data, commands, and the like are input from external devices to the image processing apparatus 100 via the input interface 105. The image processing apparatus 100 acquires data from the external storage device 108 (for example, a storage medium such as a hard disk, memory card, CF card, SD card, or USB memory) via the input interface 105. Input devices (not shown) for user input, such as a mouse and a keyboard, can also be connected to the input interface 105. The output interface 106 includes, in addition to a serial bus interface such as USB or IEEE 1394 like the input interface 105, a video output terminal such as DVI or HDMI (registered trademark). Data is output from the image processing apparatus 100 to external devices via the output interface 106. The image processing apparatus 100 displays images by outputting processed images and the like to the display device 109 (any of various image display devices such as a liquid crystal display) via the output interface 106. The image processing apparatus 100 also has components other than those described above, but they are not the focus of the present invention and their description is omitted.

<Processing for extracting the foreground region>
The processing for extracting the foreground region executed by the image processing apparatus 100 in this embodiment is described below with reference to FIGS. 2 and 3. FIG. 2 is a block diagram showing the functional configuration of the image processing apparatus 100, and FIG. 3 is a flowchart showing the flow of the processing for extracting the foreground region. The CPU 101 of the image processing apparatus 100 executes the programs stored in the ROM 103 using the RAM 102 as a work memory, thereby functioning as each component shown in FIG. 2 and executing the series of processes shown in FIG. 3. Not all of the processing described below needs to be executed by the CPU 101; the image processing apparatus 100 may be configured so that part or all of the processing is performed by one or more processing circuits other than the CPU 101.

The flow of the processing performed by each component is described below. In step S301, the target image data acquisition unit 201 acquires target image data from the external storage device 108 via the input interface 105, or from the secondary storage device 104. As described above, the target image data is the image from which the foreground region is to be extracted. The target image data acquisition unit 201 also sets the viewpoint of the camera that captured the target image data as the viewpoint of interest. Although the case in which the target image data is a single image is described here, this embodiment is also applicable when the target image data consists of a plurality of images. In addition, the target image data acquisition unit 201 acquires, together with the target image data, the parameters of the camera that captured the target image data (hereinafter, camera parameters). Camera parameters are parameters that make it possible to calculate the projection of a point in three-dimensional space onto the image captured by the camera; they include extrinsic parameters representing the position and orientation of the camera and intrinsic parameters representing the focal length and the optical center. Measured values or design values stored in memory in advance may be used as the camera parameters. The target image data acquisition unit 201 outputs the target image data to the foreground extraction unit 207 and the camera parameters to the image conversion unit 203.
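As an illustration of how camera parameters allow a point in three-dimensional space to be projected onto an image, the following minimal sketch assumes an ideal pinhole camera without lens distortion. The function name `project_point`, the world-to-camera convention (rotation `R` and translation `t`), and the separate focal lengths `fx`, `fy` are assumptions made for this sketch and are not specified in this document.

```python
def project_point(point, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    R (3x3 nested list) and t (length-3 list) are the extrinsic parameters,
    assumed here to map world coordinates to camera coordinates.
    fx, fy (focal lengths in pixels) and cx, cy (optical center) are the
    intrinsic parameters.
    """
    # World coordinates -> camera coordinates.
    xc = [sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Perspective division, then shift by the optical center.
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return u, v


# Camera at the world origin looking along +Z (identity rotation, zero translation).
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
u, v = project_point((1.0, 2.0, 4.0), R, t, 100.0, 100.0, 320.0, 240.0)
```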

In step S302, the background image data acquisition unit 202 acquires a plurality of pieces of background image data at a plurality of different viewpoints from the external storage device 108 via the input interface 105, or from the secondary storage device 104. Here, background image data is an image of the background subjects in an environment (weather, time of day, and so on) substantially the same as the environment in which the target image data was captured. As described above, the background image data acquired in this step need not be a background image containing no foreground image at all (a complete background image).

In this embodiment, the background image data at each viewpoint is created by applying filter processing using a median filter to a plurality of images, corresponding to a plurality of different times, acquired by capturing the scene successively in time series from the same viewpoint. However, the method of creating the background image data is not limited to this. For example, the background image data may be created using another filter such as a mean filter, or by performing clustering processing on the plurality of images. Alternatively, background image data acquired for each viewpoint by capturing an image in advance with no foreground subject present may be used.
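The median-filter approach to creating background image data from a time series can be sketched as follows. This is a minimal illustration that assumes grayscale frames of equal size stored as nested lists; the function name `temporal_median_background` is hypothetical.

```python
from statistics import median


def temporal_median_background(frames):
    """Estimate a background image as the per-pixel median over time.

    frames is a list of grayscale images (lists of rows) captured from the
    same viewpoint at different times.
    """
    height = len(frames[0])
    width = len(frames[0][0])
    return [[median(frame[y][x] for frame in frames) for x in range(width)]
            for y in range(height)]


# Three 1x3 frames: a bright subject passes through the second and third
# pixels, but each pixel shows the background (value 50) in most frames.
frames = [[[50, 50, 50]], [[50, 220, 50]], [[50, 50, 215]]]
estimated_background = temporal_median_background(frames)
```

A subject that stays at the same pixel in most frames would survive the median, which is why the acquired background image data may still contain foreground images.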

The background image data acquisition unit 202 also acquires, together with the background image data, the camera parameters corresponding to each piece of background image data. Furthermore, to distinguish the plurality of pieces of background image data from one another, the background image data acquisition unit 202 stores each piece of background image data in association with a number identifying the viewpoint of the camera (hereinafter, the camera viewpoint number). The background image data acquisition unit 202 outputs the background image data and the camera parameters to the image conversion unit 203, and outputs only the background image data to the correction unit 206.

In step S303, the image conversion unit 203 uses the camera parameters acquired from the target image data acquisition unit 201 and the background image data acquisition unit 202 to convert the background image data acquired from the background image data acquisition unit 202 into images as viewed from the viewpoint of interest. Specifically, each piece of background image data is subjected to projective transformation with the ground surface as the reference, yielding an image as viewed from the viewpoint of interest. The background image (data) obtained by the image conversion in this step is called a converted background image (data). The image conversion unit 203 thus functions as converted background image data creation means. The image conversion method used in this step is described below with reference to FIG. 5.

As shown in FIG. 5, when a point 501 in three-dimensional space is projected onto the image of a camera 502, the point 504 at which the straight line connecting the point 501 and the camera 502 intersects the image plane 503 is the projection of the point 501 onto the image plane 503. Similarly, for a camera 505 located at a position different from that of the camera 502 (a camera at another viewpoint), the point 507 at which the straight line connecting the point 501 and the camera 505 intersects the image plane 506 is the projection of the point 501 onto the image plane 506. Now consider the case in which all points in three-dimensional space that are projected onto the image plane 503 and the image plane 506, including the point 501, lie on the same plane, namely the ground surface. In this case, using the 3×3 homography matrix H₀₁ calculated from the camera parameters of the camera 502 and the camera 505, the coordinates (u₀, v₀) of any pixel on the image plane 503 are converted to the coordinates (u₁, v₁) on the image plane 506 by equation (1).
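Equation (1) is not reproduced in this text. The sketch below therefore assumes the standard homogeneous-coordinate form of a homography mapping, in which (u₁, v₁, 1) is proportional to H₀₁ applied to (u₀, v₀, 1); the function name `apply_homography` and the example matrix are hypothetical and serve only to illustrate the coordinate conversion.

```python
def apply_homography(H, u0, v0):
    """Map pixel (u0, v0) through a 3x3 homography H given as nested lists.

    The point is lifted to homogeneous coordinates (u0, v0, 1), multiplied
    by H, and divided by the resulting third component.
    """
    x = H[0][0] * u0 + H[0][1] * v0 + H[0][2]
    y = H[1][0] * u0 + H[1][1] * v0 + H[1][2]
    w = H[2][0] * u0 + H[2][1] * v0 + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return x / w, y / w


# Illustrative matrix only: a pure image shift of (+5, -2) pixels.
H_shift = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
u1, v1 = apply_homography(H_shift, 10, 20)
```

Because the result is divided by the third component, multiplying every entry of the matrix by the same non-zero constant leaves the mapped coordinates unchanged.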

In step S303, a projective transformation is executed for each piece of background image data, with the camera at the viewpoint corresponding to the background image data acquired from the background image data acquisition unit 202 taken as the camera 502 described above and the camera at the viewpoint of interest set by the target image data acquisition unit 201 taken as the camera 505. The number of pieces of converted background image data obtained in this step is therefore equal to the number of pieces of background image data acquired by the background image data acquisition unit 202. Each piece of converted background image data is stored in association with the viewpoint number of the corresponding background image data acquired by the background image data acquisition unit 202. The image conversion unit 203 outputs the converted background image data to the degree-of-match calculation unit 204 and the correction unit 206.

In step S304, the image conversion unit 203 selects, from the background image data acquired from the background image data acquisition unit 202, the image corresponding to the viewpoint closest to the position of the camera that captured the target image data (the viewpoint of interest) as the reference background image. Specifically, the distance between the coordinates (Xo, Yo, Zo) of the viewpoint of interest and the coordinates (Xi, Yi, Zi) of the viewpoint corresponding to each piece of background image data acquired from the background image data acquisition unit 202 is calculated for each viewpoint. Here, i denotes the viewpoint number, with 1 ≤ i < (number of viewpoints) + 1. The viewpoint for which the calculated distance is smallest (the reference viewpoint) is then detected, and the background image (data) corresponding to the reference viewpoint is taken as the reference background image (data). The image conversion unit 203 outputs the viewpoint number corresponding to the reference background image to the degree-of-match calculation unit 204 and the correction unit 206. In this embodiment, the viewpoint number corresponding to the reference background image is called the reference viewpoint number.
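The selection of the reference viewpoint in step S304 can be sketched as a nearest-neighbor search over camera positions. The sketch assumes Euclidean distance, which the text does not state explicitly, and stores the viewpoints in a dictionary keyed by viewpoint number; the function name `select_reference_viewpoint` is hypothetical.

```python
import math


def select_reference_viewpoint(viewpoint_of_interest, viewpoints):
    """Return the viewpoint number whose camera position is closest.

    viewpoint_of_interest is (Xo, Yo, Zo); viewpoints maps each viewpoint
    number i to its camera position (Xi, Yi, Zi). Euclidean distance is an
    assumption of this sketch.
    """
    return min(viewpoints,
               key=lambda i: math.dist(viewpoint_of_interest, viewpoints[i]))


# Three cameras along a line; the viewpoint of interest is nearest to camera 2.
viewpoints = {1: (0.0, 0.0, 10.0), 2: (5.0, 0.0, 10.0), 3: (20.0, 0.0, 10.0)}
reference_viewpoint_number = select_reference_viewpoint((4.0, 1.0, 10.0), viewpoints)
```

When the background image data includes an image at the viewpoint of interest itself, the minimum distance is zero and that viewpoint is selected.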

In step S305, the degree-of-match calculation unit 204 determines the pixel of interest in the converted background image data, that is, the pixel for which it is to be determined whether the pixels match across the plurality of pieces of converted background image data. In this embodiment, the upper-left pixel of the converted background image data is selected first as the pixel of interest, and unprocessed pixels are then selected in turn as the pixel of interest. The pixels of interest may be determined in any order, provided that the determination of whether the pixels match across the plurality of pieces of converted background image data is performed for all pixels of the converted background image data.

In step S306, the matching degree calculation unit 204 uses the plurality of pieces of converted background image data acquired from the image conversion unit 203 to calculate the degree of matching, at the pixel of interest, between the converted background image data corresponding to the reference viewpoint number and the other converted background image data. The calculation method is described in detail below.

First, the matching degree calculation unit 204 acquires the pixel value B_j(u₂, v₂) of the converted background image data at the coordinates (u₂, v₂) of the determined pixel of interest. Here, j is an index that distinguishes the individual pieces of converted background image data, so the matching degree calculation unit 204 acquires as many pixel values as there are pieces of converted background image data. Next, the matching degree calculation unit 204 calculates the median of all the acquired pixel values. This median is used as the reference value M when calculating the degree of matching. The reference value is not limited to the median; any value that reflects the statistical properties of the plurality of pixel values, such as the mean, may be used.

Next, the matching degree calculation unit 204 calculates the degree of matching at the pixel of interest by Equation (2), using the pixel value B₀(u₂, v₂) of the pixel of interest in the converted background image data corresponding to the reference viewpoint number and the calculated reference value M(u₂, v₂).
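Equation (2) itself is not reproduced in this text. A form consistent with the surrounding description (a sum of squared differences over the three RGB channels, which is what the threshold a×a×3 in step S309 presupposes) would be the following; it is a reconstruction offered under that assumption, not a quotation of the original formula.

```latex
\[
D(u_2, v_2) = \sum_{k \in \{R, G, B\}} \left( B_0^{k}(u_2, v_2) - M^{k}(u_2, v_2) \right)^2 \tag{2}
\]
```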

Here, k is an index identifying the three RGB channels. The degree of matching D calculated by Equation (2) becomes smaller as the variation in pixel values across the plurality of pieces of converted background image data becomes smaller. The degree of matching is not limited to this definition; any value that indicates the difference between pixels may be used. For example, the sum of the differences between the pixel value B₀(u₂, v₂) of the pixel of interest in the converted background image data corresponding to the reference viewpoint number and the pixel value of the pixel of interest in each of the other pieces of converted background image data may be used as the degree of matching.
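The reference value M and the degree of matching D for a single pixel of interest might be computed as sketched below. The function names and sample pixel values are hypothetical; taking the median separately for each RGB channel and using a sum of squared differences for D are assumptions drawn from the description of Equation (2), not a confirmed implementation.

```python
from statistics import median

def reference_value(pixels):
    """Per-channel median of the RGB values observed at one pixel across all views."""
    return tuple(median(channel) for channel in zip(*pixels))

def matching_degree(reference_view_pixel, m):
    """Sum of squared differences over R, G, B between B_0 and the reference value M."""
    return sum((b - mk) ** 2 for b, mk in zip(reference_view_pixel, m))

# RGB values of the pixel of interest in five converted background images.
# Index 0 plays the role of the reference viewpoint and shows a foreground object.
views = [(200, 30, 30), (60, 120, 50), (62, 118, 52), (58, 121, 49), (61, 119, 51)]
m = reference_value(views)        # (61, 119, 50)
d = matching_degree(views[0], m)  # 139**2 + 89**2 + 20**2 = 27642
print(m, d)
```

A large D, as in this example, indicates that the reference view disagrees with the other views at this pixel; a pixel where all views agree yields a D close to 0.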

In step S307, the matching degree calculation unit 204 determines whether steps S305 to S306 have been performed for all pixels of the converted background image data. If the result is true, the matching degree calculation unit 204 outputs the calculated degrees of matching for all pixels to the correction determination unit 205 and the calculated reference values to the correction unit 206, and the process proceeds to step S308. If the result is false, the process returns to step S305.

In step S308, the correction determination unit 205 initializes the flag map, that is, sets the pixel values of all pixels of the flag map to 0. The flag map initialized in this step is used in step S311 to determine which pixels of the converted background image data corresponding to the reference viewpoint number are to be corrected. In the flag map, 1 is assigned to the pixel value corresponding to a pixel to be corrected, and 0 to the pixel value corresponding to a pixel not to be corrected. After the initialization in this step, therefore, no pixel of the converted background image data corresponding to the reference viewpoint number is marked for correction.

In step S309, the correction determination unit 205 updates the flag map based on the degree of matching acquired from the matching degree calculation unit 204. Specifically, the correction determination unit 205 changes to 1 the flag map pixel value corresponding to each pixel of the converted background image data for the reference viewpoint number that is judged likely to belong to the image region of a foreground subject. In this embodiment, if the calculated degree of matching D is equal to or greater than a predetermined threshold, the pixel of the converted background image data corresponding to the reference viewpoint number matches the pixels of the other converted background image data only weakly, so the pixel of interest is judged likely to belong to the image region of a foreground subject. Conversely, if D is less than the threshold, the pixels match closely, so the pixel of interest is judged likely to belong to the image region of a background subject. The threshold used in this step is determined based on, for example, the maximum pixel value: a value smaller than 20% of the maximum, for example any value in the range of 1% to 5% of the maximum, is used to derive the threshold. That is, if this value is denoted a, the threshold is a×a×3, because Equation (2) uses a sum of squared differences as the degree of matching. If the sum of differences were used as the degree of matching instead, the threshold would be a×3. The determination of whether a pixel belongs to the image region of a foreground subject is made for each pixel. The correction determination unit 205 outputs the updated flag map to the correction unit 206.
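The threshold rule and the flag map update of step S309 can be illustrated as follows. The function names, the 8-bit maximum of 255, and the 5% ratio are illustrative assumptions; only the forms a×a×3 and a×3 come from the text.

```python
def matching_threshold(max_value=255, ratio=0.05, channels=3, squared=True):
    """Threshold a*a*3 for a sum of squared differences, or a*3 for a sum of differences."""
    a = max_value * ratio
    return a * a * channels if squared else a * channels

def update_flag_map(matching_map, threshold):
    """Return a flag map with 1 where D >= threshold (likely foreground), else 0."""
    return [[1 if d >= threshold else 0 for d in row] for row in matching_map]

th = matching_threshold()  # a is about 12.75, so th is about 487.7
# Hypothetical 2x2 map of matching degrees D.
flags = update_flag_map([[27642, 12], [300, 500]], th)
print(flags)  # [[1, 0], [0, 1]]
```

Building a fresh map of 0s and 1s here has the same effect as initializing the flag map to 0 in step S308 and then setting the flagged pixels to 1.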

In step S310, the correction unit 206 determines the pixel of interest in the converted background image data corresponding to the reference viewpoint number. In this embodiment, the upper-left pixel of that converted background image data is selected first, and the remaining unprocessed pixels are then selected in turn. The pixels of interest may be chosen in any order, provided that the pixel value update based on the flag map (step S311) is eventually performed for every pixel of the converted background image data corresponding to the reference viewpoint number.

In step S311, the correction unit 206 corrects the pixel value of the pixel of interest in the converted background image corresponding to the reference viewpoint number, based on the flag map acquired from the correction determination unit 205. In this embodiment, if the flag map pixel value corresponding to the pixel of interest is 1, the pixel value of the pixel of interest is replaced with the reference value calculated by the matching degree calculation unit 204. If the flag map pixel value corresponding to the pixel of interest is 0, the pixel value of the pixel of interest is left unchanged. The correction method is not limited to this; other methods may be used, such as replacing the pixel value with that of the background image corresponding to a viewpoint adjacent to the reference viewpoint.
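A minimal sketch of the correction in steps S310 to S312 is shown below, assuming images are stored as nested lists of RGB tuples. The function name `correct_background` and the sample data are hypothetical.

```python
def correct_background(reference_image, flag_map, reference_values):
    """Replace flagged pixels of the reference view with the reference value M.

    Pixels whose flag is 0 are returned unchanged.
    """
    return [
        [m if flag == 1 else pixel
         for pixel, flag, m in zip(img_row, flag_row, m_row)]
        for img_row, flag_row, m_row in zip(reference_image, flag_map, reference_values)
    ]

# One-row example: the first pixel is flagged, the second is not.
image = [[(200, 30, 30), (60, 120, 50)]]
flag_map = [[1, 0]]
m_values = [[(61, 119, 50), (61, 119, 51)]]
corrected = correct_background(image, flag_map, m_values)
print(corrected)  # [[(61, 119, 50), (60, 120, 50)]]
```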

In step S312, the correction unit 206 determines whether steps S310 to S311 have been performed for all pixels of the converted background image data corresponding to the reference viewpoint number. If the result is true, the correction unit 206 outputs the corrected converted background image data corresponding to the reference viewpoint number to the foreground extraction unit 207, and the process proceeds to step S313. If the result is false, the process returns to step S310.

In step S313, the foreground extraction unit 207 extracts the region of the foreground subject from the target image data (denoted I), using the corrected converted background image data corresponding to the reference viewpoint number acquired from the correction unit 206 (denoted the complete background image I_b). Specifically, as shown in Equation (3), the sum of squared differences between the complete background image I_b and the target image data I is calculated for each pixel, and pixels whose sum of squared differences is equal to or greater than a threshold are regarded as pixels in the image region of a foreground subject. This produces an image I_f in which the region of the foreground subject is extracted. The image I_f is a binary image: 1 is assigned to the pixel values corresponding to pixels in the image region of a foreground subject, and 0 to the pixel values corresponding to pixels in the image region of a background subject.
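Equation (3) is not reproduced in this text. Based on the description above (a per-pixel sum of squared differences over the RGB channels compared against the threshold Th), it can be reconstructed, as an assumption, in the following form.

```latex
\[
I_f(u, v) =
\begin{cases}
1 & \text{if } \displaystyle\sum_{k \in \{R, G, B\}} \left( I^{k}(u, v) - I_b^{k}(u, v) \right)^2 \ge Th \\
0 & \text{otherwise}
\end{cases}
\tag{3}
\]
```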

Here, Th is the threshold and k is an index identifying the three RGB channels. The threshold used here may be determined based on, for example, the maximum pixel value, using a value smaller than 20% of the maximum, for example any value in the range of 1% to 5% of the maximum. The threshold is derived in the same way as for Equation (2). In this way, the foreground extraction unit 207 functions as foreground image data creation means. The foreground extraction unit 207 outputs the created image I_f to the secondary storage device 104, the external storage device 108, the display device 109, or the like, and the series of processing is complete. This concludes the foreground region extraction processing executed by the image processing apparatus 100 in this embodiment.
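The extraction of step S313 can be sketched as below. The function name `extract_foreground` and the sample pixel values are hypothetical, and the threshold value assumes a = 12.75 (5% of an 8-bit maximum of 255), giving Th = a×a×3.

```python
def extract_foreground(target, background, th):
    """Binary mask I_f: 1 where the squared RGB difference between I and I_b is >= th."""
    return [
        [1 if sum((i - b) ** 2 for i, b in zip(p, q)) >= th else 0
         for p, q in zip(t_row, b_row)]
        for t_row, b_row in zip(target, background)
    ]

# Hypothetical one-row images: I (target) and I_b (complete background).
target = [[(61, 119, 50), (250, 250, 250)]]
background = [[(60, 120, 50), (60, 120, 50)]]
mask = extract_foreground(target, background, th=487.6875)
print(mask)  # [[0, 1]]
```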

<Effects of this embodiment>
The effects of this embodiment are described below with reference to FIG. 6. In FIG. 6, image data 601 is background image data at a viewpoint 602, created by the conventional method from a plurality of images captured consecutively in time series. The background image data 601 contains foreground subjects such as a foreground subject 603 (goalkeeper) and a foreground subject 604 (goal). This is because the foreground subjects 603 and 604 stayed in the same position without moving while the consecutive images for creating the background image data were captured, and were therefore mistaken for background subjects when the background image data was created. When the foreground region is extracted from target image data 605 using the background image data 601, foreground image data 606 is obtained. In the foreground image data 606, the regions of the moving foreground subjects other than the subjects 603 and 604 are largely extracted. However, the regions of the stationary foreground subjects 603 and 604 are not extracted.

Image data 607 is background image data at the viewpoint 602, created by the conventional method from a plurality of images captured from a plurality of different viewpoints at the same time as the target image data 605 was captured. The background image data 607 does not contain foreground subjects such as the foreground subject 603 (goalkeeper) and the foreground subject 604 (goal), but part of the background subject is missing. This is because the foreground subjects were densely packed in the scene captured to create the background image data, so that some of the foreground subjects could not be seen from a plurality of viewpoints. When the foreground region is extracted from the target image data 605 using the background image data 607, foreground image data 608 is obtained. In the foreground image data 608, the regions of foreground subjects that have height above the ground surface are largely extracted. However, because the foreground subjects are densely packed, a region 609 of foreground subjects that cannot be seen from a plurality of viewpoints is not extracted.

In contrast, this embodiment creates background image data 610, which is a complete background image, from incomplete background images at a plurality of different viewpoints (for example, the background image data 601). When the foreground region is extracted from the target image data 605 using the background image data 610, foreground image data 611 is obtained. In the foreground image data 611, both the regions of the stationary foreground subjects 603 and 604 and the regions of foreground subjects that cannot be seen from a plurality of viewpoints are extracted with high accuracy. Thus, according to this embodiment, the region of the foreground subject can be extracted with high accuracy regardless of the state of the subjects, such as whether they change (for example, move) over time or how densely the foreground subjects are distributed.

[Example 2]
In the first embodiment, when a complete background image is created from a plurality of incomplete background images, the degree of matching is used, which indicates how well the converted background images for the different viewpoints agree at the pixel of interest. In this embodiment, when a complete background image is created from a plurality of incomplete background images, the degree of smoothness of the change in pixel value at the pixel of interest across the converted background images for the different viewpoints, the so-called continuity, is used in addition to the degree of matching. Configurations and processing that are the same as in the first embodiment are given the same reference numerals as in the first embodiment, and their description is omitted.

<Outline of the processing for extracting the foreground region>
An outline of the processing for extracting the foreground region in this embodiment is given below. In this embodiment, the continuity of pixel values between viewpoints is calculated using the converted background image data obtained by converting the background image data at a plurality of different viewpoints into images as seen from the viewpoint of interest. The continuity of pixel values is the degree of smoothness of the change in pixel values between the converted background image data at the viewpoint of interest and the converted background image data at viewpoints adjacent to the viewpoint of interest.

Specifically, the pixel value of the pixel of interest in the converted background image data corresponding to the reference viewpoint number is compared with the pixel value of the pixel of interest in the converted background image data at each viewpoint adjacent to the reference viewpoint, and the sum of the differences between the pixel values is calculated as the continuity. Next, using the degree of matching described in the first embodiment and the continuity calculated in this embodiment, pixels for which the degree of agreement is low and the change in pixel value is not smooth are regarded as likely to belong to the image region of a foreground subject and are detected as pixels to be corrected. The converted background image data is then corrected by updating the pixel values of the detected pixels, which creates a complete background image. Finally, the created complete background image is compared with the target image data to extract the foreground region.

In the first embodiment, whether the pixel of interest belongs to the image region of a foreground subject is determined using only the degree of matching of pixel values calculated from the background image data at all viewpoints. As a result, pixels in the image region of a background subject whose pixel values differ between viewpoints because its color appears different depending on the viewpoint are also regarded as likely to belong to the image region of a foreground subject and are detected as pixels to be corrected. Pixels that do not need correction are then corrected as well, which introduces errors into the corrected converted background image data and makes it impossible to accurately create a complete background image free of foreground subjects. An example of a background subject whose appearance changes with the viewpoint is grass mown in a particular direction, which appears in images of sports and similar competition scenes. The color of directionally mown grass looks different depending on the viewing direction, so the pixel value changes with the viewpoint even for grass at the same position. When the first embodiment is applied to a scene with such grass as the background subject, the degree of agreement between pixels across the plurality of pieces of converted background image data is low, so pixels in the image region of the grass, a background subject, are erroneously determined to belong to the image region of a foreground subject. To prevent this erroneous determination, this embodiment uses continuity in addition to the degree of matching to determine whether the pixel of interest belongs to the image region of a foreground subject. In general, for a subject whose color appearance changes with the viewpoint, the color may look markedly different between distant viewpoints, but it changes only gradually between nearby viewpoints. This embodiment therefore distinguishes pixels in the image region of a background subject whose pixel values changed because of a difference in color appearance from pixels in the image region of a foreground subject whose pixel values changed because the subject has height above the ground surface. As a result, the converted background image can be corrected accurately to create a complete background image, and the region of the foreground subject can be extracted from the target image data with high accuracy. Subjects whose color appearance changes with the viewpoint are not limited to the grass in the above example; there are various others, such as a gymnasium floor.

<Processing for extracting the foreground region>
The processing for extracting the foreground region executed by the image processing apparatus 100 in this embodiment is described below with reference to FIGS. 7 and 8. FIG. 7 is a block diagram showing the functional configuration of the image processing apparatus 100 in this embodiment, and FIG. 8 is a flowchart showing the flow of the processing for extracting the foreground region in this embodiment. The CPU 101 of the image processing apparatus 100 executes the program stored in the ROM 103 using the RAM 102 as a work memory, thereby functioning as each component shown in FIG. 7 and executing the series of processes shown in FIG. 8. Not all of the processing described below needs to be executed by the CPU 101; the image processing apparatus 100 may be configured so that part or all of the processing is performed by one or more processing circuits other than the CPU 101.

In step S801, the continuity calculation unit 701 determines the pixel of interest in the converted background image data for which continuity is to be calculated. In this embodiment, the upper-left pixel of the converted background image is selected first, and the remaining unprocessed pixels are then selected in turn. The pixels of interest may be chosen in any order, provided that continuity is eventually calculated for every pixel of the converted background image data.

In step S802, the continuity calculation unit 701 uses the converted background image data acquired from the image conversion unit 203 to calculate the continuity of the pixel value of the pixel of interest across the converted background images corresponding to the reference viewpoint and its neighboring viewpoints. The method of calculating continuity in this step is described with reference to FIG. 9.

First, cameras 902 and 903, which are adjacent to a camera 901 corresponding to the reference viewpoint number determined by the image conversion unit 203, are detected, and the viewpoint numbers corresponding to these cameras are acquired. The acquired viewpoint numbers are hereinafter called adjacent viewpoint numbers. The cameras adjacent to the camera 901 are determined based on their distance to the camera 901, calculated from the coordinates of the cameras in three-dimensional space. In this embodiment, the camera 902, which is the closest to the camera 901 among the cameras on its left side, and the camera 903, which is the closest to the camera 901 among the cameras on its right side, are detected as the cameras adjacent to the camera 901.
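One way to detect the adjacent cameras is sketched below. The text does not define how "left" and "right" are decided, so the convention used here is an assumption: sides are judged in the ground plane relative to the line from the reference camera toward a hypothetical scene center. The function name, the `scene_center` parameter, and the sample positions are likewise illustrative.

```python
import math

def adjacent_viewpoints(ref_index, positions, scene_center=(0.0, 0.0)):
    """Return (left, right): indices of the nearest camera on each side of the reference.

    Sides are judged in the ground plane (x, y) relative to the line from the
    reference camera to scene_center (an assumed convention). Distances are 3D.
    """
    rx, ry = positions[ref_index][:2]
    vx, vy = scene_center[0] - rx, scene_center[1] - ry  # viewing direction
    nearest = {"left": (None, math.inf), "right": (None, math.inf)}
    for i, pos in enumerate(positions):
        if i == ref_index:
            continue
        ox, oy = pos[0] - rx, pos[1] - ry
        cross = vx * oy - vy * ox  # > 0: camera lies to the left of the viewing direction
        if cross == 0:
            continue  # directly in front of or behind the reference camera
        side = "left" if cross > 0 else "right"
        d = math.dist(positions[ref_index], pos)
        if d < nearest[side][1]:
            nearest[side] = (i, d)
    return nearest["left"][0], nearest["right"][0]

# Five hypothetical cameras in a row, all looking toward the origin.
cameras = [(-6.0, -10.0, 5.0), (-2.0, -10.0, 5.0), (0.0, -10.0, 5.0),
           (3.0, -10.0, 5.0), (8.0, -10.0, 5.0)]
print(adjacent_viewpoints(2, cameras))  # (1, 3)
```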

Next, the pixel values of pixels 907, 908, and 909 at the coordinates (u₂, v₂) of the pixel of interest are acquired from a converted background image 904 corresponding to the reference viewpoint number and converted background images 905 and 906 corresponding to the adjacent viewpoint numbers, and the continuity is calculated from the acquired pixel values by Equation (4).
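Equation (4) is not reproduced in this text. The outline of this embodiment describes continuity as the sum of the differences between the pixel value in the reference view and the pixel values in the adjacent views, so one form consistent with that description is given below. The use of absolute differences is an assumption; the original formula may differ, for example by using squared differences.

```latex
\[
C(u_2, v_2) = \sum_{k \in \{R, G, B\}} \left(
  \left| B_{901}^{k}(u_2, v_2) - B_{902}^{k}(u_2, v_2) \right| +
  \left| B_{901}^{k}(u_2, v_2) - B_{903}^{k}(u_2, v_2) \right|
\right) \tag{4}
\]
```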

Here, B₉₀₁(u₂, v₂), B₉₀₂(u₂, v₂), and B₉₀₃(u₂, v₂) are the pixel values of the pixels of interest 907, 908, and 909 in the converted background images 904, 905, and 906 corresponding to the cameras 901, 902, and 903, respectively, and k is an index identifying the three RGB channels. The value of C calculated by Equation (4) becomes smaller as the change in pixel value between viewpoints becomes smoother. The continuity is not limited to C as calculated by Equation (4); any value that indicates the continuity of pixel values between viewpoints may be used, such as a second derivative computed from the discrete values. This embodiment describes the case of using the cameras 902 and 903 adjacent to the camera 901 corresponding to the reference viewpoint number, but the cameras used are not limited to these, and other cameras may be used depending on how the subject appears. For example, on the left side of the camera 901, the camera that is next closest to the camera 901 after the camera 902 may be used instead of the camera 902. The same applies to the camera used on the right side of the camera 901.
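The continuity calculation for one pixel of interest might look like the sketch below. It assumes the sum-of-absolute-differences form of C, which is one reading of "sum of the differences"; the function name and the sample RGB values are hypothetical.

```python
def continuity(b_ref, b_left, b_right):
    """C: sum over R, G, B of |B_ref - B_left| + |B_ref - B_right| at one pixel."""
    return sum(abs(r - l) + abs(r - q) for r, l, q in zip(b_ref, b_left, b_right))

# Directionally mown grass: the color drifts only slightly between adjacent views.
c_grass = continuity((60, 120, 50), (58, 117, 49), (63, 122, 52))
# Foreground object in the reference view: adjacent views show the ground instead.
c_foreground = continuity((200, 30, 30), (58, 117, 49), (63, 122, 52))
print(c_grass, c_foreground)  # 13 499
```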

In step S803, the continuity calculation unit 701 determines whether steps S801 to S802 have been performed for all pixels of the converted background image data. If the result is true, the continuity calculation unit 701 outputs the calculated continuity for all pixels to the correction determination unit 702, and the process proceeds to step S308. If the result is false, the process returns to step S801.

ステップＳ８０４において、補正判定部７０２は、一致度算出部２０４から取得した一致度と連続性算出部７０１から取得した連続性とに基づいて、フラグマップを更新する。具体的には、補正判定部７０２は、基準視点番号に対応する変換背景画像データにおいて、前景の被写体の画像領域の画素である可能性が高いとみなされた画素に対応する、フラグマップの画素値を１に変更する。本実施例では、算出した一致度Ｄが事前に定めた閾値以上、かつ、算出した連続性Ｃが事前に定めた閾値以上であれば、基準視点番号に対応する変換背景画像と他の変換背景画像との着目画素における一致の度合い及び変化の滑らかさ度合いが低いとする。つまり、着目画素が前景の被写体の画像領域の画素である可能性が高いと判定する。一方、これらの条件を満たさない場合、着目画素が背景の被写体の画像領域の画素である可能性が高いと判定する。なお、本ステップで用いる閾値は、画素値の最大値などに基づいて決定し、最大値の２０％より小さい値、例えば、最大値の１％～５％の範囲内の任意の値を用いて閾値を求めて良い。この閾値の求め方は実施例１と同様である。また、前景の被写体の画像領域の画素であるかの判定は、画素毎に行う。補正判定部７０２は、更新が完了したフラグマップを補正部２０６に出力する。以上が、本実施例における画像処理装置１００が実行する、前景領域を抽出する処理である。 In step S804, the correction determination unit 702 updates the flag map based on the matching degree obtained from the matching degree calculation unit 204 and the continuity obtained from the continuity calculation unit 701. Specifically, in the converted background image data corresponding to the reference viewpoint number, the correction determination unit 702 changes to 1 the flag map pixel value corresponding to each pixel regarded as highly likely to be a pixel in the image area of the foreground subject. In this embodiment, if the calculated matching degree D is equal to or greater than a predetermined threshold and the calculated continuity C is equal to or greater than a predetermined threshold, both the degree of matching and the degree of smoothness of change at the pixel of interest, between the converted background image corresponding to the reference viewpoint number and the other converted background images, are regarded as low. That is, it is determined that the pixel of interest is highly likely to be a pixel in the image area of the foreground subject. On the other hand, if these conditions are not satisfied, it is determined that the pixel of interest is highly likely to be a pixel in the image area of the background subject. Note that the thresholds used in this step are determined based on, for example, the maximum pixel value; a value smaller than 20% of the maximum value, for example any value within the range of 1% to 5% of the maximum value, may be used to obtain the threshold. The method of obtaining this threshold is the same as in the first embodiment. The determination of whether a pixel belongs to the image area of the foreground subject is made for each pixel. The correction determination unit 702 outputs the updated flag map to the correction unit 206. The above is the process for extracting the foreground region executed by the image processing apparatus 100 according to this embodiment.
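
The flag map update of step S804 can be sketched as follows. This is a minimal illustration rather than the implementation of this embodiment: the function names, the nested-list image representation, the 8-bit maximum pixel value, and the 5% ratio are all assumptions chosen for the example. The comparison direction follows the text, in which a large D and a large C both indicate a likely foreground pixel.

```python
def threshold_from_max(max_pixel_value, ratio=0.05):
    """Return a threshold as a fraction of the maximum pixel value.

    The text suggests a ratio below 20%, e.g. 1% to 5%; 0.05 is an assumed default.
    """
    return max_pixel_value * ratio


def update_flag_map(flag_map, match_d, continuity_c, thr_d, thr_c):
    """Return a copy of flag_map with likely-foreground pixels set to 1.

    A pixel is flagged only when BOTH its matching degree D and its
    continuity C are at or above their thresholds; other flags are kept.
    """
    updated = [row[:] for row in flag_map]
    for v, (d_row, c_row) in enumerate(zip(match_d, continuity_c)):
        for u, (d, c) in enumerate(zip(d_row, c_row)):
            if d >= thr_d and c >= thr_c:
                updated[v][u] = 1
    return updated


thr = threshold_from_max(255)  # assumed 8-bit pixel values
flags = update_flag_map(
    [[0, 0], [0, 0]],
    match_d=[[20.0, 20.0], [1.0, 1.0]],
    continuity_c=[[30.0, 2.0], [30.0, 2.0]],
    thr_d=thr,
    thr_c=thr,
)
print(flags)  # prints [[1, 0], [0, 0]]
```

Only the top-left pixel satisfies both conditions. The top-right pixel has a large D but a small C, which corresponds to the view-dependent background (for example, a lawn) that this embodiment avoids flagging.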

＜本実施例の効果について＞ <Effects of this embodiment>
以下、本実施例の効果について図１０を用いて説明する。画像データ１００２は、背景画像データを、視点毎に、地上面を基準として視点１００１から見た場合の画像へ変換することで取得する変換背景画像データである。ここで視点１００１は、着目視点、且つ、基準視点であるものとする。また、被写体１００３は、視点によって色の見え方が変化する背景の被写体（例えば、芝生）であり、被写体１００５は、前景の被写体である。 The effects of this embodiment will be described below with reference to FIG. 10. The image data 1002 is converted background image data obtained by converting, for each viewpoint, the background image data into an image as viewed from the viewpoint 1001 with the ground surface as a reference. Here, the viewpoint 1001 is assumed to be both the viewpoint of interest and the reference viewpoint. A subject 1003 is a background subject (for example, a lawn) whose color appearance changes depending on the viewpoint, and a subject 1005 is a foreground subject.

図１０に示すシーンに実施例１を適用し、多視点の不完全な背景画像に基づき完全な背景画像を作成した場合、背景画像データ１００４が取得される。背景画像データ１００４では、前景の被写体１００５の画像を除去できているものの、背景の被写体１００３が正しく写っていない。この理由は、完全な背景画像を作成する際に、着目画素が前景の被写体の画像領域の画素であるかを一致度のみに基づき判定するので、背景の被写体１００３の画像領域の画素が、前景の被写体の画像領域の画素と判定されてしまうためである。この結果、背景の被写体１００３の画像領域の画素が補正された背景画像データ１００４が作成される。背景画像データ１００４を用いて、対象画像データから前景領域を抽出しようとしても、前景領域を精度良く抽出することはできない。 When the first embodiment is applied to the scene shown in FIG. 10 and a complete background image is created from the incomplete multi-viewpoint background images, background image data 1004 is obtained. In the background image data 1004, the image of the foreground subject 1005 has been removed, but the background subject 1003 is not rendered correctly. This is because, when the complete background image is created, whether the pixel of interest is a pixel in the image area of the foreground subject is determined based only on the matching degree, so pixels in the image area of the background subject 1003 are determined to be pixels in the image area of the foreground subject. As a result, background image data 1004 in which the pixels in the image area of the background subject 1003 have been corrected is created. Even if the background image data 1004 is used to extract the foreground region from the target image data, the foreground region cannot be extracted accurately.

これに対し、本実施例では、多視点の不完全な背景画像に基づき完全な背景画像を作成する際に、着目画素が前景の被写体の画像領域の画素であるかを、一致度と連続性とに基づき判定する。この結果、背景の被写体１００３の画像領域の画素を、前景の被写体の画像領域の画素と判定せず、背景の被写体１００３の画像領域の画素が補正されていない背景画像データ１００６が作成される。背景画像データ１００６では、前景の被写体１００５の画像を除去しつつ、背景の被写体１００３が正しく写っている。背景画像データ１００６を用いて対象画像データから前景領域を抽出することで、前景領域を高精度に抽出できるようになる。このように、本実施例によれば、背景の被写体が、視点によって色の見え方が変化する被写体である場合であっても、前景の被写体による領域を高精度に抽出することができる。 In contrast, in this embodiment, when a complete background image is created from the incomplete multi-viewpoint background images, whether the pixel of interest is a pixel in the image area of the foreground subject is determined based on both the matching degree and the continuity. As a result, pixels in the image area of the background subject 1003 are not determined to be pixels in the image area of the foreground subject, and background image data 1006 in which the pixels in the image area of the background subject 1003 are left uncorrected is created. In the background image data 1006, the image of the foreground subject 1005 is removed while the background subject 1003 is rendered correctly. By extracting the foreground region from the target image data using the background image data 1006, the foreground region can be extracted with high accuracy. As described above, according to this embodiment, even when the background subject is a subject whose color appearance changes depending on the viewpoint, the area of the foreground subject can be extracted with high accuracy.

［実施例３］ [Example 3]
実施例１及び実施例２では、複数の異なる視点における不完全な背景画像に基づき完全な背景画像を作成し、該作成した完全な背景画像と対象画像データとを比較することで前景の被写体による領域を抽出する。一方、本実施例では、複数の異なる視点における、不完全な前景画像を用いて、影による領域を含まないように前景領域を抽出する。ここで不完全な前景画像とは、前景の被写体による領域と該前景の被写体に付随する影による領域とが前景領域として抽出された画像を意味する。 In Example 1 and Example 2, a complete background image is created based on incomplete background images at a plurality of different viewpoints, and the area of the foreground subject is extracted by comparing the created complete background image with the target image data. In this embodiment, by contrast, incomplete foreground images at a plurality of different viewpoints are used to extract the foreground region so that it does not include shadow areas. Here, an incomplete foreground image means an image in which both the area of the foreground subject and the area of the shadow accompanying that subject have been extracted as the foreground region.

本実施例では、視点毎の不完全な前景画像を、地上面を基準として着目視点から見た場合の画像へと変換することで、複数の変換前景画像データを取得し、該取得した複数の変換前景画像データにおいて画素間の一致度を算出する。実施例１で説明したように前景の被写体は地上面からの高さを持つが、前景の被写体に付随する影は地上面からの高さを持たない。そこで本実施例では、複数の変換前景画像データにおいて画素間の一致の度合いが高い画素を検出し、該検出した画素は高さを持たない影による領域の画素である可能性が高いとして補正する。その結果、影による領域が抽出されることなく高さを持つ前景の被写体による領域のみが前景領域として抽出された前景画像を作成できる。以下、影による領域が抽出されることなく高さを持つ前景の被写体による領域のみが前景領域として抽出された画像を、完全な前景画像と呼ぶ。なお、上述の実施例と同様の構成及び同様の処理については、上述の実施例と同様の符号を付して説明を省略する。 In the present embodiment, a plurality of transformed foreground image data are acquired by transforming an incomplete foreground image for each viewpoint into an image viewed from the viewpoint of interest with reference to the ground surface. A degree of matching between pixels is calculated in the converted foreground image data. As described in the first embodiment, the foreground subject has a height above the ground, but the shadow attached to the foreground subject does not have a height above the ground. Therefore, in this embodiment, pixels with a high degree of matching between pixels are detected in a plurality of pieces of converted foreground image data, and the detected pixels are corrected as highly likely to be pixels in a shadow area having no height. . As a result, it is possible to create a foreground image in which only the area of the subject in the foreground having height is extracted as the foreground area without extracting the shadow area. Hereinafter, an image in which only a foreground object area having a height is extracted as a foreground area without extracting a shadow area is referred to as a complete foreground image. Note that the same reference numerals as in the above-described embodiment are attached to the same configuration and the same processing as those in the above-described embodiment, and the description thereof is omitted.

＜前景領域を抽出する処理について＞ <Regarding the process of extracting the foreground region>
以下、本実施例における画像処理装置１００が実行する前景領域を抽出する処理について、図１１及び図１２を用いて説明する。図１１は、本実施例における画像処理装置１００の機能構成を示すブロック図であり、図１２は、本実施例における前景領域を抽出する処理の流れを示すフローチャートである。画像処理装置１００のＣＰＵ１０１は、ＲＡＭ１０２をワークメモリとして用いてＲＯＭ１０３に格納されたプログラムを実行することで、図１１に示す各構成要素として機能し、図１２に示す一連の処理を実行する。なお、以下に示す処理の全てがＣＰＵ１０１によって実行される必要はなく、処理の一部または全部が、ＣＰＵ１０１以外の１つ又は複数の処理回路によって行われるように画像処理装置１００を構成しても良い。 The process of extracting the foreground region executed by the image processing apparatus 100 in this embodiment will be described below with reference to FIGS. 11 and 12. FIG. 11 is a block diagram showing the functional configuration of the image processing apparatus 100 in this embodiment, and FIG. 12 is a flowchart showing the flow of the process of extracting the foreground region in this embodiment. The CPU 101 of the image processing apparatus 100 uses the RAM 102 as a work memory to execute the program stored in the ROM 103, thereby functioning as each component shown in FIG. 11 and executing the series of processes shown in FIG. 12. Note that not all of the processing described below needs to be executed by the CPU 101; the image processing apparatus 100 may be configured such that part or all of the processing is performed by one or more processing circuits other than the CPU 101.

ステップＳ１２０１において、カメラパラメータ取得部１１０１は、入力インターフェース１０５を介して外部記憶装置１０８から、又は、二次記憶装置１０４から、対象画像データを撮像したカメラのカメラパラメータを取得する。また、カメラパラメータ取得部１１０１は、対象画像データを撮像したカメラの視点を着目視点と定める。本ステップで取得するカメラパラメータとは、実施例１で説明したカメラパラメータと同様である。カメラパラメータ取得部１１０１は、カメラパラメータを画像変換部１１０３に出力する。 In step S1201 , the camera parameter acquisition unit 1101 acquires camera parameters of the camera that captured the target image data from the external storage device 108 or the secondary storage device 104 via the input interface 105 . Also, the camera parameter acquisition unit 1101 determines the viewpoint of the camera that captured the target image data as the viewpoint of interest. The camera parameters acquired in this step are the same as the camera parameters described in the first embodiment. The camera parameter acquisition unit 1101 outputs camera parameters to the image conversion unit 1103 .

ステップＳ１２０２において、前景画像データ取得部１１０２は、入力インターフェース１０５を介して外部記憶装置１０８から、又は、二次記憶装置１０４から、複数の異なる視点における複数の前景画像データを取得する。本ステップで取得する前景画像データは、前景の被写体による領域を抽出した画像であり、該抽出した領域には、影による領域が含まれるものとする。本実施例では、この前景画像データは、事前に撮像した撮像画像と背景画像とに基づき作成される。以下、前景画像データの作成手法を具体的に説明する。ここで用いる撮像画像は、対象画像データにおける前景の被写体及び背景の被写体を、対象画像データを撮像した際の環境と略同一の環境で撮像した画像である。また、背景画像は、対象画像データにおける背景の被写体を、対象画像データを撮像した際の環境と略同一の環境で撮像した画像である。本実施例では、視点毎に、撮像画像の画素値と背景画像の画素値とを画素毎に比較し、これらの画素値が同一の座標の画素の画素値を０、そうでない画素の画素値を１とすることで、視点毎の２値画像を作成する。この２値画像が前景画像データである。なお、前景画像データを作成する手法はこれに限られず、また、作成する前景画像データも２値画像に限られず多値画像であっても良い。また、前景画像データ取得部１１０２は、各前景画像データに対応するカメラパラメータを、前景画像データとともに取得する。さらに、前景画像データ取得部１１０２は、複数の前景画像データのそれぞれを区別するため、各前景画像データを、カメラの視点番号と対応付けて記憶する。前景画像データ取得部１１０２は、前景画像データとカメラパラメータとを画像変換部１１０３に出力する。 In step S1202, the foreground image data acquisition unit 1102 acquires a plurality of pieces of foreground image data at a plurality of different viewpoints from the external storage device 108 via the input interface 105, or from the secondary storage device 104. The foreground image data acquired in this step is an image in which the area of the foreground subject has been extracted, and the extracted area is assumed to include a shadow area. In this embodiment, the foreground image data is created based on a captured image captured in advance and a background image. The method for creating the foreground image data is described specifically below. The captured image used here is an image of the foreground subject and the background subject of the target image data, captured in substantially the same environment as that in which the target image data was captured. The background image is an image of the background subject of the target image data, captured in substantially the same environment as that in which the target image data was captured. In this embodiment, for each viewpoint, the pixel value of the captured image and the pixel value of the background image are compared pixel by pixel; a pixel at whose coordinates the two values are identical is given the pixel value 0, and every other pixel is given the pixel value 1, thereby creating a binary image for each viewpoint. This binary image is the foreground image data. Note that the method for creating the foreground image data is not limited to this, and the foreground image data to be created is not limited to a binary image and may be a multivalued image. The foreground image data acquisition unit 1102 also acquires the camera parameters corresponding to each piece of foreground image data together with the foreground image data. Furthermore, in order to distinguish the plurality of pieces of foreground image data from one another, the foreground image data acquisition unit 1102 stores each piece of foreground image data in association with the viewpoint number of the camera. The foreground image data acquisition unit 1102 outputs the foreground image data and the camera parameters to the image conversion unit 1103.
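
The creation of the binary foreground image in step S1202 can be sketched as follows. This is only an illustration under assumptions: the function name and the nested-list image representation (one RGB tuple per pixel) are hypothetical, and the exact-equality test follows the wording of this step literally.

```python
def make_foreground_mask(captured, background):
    """Return a binary image: 0 where the two images agree, 1 elsewhere."""
    if len(captured) != len(background):
        raise ValueError("images must have the same height")
    return [
        [0 if c == b else 1 for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(captured, background)
    ]


# 2x2 toy images: the right column differs from the background.
captured = [[(10, 10, 10), (200, 0, 0)],
            [(10, 10, 10), (40, 40, 40)]]
background = [[(10, 10, 10), (10, 10, 10)],
              [(10, 10, 10), (10, 10, 10)]]
mask = make_foreground_mask(captured, background)
print(mask)  # prints [[0, 1], [0, 1]]
```

In this toy example the darker pixel (40, 40, 40) stands in for a shadow: it differs from the background, so it is marked 1 together with the subject, which is why the resulting foreground image is called incomplete.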

ステップＳ１２０３において、画像変換部１１０３は、カメラパラメータ取得部１１０１と前景画像データ取得部１１０２とから得たカメラパラメータを用いて、前景画像データ取得部１１０２から得た前景画像データを着目視点から見た場合の画像へと変換する。本ステップの変換は、実施例１のステップＳ３０３と同様の変換であり、視点毎に、前景画像データを、地上面を基準として射影変換することで、着目視点から見た場合の画像を得る。なお、本ステップでの画像変換により得られる前景画像（データ）を変換前景画像（データ）と呼ぶ。このように、画像変換部１１０３は、変換前景画像データ作成手段として機能する。画像変換部１１０３は、変換前景画像データを一致度算出部１１０４に出力する。 In step S1203, the image conversion unit 1103 uses the camera parameters obtained from the camera parameter acquisition unit 1101 and the foreground image data acquisition unit 1102 to convert the foreground image data obtained from the foreground image data acquisition unit 1102 into an image as viewed from the viewpoint of interest. The conversion in this step is the same as that in step S303 of the first embodiment: for each viewpoint, the foreground image data is projectively transformed with the ground surface as a reference, thereby obtaining an image as viewed from the viewpoint of interest. The foreground image (data) obtained by the image conversion in this step is called a converted foreground image (data). In this way, the image conversion unit 1103 functions as a converted foreground image data creating means. The image conversion unit 1103 outputs the converted foreground image data to the matching degree calculation unit 1104.
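
One common way to realize a projective transformation that uses the ground surface as a reference is a plane-induced homography. The sketch below is not taken from this document: it assumes a pinhole camera model x ~ K(RX + t), a ground plane at Z = 0, nearest-neighbour sampling, and hypothetical function names. Under those assumptions, ground points map to pixels through K[r1 r2 t], and the view-to-view mapping is the product of one such matrix with the inverse of the other.

```python
def mat3_mul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat3_inv(m):
    """Invert a 3x3 matrix via the adjugate; raises if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def ground_plane_matrix(K, R, t):
    """Map ground points (X, Y, 1) on Z = 0 to homogeneous pixel coordinates."""
    m = [[R[r][0], R[r][1], t[r]] for r in range(3)]
    return mat3_mul(K, m)


def ground_homography(src_cam, dst_cam):
    """Homography taking source-view pixels of ground points to the target view."""
    return mat3_mul(ground_plane_matrix(*dst_cam),
                    mat3_inv(ground_plane_matrix(*src_cam)))


def warp_image(image, H, out_h, out_w):
    """Warp by inverse mapping with nearest-neighbour sampling; outside is 0."""
    h_inv = mat3_inv(H)
    out = [[0] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            x, y, w = (row[0] * u + row[1] * v + row[2] for row in h_inv)
            if abs(w) < 1e-12:
                continue
            su, sv = round(x / w), round(y / w)
            if 0 <= sv < len(image) and 0 <= su < len(image[0]):
                out[v][u] = image[sv][su]
    return out


# Toy cameras (assumed values): same orientation, target shifted sideways.
K = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
src_cam = (K, I3, [0.0, 0.0, 5.0])
dst_cam = (K, I3, [1.0, 0.0, 5.0])
H = ground_homography(src_cam, dst_cam)
warped = warp_image([[0, 1, 0, 0]], H, out_h=1, out_w=4)
print(warped)  # prints [[0, 0, 1, 0]]
```

With these toy parameters the homography reduces to a one-pixel shift along u. Points on the ground line up after warping, whereas points with height would not, which is the property the matching degree exploits.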

ステップＳ１２０４において、画像変換部１１０３は、前景画像データ取得部１１０２から取得した前景画像データの中から、対象画像データを撮像したカメラ位置（着目視点）と最も近い視点に対応する画像を基準の前景画像（以下、基準前景画像）として定める。具体的には、着目視点の座標と前景画像データに対応する視点の座標との距離を、視点毎に算出する。そして、算出した距離が最小となる視点（基準視点）に対応する前景画像（データ）を基準前景画像（データ）とする。画像変換部１１０３は、基準前景画像に対応する視点番号を、補正部１１０５に出力する。本実施例では、基準前景画像に対応する視点番号を、基準視点番号と呼ぶ。 In step S1204, the image conversion unit 1103 selects, from among the foreground image data acquired from the foreground image data acquisition unit 1102, the image corresponding to the viewpoint closest to the camera position (viewpoint of interest) from which the target image data was captured, and defines it as the reference foreground image. Specifically, the distance between the coordinates of the viewpoint of interest and the coordinates of the viewpoint corresponding to each piece of foreground image data is calculated for each viewpoint. Then, the foreground image (data) corresponding to the viewpoint for which the calculated distance is smallest (the reference viewpoint) is set as the reference foreground image (data). The image conversion unit 1103 outputs the viewpoint number corresponding to the reference foreground image to the correction unit 1105. In this embodiment, the viewpoint number corresponding to the reference foreground image is called the reference viewpoint number.
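
The selection of the reference viewpoint in step S1204 amounts to a nearest-neighbour search over camera positions. A minimal sketch follows; the function name, the dictionary keyed by viewpoint number, and the use of Euclidean distance in 3D world coordinates are assumptions made for illustration.

```python
import math


def select_reference_viewpoint(viewpoint_of_interest, viewpoint_positions):
    """Return the viewpoint number whose camera position is closest.

    viewpoint_positions maps a viewpoint number to an (x, y, z) position.
    """
    return min(
        viewpoint_positions,
        key=lambda number: math.dist(viewpoint_of_interest,
                                     viewpoint_positions[number]),
    )


positions = {
    0: (0.0, 0.0, 10.0),
    1: (5.0, 0.0, 10.0),
    2: (10.0, 0.0, 10.0),
}
reference_viewpoint_number = select_reference_viewpoint((4.0, 1.0, 10.0), positions)
print(reference_viewpoint_number)  # prints 1
```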

ステップＳ１２０５では、一致度算出部１１０４は、複数の変換前景画像データにおいて画素が一致するかを判定する対象となる、変換前景画像データにおける着目画素を決定する。本実施例では、まず、変換前景画像データの左上の画素が着目画素として選択され、その後、未処理の画素が着目画素として順次選択される。なお、変換前景画像データの全画素について、複数の変換前景画像データにおいて画素が一致するかの判定が実行されれば、どのような順番で着目画素を決定しても良い。 In step S1205, the degree-of-match calculation unit 1104 determines a pixel of interest in the converted foreground image data, which is a target for determining whether pixels match in a plurality of pieces of converted foreground image data. In this embodiment, first, the upper left pixel of the converted foreground image data is selected as the pixel of interest, and then unprocessed pixels are sequentially selected as the pixel of interest. It should be noted that the pixel of interest may be determined in any order as long as it is determined whether pixels in a plurality of pieces of converted foreground image data match for all the pixels of the converted foreground image data.

ステップＳ１２０６において、一致度算出部１１０４は、画像変換部１１０３から取得した複数の変換前景画像データを用いて、基準視点番号に対応する変換前景画像データと他の変換前景画像データとの間の、着目画素における一致度を算出する。以下、この一致度の算出手法を具体的に説明する。 In step S1206, the degree-of-match calculation unit 1104 uses a plurality of pieces of converted foreground image data acquired from the image conversion unit 1103 to determine the difference between the converted foreground image data corresponding to the reference viewpoint number and other converted foreground image data A degree of matching is calculated for the pixel of interest. A method for calculating the degree of matching will be specifically described below.

まず、一致度算出部１１０４は、決定した着目画素の座標（ｕ₂、ｖ₂）における、変換前景画像データの画素値Ｆ_l（ｕ₂、ｖ₂）を取得する。ここでｌは複数の変換前景画像データのそれぞれを区別する添え字を表し、一致度算出部１１０４は、変換前景画像データの数分の画素値を取得する。次に、一致度算出部１１０４は、取得した全画素値の平均値を算出する。本実施例では、この平均値を一致度として用いる。また、一致度はこれに限られず、複数の画素値の統計的な性質を反映する値を一致度として用いて良い。 First, the matching degree calculation unit 1104 acquires the pixel value F_l(u₂, v₂) of the converted foreground image data at the coordinates (u₂, v₂) of the determined pixel of interest. Here, l is a subscript that distinguishes each of the plurality of pieces of converted foreground image data, and the matching degree calculation unit 1104 acquires as many pixel values as there are pieces of converted foreground image data. Next, the matching degree calculation unit 1104 calculates the average of all the acquired pixel values. In this embodiment, this average value is used as the matching degree. The matching degree is not limited to this, and any value that reflects the statistical properties of the plurality of pixel values may be used as the matching degree.
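
The per-pixel average described above can be sketched as follows, assuming binary converted foreground images stored as nested lists of equal size (the function name and data layout are hypothetical). For binary images, the average equals the fraction of viewpoints in which the pixel is marked as foreground.

```python
def matching_degree_map(converted_foregrounds):
    """Average the pixel values F_l(u, v) over all converted foreground images."""
    count = len(converted_foregrounds)
    height = len(converted_foregrounds[0])
    width = len(converted_foregrounds[0][0])
    return [
        [sum(img[v][u] for img in converted_foregrounds) / count
         for u in range(width)]
        for v in range(height)
    ]


# Three converted 1x3 binary images (toy data).
views = [
    [[1, 1, 0]],
    [[1, 0, 0]],
    [[1, 1, 0]],
]
degrees = matching_degree_map(views)
```

Here `degrees[0][0]` is 1.0 because all three views mark that pixel, `degrees[0][1]` is 2/3, and `degrees[0][2]` is 0.0.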

ステップＳ１２０７において、一致度算出部１１０４は、変換前景画像データの全画素についてステップ１２０５～ステップＳ１２０６の処理を行ったかを判定する。ステップＳ１２０７の判定の結果が真の場合、一致度算出部１１０４は、算出した全画素の一致度を補正部１１０５に出力し、ステップＳ１２０８に進む。一方、ステップＳ１２０７の判定の結果が偽の場合、ステップＳ１２０５に戻る。 In step S1207, the matching degree calculation unit 1104 determines whether the processing of steps S1205 to S1206 has been performed for all pixels of the converted foreground image data. If the result of the determination in step S1207 is true, the matching degree calculation unit 1104 outputs the calculated matching degrees of all pixels to the correction unit 1105, and the process proceeds to step S1208. On the other hand, if the result of the determination in step S1207 is false, the process returns to step S1205.

ステップＳ１２０８において、補正部１１０５は、基準視点番号に対応する変換前景画像データにおける着目画素を決定する。本実施例では、まず、基準視点番号に対応する変換前景画像データの左上の画素が着目画素として選択され、未処理の画素が着目画素として順次選択される。なお、基準視点番号に対応する変換前景画像データの全画素について一致度に基づく画素値の更新（ステップＳ１２０９）が実行されれば、どのような順番で着目画素を決定しても良い。 In step S1208, the correction unit 1105 determines a pixel of interest in the converted foreground image data corresponding to the reference viewpoint number. In this embodiment, first, the upper left pixel of the converted foreground image data corresponding to the reference viewpoint number is selected as the pixel of interest, and unprocessed pixels are sequentially selected as the pixel of interest. As long as pixel values are updated (step S1209) based on the degree of matching for all pixels of the converted foreground image data corresponding to the reference viewpoint number, the pixels of interest may be determined in any order.

ステップＳ１２０９において、補正部１１０５は、一致度算出部１１０４から取得した一致度に基づき、基準視点番号に対応する変換前景画像における影による領域の画素である可能性が高い画素を検出する。そして、補正部１１０５は、検出した画素の画素値を０に変更することで不完全な前景画像から影による領域を取り除く。本実施例では、算出した一致度が事前に定めた閾値以上であれば、全視点における着目画素間の一致の度合いが高いため、着目画素が高さを持たない影による領域の画素である可能性が高いと判定する。そして、基準視点番号に対応する変換前景画像データにおける着目画素の画素値を０に変更する。一方、算出した一致度が閾値未満であれば、全視点における着目画素間の一致の度合いが低く、着目画素が高さを持つ前景の被写体による領域の画素である可能性が高いと判定する。この場合、基準視点番号に対応する変換前景画像における着目画素の画素値を変更しない。なお、本実施例では、閾値として０．８を用いたが、閾値の値はこれに限らない。 In step S1209, the correction unit 1105 detects, based on the matching degree acquired from the matching degree calculation unit 1104, pixels that are highly likely to be pixels in a shadow area in the converted foreground image corresponding to the reference viewpoint number. The correction unit 1105 then changes the pixel values of the detected pixels to 0, thereby removing the shadow area from the incomplete foreground image. In this embodiment, if the calculated matching degree is equal to or greater than a predetermined threshold, the degree of matching between the pixels of interest across all viewpoints is high, so it is determined that the pixel of interest is highly likely to be a pixel in a shadow area, which has no height. The pixel value of the pixel of interest in the converted foreground image data corresponding to the reference viewpoint number is then changed to 0. On the other hand, if the calculated matching degree is less than the threshold, the degree of matching between the pixels of interest across all viewpoints is low, and it is determined that the pixel of interest is highly likely to be a pixel in the area of a foreground subject that has height. In this case, the pixel value of the pixel of interest in the converted foreground image corresponding to the reference viewpoint number is not changed. Although 0.8 is used as the threshold in this embodiment, the threshold is not limited to this value.
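
The correction of step S1209 can be sketched as a per-pixel threshold test on the matching degree. The function name and list-based images below are assumptions for illustration; the default threshold of 0.8 is the value given for this embodiment.

```python
def remove_shadow_pixels(reference_foreground, matching_degrees, threshold=0.8):
    """Set to 0 every pixel whose matching degree is at or above the threshold.

    Pixels below the threshold keep their value, since they are likely to
    belong to a foreground subject that has height.
    """
    return [
        [0 if degree >= threshold else pixel
         for pixel, degree in zip(pixel_row, degree_row)]
        for pixel_row, degree_row in zip(reference_foreground, matching_degrees)
    ]


reference = [[1, 1, 0]]
degrees = [[1.0, 0.5, 0.0]]  # toy matching degrees for the three pixels
corrected = remove_shadow_pixels(reference, degrees)
print(corrected)  # prints [[0, 1, 0]]
```

The first pixel agrees across all viewpoints and is treated as shadow; the second agrees in only half of them and is kept as part of the subject.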

ステップＳ１２１０において、補正部１１０５は、基準視点番号に対応する変換前景画像データの全画素についてステップＳ１２０８～ステップＳ１２０９の処理を行ったかを判定する。ステップＳ１２１０の判定の結果が真の場合、補正部１１０５は、補正が完了した基準視点番号に対応する変換前景画像データを、二次記憶装置１０４や外部記憶装置１０８や表示装置１０９に出力して、一連の処理は完了する。一方、ステップＳ１２１０の判定の結果が偽の場合、ステップＳ１２０８に戻る。以上が、本実施例における画像処理装置１００が実行する、前景領域を抽出する処理である。 In step S1210, the correction unit 1105 determines whether the processing of steps S1208 to S1209 has been performed for all pixels of the converted foreground image data corresponding to the reference viewpoint number. If the result of the determination in step S1210 is true, the correction unit 1105 outputs the corrected converted foreground image data corresponding to the reference viewpoint number to the secondary storage device 104, the external storage device 108, or the display device 109, and the series of processes is complete. On the other hand, if the result of the determination in step S1210 is false, the process returns to step S1208. The above is the process for extracting the foreground region executed by the image processing apparatus 100 according to this embodiment.

＜本実施例の効果について＞ <Effects of this embodiment>
以下、本実施例の効果について図１３を用いて説明する。被写体１３０１は、被写体自身の影１３０２が地上面１３０３に存在する前景の被写体である。画像データ１３０４は、複数の異なる視点における、被写体１３０１とこれに付随する影１３０２とによる領域を前景領域として抽出した前景画像である。本実施例では、画像データ１３０４を着目視点１３０５から見た場合の画像に変換することで得られる複数の変換前景画像データにおける着目画素間の一致の度合いに基づき、地上面からの高さを持たない影による領域の画素を検出する。そして、検出した画素を補正することで、前景画像１３０６を作成する。前景画像１３０６では、高さを持つ前景の被写体１３０１に付随する影１３０２による領域が取り除かれており、前景の被写体１３０１による領域のみを抽出できている。このように、本実施例によれば、高さを持つ前景の被写体に付随する影が存在する場合であっても、影による領域を抽出することなく、この前景の被写体による領域のみを高精度に抽出することができる。 The effects of this embodiment will be described below with reference to FIG. 13. A subject 1301 is a foreground subject whose own shadow 1302 lies on the ground surface 1303. The image data 1304 are foreground images, at a plurality of different viewpoints, in which the area of the subject 1301 and its accompanying shadow 1302 has been extracted as the foreground region. In this embodiment, pixels in the shadow area, which has no height above the ground surface, are detected based on the degree of matching between pixels of interest in the plurality of pieces of converted foreground image data obtained by converting the image data 1304 into images as viewed from the viewpoint of interest 1305. A foreground image 1306 is then created by correcting the detected pixels. In the foreground image 1306, the area of the shadow 1302 accompanying the foreground subject 1301, which has height, has been removed, and only the area of the foreground subject 1301 is extracted. As described above, according to this embodiment, even when a shadow accompanies a foreground subject that has height, only the area of the foreground subject can be extracted with high accuracy, without extracting the shadow area.

なお、本実施例では、不完全な前景画像として、事前に撮像した撮像画像と背景画像とに基づいて作成した前景画像を用いるが、実施例１や実施例２により作成した前景画像を用いてもよい。その場合、実施例１や実施例２と、実施例３とをそれぞれ単独で実行した場合に比べて、前景の被写体による領域を高精度に抽出することができる。 In this embodiment, a foreground image created based on a captured image captured in advance and a background image is used as the incomplete foreground image, but a foreground image created by Example 1 or Example 2 may be used instead. In that case, the area of the foreground subject can be extracted with higher accuracy than when Example 1 or Example 2 and Example 3 are each executed on their own.

［その他の実施例］ [Other Examples]
本発明の実施形態は、上述の実施例に限られるものではなく、様々な実施形態をとることが可能である。例えば、上述の実施例では、不完全な背景画像である背景画像データのサイズと対象画像データのサイズとが、同一である場合について説明しているが、これらのサイズは同一でなくても良い。その場合、地上面を上から見た視点を基準視点として、背景画像を基準視点から見た場合の画像へと変換する。そして、該変換した画像を用いて背景画像を補正し、該補正した背景画像を着目視点から見た場合の画像へと変換することで、対象画像データに対応する背景画像データを作成する。 Embodiments of the present invention are not limited to the examples described above and can take various forms. For example, the above examples describe the case where the size of the background image data, which is an incomplete background image, is the same as the size of the target image data, but these sizes need not be the same. In that case, a viewpoint looking down on the ground surface from above is used as the reference viewpoint, and the background image is converted into an image as viewed from the reference viewpoint. The background image is then corrected using the converted image, and the corrected background image is converted into an image as viewed from the viewpoint of interest, thereby creating background image data corresponding to the target image data.

また、上述の実施例では、一致度の算出や前景の抽出において、ＲＧＢ空間における画素値を用いているが、用いる情報はこれに限られない。例えば、ＨＳＶやＬａｂなどの異なる色空間の画素値を用いて、一致度の算出や前景の抽出を行うようにしても良い。 In addition, in the above-described embodiment, the pixel values in the RGB space are used in the calculation of the degree of matching and the extraction of the foreground, but the information to be used is not limited to this. For example, pixel values in different color spaces such as HSV and Lab may be used to calculate the degree of matching and extract the foreground.
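
As one illustration of comparing pixel values in a color space other than RGB, the sketch below converts two RGB pixels to HSV with Python's standard `colorsys` module and sums the per-channel absolute differences. The function name, the 8-bit input range, and the choice of a summed absolute difference are assumptions for this example, not a definition taken from this document. Hue is treated as circular.

```python
import colorsys


def hsv_difference(rgb_a, rgb_b, max_value=255):
    """Sum of absolute H, S, V differences between two RGB pixels.

    Channels are scaled to [0, 1] before conversion; hue wraps around,
    so the hue difference never exceeds 0.5.
    """
    ha, sa, va = colorsys.rgb_to_hsv(*(c / max_value for c in rgb_a))
    hb, sb, vb = colorsys.rgb_to_hsv(*(c / max_value for c in rgb_b))
    hue_diff = abs(ha - hb)
    hue_diff = min(hue_diff, 1.0 - hue_diff)
    return hue_diff + abs(sa - sb) + abs(va - vb)


same = hsv_difference((30, 120, 60), (30, 120, 60))
red_vs_black = hsv_difference((255, 0, 0), (0, 0, 0))
```

Identical pixels give a difference of 0, while pure red against black differs by 1 in saturation and 1 in value.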

さらに、上述の実施例では、画像を射影変換する際、地上面の一平面のみを基準としているが、地上面に平行な複数の平面を基準として用いても良い。例えば、地上面からの高さが０から１センチメートルまでを等間隔に刻むことで複数の平面を設定し、該設定した平面のそれぞれを基準とする射影変換により得られた変換画像を全て用いて一致度の算出を行うようにしても良い。このようにすることで、カメラパラメータの誤差に対するロバスト性が向上する。 Furthermore, in the above examples, only the single plane of the ground surface is used as the reference when projectively transforming an image, but a plurality of planes parallel to the ground surface may be used as references. For example, a plurality of planes may be set by dividing the range of heights from 0 to 1 centimeter above the ground surface at equal intervals, and the matching degree may be calculated using all of the converted images obtained by projective transformation with each of the set planes as a reference. Doing so improves robustness against errors in the camera parameters.
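
A minimal sketch of setting up such reference planes follows. It assumes a pinhole camera model x ~ K(RX + t) with world units in meters, so that 1 centimeter is 0.01; the function names and the number of planes are hypothetical. Under that model, a point (X, Y, h) on the plane at height h projects through the matrix K[r1 r2 (h·r3 + t)], which reduces to the ground-plane case when h = 0.

```python
def plane_heights(num_planes, max_height=0.01):
    """Equally spaced heights from 0 to max_height (assumed meters)."""
    if num_planes < 2:
        return [0.0]
    step = max_height / (num_planes - 1)
    return [i * step for i in range(num_planes)]


def plane_matrix(K, R, t, height):
    """Map (X, Y, 1) on the plane Z = height to homogeneous pixel coordinates."""
    m = [[R[r][0], R[r][1], height * R[r][2] + t[r]] for r in range(3)]
    return [[sum(K[i][k] * m[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Toy camera (assumed values) looking straight along the Z axis.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
heights = plane_heights(5)
matrices = [plane_matrix(I3, I3, [0.0, 0.0, 5.0], h) for h in heights]
```

Each matrix in `matrices` plays the role that the ground-plane matrix plays in the single-plane case, so one converted image per plane can be produced and all of them fed into the matching degree calculation.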

本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

２０１・・・対象画像データ取得部 201... Target image data acquisition unit
２０２・・・背景画像データ取得部 202... Background image data acquisition unit
２０３・・・画像変換部 203... Image conversion unit
２０４・・・一致度算出部 204... Matching degree calculation unit
２０６・・・補正部 206... Correction unit
２０７・・・前景抽出部 207... Foreground extraction unit

Claims

1. An image processing apparatus comprising:
a first acquisition means for acquiring an image of interest that includes a subject and is captured from a viewpoint of interest;
a second acquisition means for acquiring a plurality of reference images captured from a plurality of viewpoints different from the viewpoint of interest;
generating means for transforming the plurality of reference images acquired by the second acquisition means to generate a plurality of converted images as viewed from the viewpoint of interest; and
determining means for determining an image area of the subject in the image of interest based on an index relating to the difference between the pixel value of a pixel of interest in the image of interest acquired by the first acquisition means and the pixel value of the pixel corresponding to the pixel of interest in each of the plurality of converted images generated by the generating means,
wherein the determining means identifies pixels to be corrected in the image of interest based on the index, corrects the identified pixels to generate a corrected image of interest, and determines the image area of the subject in the image of interest based on the image of interest acquired by the first acquisition means and the corrected image of interest.

2. The image processing apparatus according to claim 1, wherein said determining means specifies pixels for which said index is equal to or greater than a threshold value as pixels to be corrected.

3. The image processing apparatus according to claim 1, wherein the determining means determines the image area of the subject in the image of interest based on another index relating to the degree of smoothness of change in the pixel values of the pixels corresponding to the pixel of interest in each of the plurality of converted images.

4. The image processing apparatus according to claim 3, wherein said another index is an index based on a converted image at a viewpoint closest to said viewpoint of interest and a converted image at a viewpoint adjacent to said viewpoint.

5. The image processing apparatus according to claim 3, wherein said determining means specifies pixels for which said index is equal to or greater than a threshold and said other index is equal to or greater than said threshold as pixels to be corrected.

6. The image processing apparatus according to any one of claims 1 to 5, wherein the index is an index relating to the sum of the differences between the pixel value of the pixel of interest in the image of interest acquired by the first acquisition means and the pixel value of the pixel corresponding to the pixel of interest in each of the plurality of converted images generated by the generating means.

7. An image processing apparatus comprising:
a first acquisition means for acquiring an image of interest that includes a subject and is captured from a viewpoint of interest;
a second acquisition means for acquiring a plurality of reference images captured from a plurality of viewpoints different from the viewpoint of interest;
generating means for transforming the plurality of reference images acquired by the second acquisition means to generate a plurality of converted images as viewed from the viewpoint of interest; and
determining means for determining an image area of the subject in the image of interest based on an index relating to the difference between the pixel value of a pixel of interest in the image of interest acquired by the first acquisition means and a statistical value of the pixel values of the pixels corresponding to the pixel of interest in each of the plurality of converted images generated by the generating means.

8. The image processing apparatus according to claim 7 , wherein the statistical value is an intermediate value or average value of pixel values of pixels corresponding to the target pixel in each of the plurality of converted images.

9. The image processing apparatus according to any one of claims 1 to 8, wherein the generating means projectively transforms each of the plurality of reference images into an image as viewed from the viewpoint of interest with the ground surface as a reference.

10. The image processing apparatus according to claim 9, wherein the generating means projectively transforms each of the plurality of reference images into images as viewed from the viewpoint of interest with the ground surface and a plurality of planes parallel to the ground surface as references.

11. The described image processing device, wherein the determining means:
identifies pixels to be corrected in the image of interest based on the index;
generates a corrected image of interest by correcting the identified pixels to be corrected; and
determines the image area of the subject in the image of interest based on the image of interest acquired by the first acquisition means and the corrected image of interest.

12. The image processing apparatus according to claim 11, wherein the determining means identifies pixels for which the index is equal to or greater than a threshold value as the pixels to be corrected.

13. The image processing apparatus according to claim 11, wherein the determining means determines the image area of the subject in the image of interest based on another index relating to the smoothness of change in pixel values of pixels corresponding to the pixel of interest in each of the plurality of transformed images.

14. The image processing apparatus according to claim 13, wherein the other index is an index based on a transformed image at the viewpoint closest to the viewpoint of interest and a transformed image at a viewpoint adjacent to that viewpoint.

15. The image processing apparatus according to claim 13, wherein the determining means identifies pixels for which the index is equal to or greater than a threshold value and the other index is equal to or greater than a threshold value as the pixels to be corrected.
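
The threshold tests of claims 12 and 15 can be sketched as follows. The sketch assumes that both indices have already been computed as 2D lists of the same size; how the smoothness-related index is calculated is not shown here. The function name `correction_mask` and its parameter names are hypothetical.

```python
def correction_mask(diff_index, diff_threshold,
                    smooth_index=None, smooth_threshold=None):
    """Mark the pixels to be corrected.

    With only diff_index, a pixel is marked when its index is equal to or
    greater than diff_threshold (claim 12). When smooth_index is also
    given, the other index must be equal to or greater than
    smooth_threshold as well (claim 15).
    """
    height, width = len(diff_index), len(diff_index[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            selected = diff_index[y][x] >= diff_threshold
            if selected and smooth_index is not None:
                selected = smooth_index[y][x] >= smooth_threshold
            mask[y][x] = selected
    return mask
```

The resulting mask only identifies the pixels to be corrected; the correction itself and the final determination of the subject's image area are separate steps.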

An image processing method comprising:
a first acquisition step of acquiring an image of interest that includes a subject and is captured from a viewpoint of interest;
a second acquisition step of acquiring a plurality of reference images captured from a plurality of viewpoints different from the viewpoint of interest;
a generation step of transforming the plurality of reference images acquired in the second acquisition step to generate a plurality of transformed images viewed from the viewpoint of interest; and
a determination step of determining an image area of the subject in the image of interest based on an index relating to a difference between a pixel value of a pixel of interest in the image of interest acquired in the first acquisition step and a pixel value of a pixel corresponding to the pixel of interest in each of the plurality of transformed images generated in the generation step,
wherein, in the determination step, pixels to be corrected in the image of interest are identified based on the index, the identified pixels to be corrected are corrected to generate a corrected image of interest, and the image area of the subject in the image of interest is determined based on the image of interest acquired in the first acquisition step and the corrected image of interest.

An image processing method comprising:
a first acquisition step of acquiring an image of interest that includes a subject and is captured from a viewpoint of interest;
a second acquisition step of acquiring a plurality of reference images captured from a plurality of viewpoints different from the viewpoint of interest;
a generation step of transforming the plurality of reference images acquired in the second acquisition step to generate a plurality of transformed images viewed from the viewpoint of interest; and
a determination step of determining an image area of the subject in the image of interest based on an index relating to a difference between a pixel value of a pixel of interest in the image of interest acquired in the first acquisition step and a statistical value of pixel values of pixels corresponding to the pixel of interest in each of the plurality of transformed images generated in the generation step.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 15.