JP2021108193A

JP2021108193A - Image processing device, image processing method, and program

Info

Publication number: JP2021108193A
Application number: JP2021068487A
Authority: JP
Inventors: 希名板倉; Kina Itakura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-09-28
Filing date: 2021-04-14
Publication date: 2021-07-29
Anticipated expiration: 2036-09-28
Also published as: JP2018055367A; JP6873644B2; JP7159384B2

Abstract

PROBLEM TO BE SOLVED: To extract the region of a foreground subject with high accuracy, regardless of the state of the subject, such as whether the subject changes over time (for example, by moving) or whether the foreground subjects are densely or sparsely arranged.

SOLUTION: An image processing device according to the present invention comprises: means for acquiring target image data, which is an image of a foreground subject and a background subject captured from a viewpoint of interest; means for acquiring background image data, which are images of the background subject at a plurality of different viewpoints; means for generating a plurality of pieces of converted background image data by converting the acquired pieces of background image data; calculation means for calculating a degree of matching between pixels of interest in the plurality of pieces of converted background image data; means for correcting the converted background image data at a reference viewpoint based on the degree of matching; and means for generating foreground image data, which is an image in which the region of the foreground subject is extracted, based on the target image data and the corrected converted background image data.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a technique for extracting the region of a foreground subject from a captured image.

Background subtraction is a conventional method of extracting the region of a foreground subject from a captured image obtained by imaging subjects (a foreground subject and a background subject). In background subtraction, a foreground image, in which the region of the foreground subject is extracted, is created from the per-pixel difference between the pixel values of a captured image showing both the foreground subject and the background subject and the pixel values of a background image showing only the background subject. If an image showing only the background, captured in advance under specific conditions, is used as the background image, the accuracy of extracting the region of the foreground subject drops when the background changes, for example because the sunlight changes over time.
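The per-pixel differencing described above can be illustrated with a minimal sketch. The function name `background_subtraction`, the fixed `threshold`, and the choice of taking the largest difference over colour channels are assumptions made for illustration; the text does not specify how the difference is turned into a foreground decision.

```python
import numpy as np

def background_subtraction(captured, background, threshold=30):
    """Return a boolean foreground mask from per-pixel differences.

    `threshold` is a hypothetical tuning value, not taken from the text.
    """
    # Use a signed type so subtracting uint8 images cannot wrap around.
    diff = np.abs(captured.astype(np.int32) - background.astype(np.int32))
    # For colour images, take the largest difference over the channels.
    if diff.ndim == 3:
        diff = diff.max(axis=2)
    return diff > threshold

# Toy data: a uniform background and a captured image with a bright patch.
background = np.full((4, 4), 100, dtype=np.uint8)
captured = background.copy()
captured[1:3, 1:3] = 200  # stands in for a foreground subject
mask = background_subtraction(captured, background)
```

If the background image is stale (for example, captured under different sunlight), the difference grows everywhere and the mask degrades, which is the problem the paragraph above describes.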

To solve this problem, Patent Document 1 discloses a technique that extracts the region of a foreground subject regardless of changes in the background by using a background image created from a plurality of images captured at different times.

Patent Document 2 discloses a technique that extracts the region of a foreground subject regardless of changes in the subject over time by using a background image created from a plurality of images captured from different viewpoints at the same time.

Patent Document 1: Japanese Unexamined Patent Publication No. 2012-104053
Patent Document 2: Japanese Unexamined Patent Publication No. 2014-230180

However, in Patent Document 1, when a foreground subject is stationary, its region is erroneously judged to be a region of the background subject, so the background image cannot be created accurately. As a result, the accuracy of extracting the region of the foreground subject drops.

In Patent Document 2, a background image is created by supplementing information on the background subject that cannot be seen from a single viewpoint with information from other viewpoints. However, the background image cannot be created accurately in regions where, for example, the foreground subjects in the scene are crowded together and overlap one another. As a result, the accuracy of extracting the region of the foreground subject drops.

In view of the above problems, an object of the present invention is to extract the region of a foreground subject with high accuracy, regardless of the state of the subject, such as whether the subject changes over time (for example, by moving) or whether the foreground subjects are densely or sparsely arranged.

The present invention is an image processing apparatus comprising: target image data acquisition means for acquiring target image data, which is an image of a foreground subject and a background subject captured from a viewpoint of interest; background image data acquisition means for acquiring background image data, which are images of the background subject at a plurality of different viewpoints; converted background image data creation means for creating a plurality of pieces of converted background image data by converting each of the acquired pieces of background image data into an image as viewed from the viewpoint of interest; calculation means for calculating a degree of matching, which indicates how well the pixels of interest match across the plurality of pieces of converted background image data; correction means for correcting, based on the degree of matching, the converted background image data at a viewpoint determined according to its distance from the viewpoint of interest; and foreground image data creation means for creating foreground image data, which is an image in which the region of the foreground subject is extracted, based on the target image data and the corrected converted background image data.

According to the present invention, the region of a foreground subject can be extracted with high accuracy, regardless of the state of the subject, such as whether the subject changes over time (for example, by moving) or whether the foreground subjects are densely or sparsely arranged.

Block diagram showing the hardware configuration of the image processing apparatus in Examples 1 to 3.
Block diagram showing the functional configuration of the image processing apparatus in Example 1.
Flowchart showing the flow of the process of extracting the foreground region in Example 1.
Diagram outlining the process of extracting the foreground region in Example 1.
Diagram explaining the image conversion in Example 1.
Diagram explaining the effect of Example 1.
Block diagram showing the functional configuration of the image processing apparatus in Example 2.
Flowchart showing the flow of the process of extracting the foreground region in Example 2.
Diagram explaining the continuity calculation method in Example 2.
Diagram explaining the effect of Example 2.
Block diagram showing the functional configuration of the image processing apparatus in Example 3.
Flowchart showing the flow of the process of extracting the foreground region in Example 3.
Diagram explaining the effect of Example 3.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in them are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

[Example 1]
In Example 1, a background image at the viewpoint of interest that contains no image of a foreground subject (hereinafter, a complete background image) is created from multi-viewpoint images, specifically from background images at a plurality of different viewpoints that partly contain images of foreground subjects (hereinafter, incomplete background images). The complete background image is then used to extract the region of the foreground subject from the image to be processed.

<Overview of the process of extracting the foreground region>
An overview of the process of extracting the foreground region in this example is given below with reference to FIG. 4. In this example, background image data 401 at a plurality of different viewpoints is acquired first. Background image data is an image of the background subject, a so-called background image. The background image data acquired here need not be a complete background image that contains no image of a foreground subject (hereinafter, a foreground image) at all, but it is desirable that it be captured at a time close to the time at which the image from which the foreground region is to be extracted was captured. The acquired pieces of background image data 401 are assumed to include an image at the same viewpoint as the viewpoint 402 from which the image subject to foreground extraction was captured. Hereinafter, the image from which the foreground region is extracted is called the target image (data), and the viewpoint from which the target image (data) was captured is called the viewpoint of interest.

Next, the acquired background image data 401 is converted, viewpoint by viewpoint, into an image as viewed from the viewpoint of interest 402 with the ground surface as the reference, which yields background image data 403 at the viewpoint of interest. The number of pieces of background image data 403 created here equals the number of pieces of background image data 401. Hereinafter, the background image data 403 obtained by converting the background image data 401 is called converted background image data 403.

Here, a foreground subject is a subject that is located close to the imaging device among the subjects in the captured image. For example, when the target image data captures a competition scene such as a sports match, people such as players and referees and equipment such as goals and balls are foreground subjects. Foreground subjects include those that keep moving, for the most part, across a plurality of images captured consecutively in time series. A background subject, in contrast, is a subject that is located far from the imaging device among the subjects in the captured image and therefore lies behind the foreground subjects. For example, when the target image data captures a competition scene such as a sports match, a field of turf or soil or the floor of a gymnasium is a background subject. Background subjects are mostly stationary across a plurality of images captured consecutively in time series.

Such a foreground subject has height above the ground surface, whereas a background subject does not. Therefore, the plurality of pieces of converted background image data 403 are used to detect the image of a subject that has height above the ground surface, that is, the image of a foreground subject (foreground image), and the detected foreground image is removed from the incomplete background image to create a complete background image at the viewpoint of interest 402. Specifically, for the plurality of pieces of converted background image data 403, which include the image at the viewpoint of interest 402, the degree of matching between the pixels of interest is calculated for each pixel, and pixels with a low degree of matching are detected as pixels in the image region of a foreground subject. As described above, the converted background image data 403 is an image obtained by converting the background image data 401 into an image as viewed from the viewpoint of interest 402 with the ground surface as the reference plane. Therefore, the coordinates of the pixels of the regions 405 to 407 in the background image data 401, which correspond to a subject 404 that lies on the ground surface and has no height, are each converted into the coordinates of the pixels of a region 408, which is at the same position in all pieces of converted background image data 403. In contrast, the coordinates of the pixels of the regions 410 to 412 in the background image data 401, which correspond to a subject 409 that has height, are converted into the coordinates of the pixels of regions 413 to 415, whose positions differ from viewpoint to viewpoint. Accordingly, in the plurality of pieces of converted background image data 403, pixels with a high degree of matching between the pixels of interest are regarded as pixels in the image region of a background subject, which has no height, and pixels with a low degree of matching are regarded as pixels in the image region of a foreground subject, which has height. A complete background image is created in this way. Finally, the foreground region is extracted by comparing the created complete background image at the viewpoint of interest 402 with the target image data.
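The reasoning above (flat subjects land on the same pixel in every converted view, tall subjects land on different pixels) can be illustrated with a toy sketch. The disagreement measure used here, the spread between the largest and smallest value across views, and the `threshold` are assumptions chosen for illustration only; they are not the degree-of-matching formula of this example. The name `low_agreement_mask` is likewise hypothetical, and the sketch handles grayscale images only.

```python
import numpy as np

def low_agreement_mask(converted_backgrounds, threshold=20):
    """Flag pixels whose values disagree across the converted views.

    The max-minus-min spread and `threshold` are illustrative assumptions.
    """
    stack = np.stack(converted_backgrounds, axis=0).astype(np.int32)
    spread = stack.max(axis=0) - stack.min(axis=0)
    return spread > threshold

# Three converted views of a flat ground with value 100.
ground = np.full((1, 6), 100, dtype=np.uint8)
views = [ground.copy() for _ in range(3)]
# A tall subject is mapped to a different column in each view.
views[0][0, 1] = 220
views[1][0, 2] = 220
views[2][0, 3] = 220
mask = low_agreement_mask(views)
```

Columns 1 to 3 are flagged because the tall subject appears at a different position in each view, while the flat ground at columns 0, 4, and 5 agrees everywhere.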

This concludes the overview of the processing performed in this example. The target image data is not limited to the example above, and various kinds of image data, such as data captured by a surveillance camera, can be used. Although the case in which the background image data 401 includes an image at the viewpoint of interest has been described here, this example is also applicable when the background image data does not include an image at the viewpoint of interest. The specific processing method is described later.

<Hardware configuration of the image processing apparatus>
The hardware configuration of the image processing apparatus of this example is described below. FIG. 1 is a block diagram showing an example of the hardware configuration of the image processing apparatus of this example. The image processing apparatus 100 of this example includes a CPU 101, a RAM 102, a ROM 103, a secondary storage device 104, an input interface 105, and an output interface 106, which are connected to one another by a system bus 107. The image processing apparatus 100 is connected to an external storage device 108 via the input interface 105, and to the external storage device 108 and a display device 109 via the output interface 106.

The CPU 101 executes a program stored in the ROM 103, using the RAM 102 as work memory, and controls each component of the image processing apparatus 100 in an integrated manner via the system bus 107. The various processes described later are executed in this way.

The secondary storage device 104 stores the various data handled by the image processing apparatus 100. An HDD is used in this example. The CPU 101 can write data to the secondary storage device 104 and read data stored in it via the system bus 107. Besides an HDD, various storage devices such as an optical disc drive or flash memory can be used as the secondary storage device 104.

The input interface 105 is a serial bus interface such as USB or IEEE 1394. Data, commands, and the like are input from external devices to the image processing apparatus 100 via the input interface 105. The image processing apparatus 100 acquires data from the external storage device 108 (for example, a storage medium such as a hard disk, memory card, CF card, SD card, or USB memory) via the input interface 105. Input devices for user input (not shown), such as a mouse and a keyboard, can also be connected to the input interface 105. The output interface 106 includes a serial bus interface such as USB or IEEE 1394, as with the input interface 105, and also a video output terminal such as DVI or HDMI (registered trademark). Data is output from the image processing apparatus 100 to external devices via the output interface 106. The image processing apparatus 100 displays images by outputting processed images and the like to the display device 109 (any of various image display devices such as a liquid crystal display) via the output interface 106. The image processing apparatus 100 has components other than those described above, but they are not the focus of the present invention and their description is omitted.

<Process of extracting the foreground region>
The process of extracting the foreground region executed by the image processing apparatus 100 in this example is described below with reference to FIGS. 2 and 3. FIG. 2 is a block diagram showing the functional configuration of the image processing apparatus 100, and FIG. 3 is a flowchart showing the flow of the process of extracting the foreground region. The CPU 101 of the image processing apparatus 100 executes a program stored in the ROM 103, using the RAM 102 as work memory, and thereby functions as each component shown in FIG. 2 and executes the series of processes shown in FIG. 3. Not all of the processing described below has to be executed by the CPU 101. The image processing apparatus 100 may be configured so that part or all of the processing is performed by one or more processing circuits other than the CPU 101.

The flow of processing performed by each component is described below. In step S301, the target image data acquisition unit 201 acquires the target image data from the external storage device 108 via the input interface 105, or from the secondary storage device 104. As described above, the target image data is the image from which the foreground region is to be extracted. The target image data acquisition unit 201 also sets the viewpoint of the camera that captured the target image data as the viewpoint of interest. Although the case in which the target image data is a single image is described here, this example is also applicable when the target image data consists of a plurality of images. In addition, the target image data acquisition unit 201 acquires the parameters of the camera that captured the target image data (hereinafter, camera parameters) together with the target image data. Camera parameters are parameters that make it possible to compute the projection of a point in three-dimensional space onto the image captured by the camera. They include external parameters, which represent the position and orientation of the camera, and internal parameters, which represent the focal length and the optical center. Measured values or design values stored in memory in advance may be used as the camera parameters. The target image data acquisition unit 201 outputs the target image data to the foreground extraction unit 207 and the camera parameters to the image conversion unit 203.
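As an illustration of how external and internal parameters allow a 3D point to be projected onto an image, the following is a minimal sketch of a standard pinhole camera model. The names `K`, `R`, `t`, and `project_point`, and the convention that camera coordinates are `R @ X + t`, are assumptions for this sketch; the text does not state which parameterization is used.

```python
import numpy as np

def project_point(point_3d, K, R, t):
    """Project a 3D world point to pixel coordinates (pinhole model).

    Assumed convention: camera coordinates = R @ world + t.
    """
    # External parameters: world coordinates -> camera coordinates.
    p_cam = R @ np.asarray(point_3d, dtype=float) + t
    # Internal parameters: camera coordinates -> homogeneous pixel coordinates.
    uvw = K @ p_cam
    # Divide by the third component to obtain pixel coordinates.
    return uvw[:2] / uvw[2]

# Hypothetical internal parameters: focal length 800 px, optical center (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)      # camera axes aligned with world axes
t = np.zeros(3)    # camera at the world origin
uv = project_point([0.5, -0.25, 4.0], K, R, t)  # -> (420.0, 190.0)
```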

In step S302, the background image data acquisition unit 202 acquires a plurality of pieces of background image data at a plurality of different viewpoints from the external storage device 108 via the input interface 105, or from the secondary storage device 104. Here, background image data is an image of the background subject in an environment (weather, time of day, and so on) substantially the same as the environment in which the target image data was captured. As described above, the background image data acquired in this step need not be a background image that contains no foreground image at all (a complete background image).

In this example, the background image data at each viewpoint is created by applying a median filter to a plurality of images, corresponding to a plurality of different times, that are obtained by imaging the scene continuously in time series from the same viewpoint. However, the method of creating background image data is not limited to this. For example, the background image data may be created using another filter such as a mean filter, or by performing clustering on the plurality of images. Alternatively, background image data obtained in advance for each viewpoint by imaging the scene with no foreground subject present may be used.
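A temporal median filter of the kind described above can be sketched as follows. The function name `temporal_median_background` and the toy frames are illustrative assumptions; the sketch takes the per-pixel median over frames captured from one viewpoint.

```python
import numpy as np

def temporal_median_background(frames):
    """Estimate a background image as the per-pixel median over time."""
    # Stack the frames along a new time axis: shape (T, H, W) or (T, H, W, C).
    stack = np.stack(frames, axis=0)
    # The median over time suppresses subjects that occupy a pixel only briefly.
    return np.median(stack, axis=0).astype(stack.dtype)

# Five frames of a static scene with value 50.
frames = [np.full((2, 2), 50, dtype=np.uint8) for _ in range(5)]
# A moving subject covers one pixel in a single frame.
frames[2][0, 0] = 255
bg = temporal_median_background(frames)
```

A subject that stays in place for more than half of the frames survives the median, which is why the resulting image may still be an incomplete background image.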

The background image data acquisition unit 202 also acquires the camera parameters corresponding to each piece of background image data together with the background image data. Furthermore, to distinguish the pieces of background image data from one another, the background image data acquisition unit 202 stores each piece of background image data in association with a number that identifies the camera viewpoint (hereinafter, the camera viewpoint number). The background image data acquisition unit 202 outputs the background image data and the camera parameters to the image conversion unit 203, and outputs only the background image data to the correction unit 206.

In step S303, the image conversion unit 203 uses the camera parameters acquired from the target image data acquisition unit 201 and the background image data acquisition unit 202 to convert the background image data acquired from the background image data acquisition unit 202 into images as viewed from the viewpoint of interest. Specifically, each piece of background image data is projectively transformed with the ground surface as the reference to obtain an image as viewed from the viewpoint of interest. The background image (data) obtained by the image conversion in this step is called a converted background image (data). In this way, the image conversion unit 203 functions as the converted background image data creation means. The image conversion method used in this step is described below with reference to FIG. 5.

As shown in FIG. 5, when a point 501 in three-dimensional space is projected onto the image of a camera 502, the point 504 at which the straight line connecting the point 501 and the camera 502 intersects the image plane 503 is the projection of the point 501 onto the image plane 503. Similarly, for a camera 505 located at a different position from the camera 502 (a camera at another viewpoint), the point 507 at which the straight line connecting the point 501 and the camera 505 intersects the image plane 506 is the projection of the point 501 onto the image plane 506. Now consider the case in which all points in three-dimensional space that are projected onto the image plane 503 and the image plane 506, including the point 501, lie on a single plane, namely the ground surface. In this case, using the 3 × 3 homography matrix H₀₁ calculated from the camera parameters of the camera 502 and the camera 505, the coordinates (u₀, v₀) of an arbitrary pixel on the image plane 503 are converted by equation (1) into the coordinates (u₁, v₁) on the image plane 506.
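Equation (1) itself is not reproduced in this text. For reference, a planar homography is usually written in homogeneous coordinates as below, and equation (1) presumably has this form. The scale factor s is notation introduced here and does not appear in the text.

```latex
s \begin{pmatrix} u_1 \\ v_1 \\ 1 \end{pmatrix}
  = H_{01} \begin{pmatrix} u_0 \\ v_0 \\ 1 \end{pmatrix},
\qquad
H_{01} =
\begin{pmatrix}
  h_{11} & h_{12} & h_{13} \\
  h_{21} & h_{22} & h_{23} \\
  h_{31} & h_{32} & h_{33}
\end{pmatrix}
```

Dividing the first two components of the right-hand side by the third gives u₁ = (h₁₁u₀ + h₁₂v₀ + h₁₃) / (h₃₁u₀ + h₃₂v₀ + h₃₃) and v₁ = (h₂₁u₀ + h₂₂v₀ + h₂₃) / (h₃₁u₀ + h₃₂v₀ + h₃₃).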

In step S303, this projective transformation is executed for each piece of background image data, with the camera at the viewpoint corresponding to the background image data acquired from the background image data acquisition unit 202 taken as the camera 502 above and the camera at the viewpoint of interest set by the target image data acquisition unit 201 taken as the camera 505. The number of pieces of converted background image data obtained in this step therefore equals the number of pieces of background image data acquired by the background image data acquisition unit 202. Each piece of converted background image data is stored in association with the viewpoint number of the corresponding piece of background image data acquired by the background image data acquisition unit 202. The image conversion unit 203 outputs the converted background image data to the matching degree calculation unit 204 and the correction unit 206.
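One common way to obtain a ground-plane homography from two sets of camera parameters is sketched below. It assumes that the ground surface is the world plane Z = 0 and that camera coordinates are `R @ X + t`; both are conventions chosen for this sketch, and the text does not say how H₀₁ is actually computed. The function names are hypothetical.

```python
import numpy as np

def ground_plane_homography(K0, R0, t0, K1, R1, t1):
    """Homography mapping camera-0 pixels to camera-1 pixels for points on Z = 0."""
    # For a world point (X, Y, 0), projection reduces to K [r1 r2 t] (X, Y, 1)^T.
    P0 = K0 @ np.column_stack((R0[:, 0], R0[:, 1], t0))
    P1 = K1 @ np.column_stack((R1[:, 0], R1[:, 1], t1))
    # Go from camera-0 pixels back to the plane, then forward to camera 1.
    return P1 @ np.linalg.inv(P0)

def apply_homography(H, u, v):
    """Map one pixel coordinate through H, including the homogeneous division."""
    x = H @ np.array([u, v, 1.0])
    return x[:2] / x[2]

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t0 = np.array([0.0, 0.0, 5.0])  # source camera (plays the role of camera 502)
t1 = np.array([1.0, 0.0, 5.0])  # camera at the viewpoint of interest (camera 505)
H01 = ground_plane_homography(K, R, t0, K, R, t1)

# The ground point (2, 1, 0) appears at pixel (640, 400) in the source camera.
uv1 = apply_homography(H01, 640.0, 400.0)  # -> (800.0, 400.0)
```

Warping a whole image applies this mapping to every pixel with interpolation. A library routine such as OpenCV's `cv2.warpPerspective` is one option for that step.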

In step S304, the image conversion unit 203 selects, from the background image data acquired from the background image data acquisition unit 202, the image corresponding to the viewpoint closest to the position of the camera that captured the target image data (the viewpoint of interest), and sets it as the reference background image. Specifically, the distance between the coordinates (Xo, Yo, Zo) of the viewpoint of interest and the coordinates (Xi, Yi, Zi) of the viewpoint corresponding to each piece of background image data acquired from the background image data acquisition unit 202 is calculated for each viewpoint. Here, i is the viewpoint number, with 1 ≤ i < (number of viewpoints) + 1, that is, i runs from 1 to the number of viewpoints. The viewpoint with the smallest calculated distance (the reference viewpoint) is then detected, and the background image (data) corresponding to the reference viewpoint is set as the reference background image (data). The image conversion unit 203 outputs the viewpoint number corresponding to the reference background image to the matching degree calculation unit 204 and the correction unit 206. In this example, the viewpoint number corresponding to the reference background image is called the reference viewpoint number.
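The selection of the reference viewpoint can be sketched as a nearest-neighbour search over camera positions. The use of Euclidean distance, the function name `select_reference_viewpoint`, and the sample coordinates are assumptions for illustration; the text only says that the distance is calculated for each viewpoint and the minimum is taken.

```python
import math

def select_reference_viewpoint(viewpoint_of_interest, viewpoints):
    """Return the 1-based number of the viewpoint closest to the viewpoint of interest."""
    # Euclidean distance between (Xo, Yo, Zo) and each (Xi, Yi, Zi).
    distances = [math.dist(viewpoint_of_interest, p) for p in viewpoints]
    # Viewpoint numbers start at 1, so shift the 0-based list index.
    return distances.index(min(distances)) + 1

viewpoint_of_interest = (0.0, 0.0, 10.0)
viewpoints = [(5.0, 0.0, 10.0), (1.0, 0.0, 10.0), (-3.0, 4.0, 10.0)]
reference_viewpoint_number = select_reference_viewpoint(viewpoint_of_interest, viewpoints)
```

When the background image data includes an image at the viewpoint of interest itself, the smallest distance is zero and that viewpoint becomes the reference viewpoint.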

ステップＳ３０５において、一致度算出部２０４は、複数の変換背景画像データにおいて画素が一致するかを判定する対象となる、変換背景画像データにおける着目画素を決定する。本実施例では、まず、変換背景画像データの左上の画素が着目画素として選択され、その後、未処理の画素が着目画素として順次選択される。なお、変換背景画像データの全画素について、複数の変換背景画像データにおいて画素が一致するかの判定が実行されれば、どのような順番で着目画素を決定しても良い。 In step S305, the matching degree calculation unit 204 determines the pixel of interest in the converted background image data, which is the target for determining whether the pixels match in the plurality of converted background image data. In this embodiment, first, the upper left pixel of the converted background image data is selected as the pixel of interest, and then the unprocessed pixel is sequentially selected as the pixel of interest. For all the pixels of the converted background image data, the pixels of interest may be determined in any order as long as the determination of whether the pixels match in the plurality of converted background image data is executed.

ステップＳ３０６において、一致度算出部２０４は、画像変換部２０３から取得した複数の変換背景画像データを用いて、基準視点番号に対応する変換背景画像データと他の変換背景画像データとの間の、着目画素における一致度を算出する。以下、この一致度の算出手法を具体的に説明する。 In step S306, the matching degree calculation unit 204 uses the plurality of conversion background image data acquired from the image conversion unit 203 to obtain between the conversion background image data corresponding to the reference viewpoint number and the other conversion background image data. The degree of coincidence in the pixel of interest is calculated. Hereinafter, the method for calculating the degree of agreement will be specifically described.

まず、一致度算出部２０４は、決定した着目画素の座標（ｕ₂、ｖ₂）における、変換背景画像データの画素値Ｂ_j（ｕ₂、ｖ₂）を取得する。ここでｊは複数の変換背景画像データのそれぞれを区別する添え字を表し、一致度算出部２０４は、変換背景画像データの数分の画素値を取得する。次に、一致度算出部２０４は、取得した全画素値の中間値を算出する。この中間値は、一致度を算出する際の基準値Ｍとして用いられる。なお、基準値はこれに限られず、平均値など、複数の画素値の統計的な性質を反映する任意の値を基準値として用いて良い。 First, the matching degree calculation unit 204 acquires the pixel value B_j(u₂, v₂) of the converted background image data at the coordinates (u₂, v₂) of the determined pixel of interest. Here, j is a subscript that distinguishes each of the plurality of converted background image data, and the matching degree calculation unit 204 acquires as many pixel values as there are pieces of converted background image data. Next, the matching degree calculation unit 204 calculates an intermediate value of all the acquired pixel values. This intermediate value is used as the reference value M when calculating the degree of coincidence. The reference value is not limited to this, and any value that reflects the statistical properties of a plurality of pixel values, such as an average value, may be used as the reference value.

次に、一致度算出部２０４は、着目画素における一致度を、基準視点番号に対応する変換背景画像データにおける着目画素の画素値Ｂ₀（ｕ₂、ｖ₂）と算出した基準値Ｍ（ｕ₂、ｖ₂）とを用いて、式（２）により算出する。 Next, the matching degree calculation unit 204 calculates the degree of coincidence at the pixel of interest by equation (2), using the pixel value B₀(u₂, v₂) of the pixel of interest in the converted background image data corresponding to the reference viewpoint number and the calculated reference value M(u₂, v₂).
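Equation (2) is not reproduced in this text. Since the description of step S309 states that equation (2) uses a sum of squared differences as the degree of coincidence, and k indexes the three RGB channels, a plausible reconstruction is the following. The placement of k as a superscript is an assumption about notation, not the original typesetting.

```latex
D(u_2, v_2) = \sum_{k \in \{R, G, B\}} \left( B_0^{k}(u_2, v_2) - M^{k}(u_2, v_2) \right)^2 \tag{2}
```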

ここで、ｋはＲＧＢ３チャンネルを識別するための添え字を表す。式（２）により算出する一致度Ｄは、複数の変換背景画像データにおける画素値のばらつきが少ないほど小さくなる。なお、用いる一致度はこれに限られず、画素間の違いを示す任意の値を用いて良い。例えば、基準視点番号に対応する変換背景画像データにおける着目画素の画素値Ｂ₀（ｕ₂、ｖ₂）と、他の変換背景画像データにおける着目画素の画素値それぞれとの差分の総和を一致度として用いても良い。 Here, k is a subscript identifying each of the three RGB channels. The degree of coincidence D calculated by equation (2) becomes smaller as the variation in pixel values among the plurality of converted background image data becomes smaller. The degree of coincidence is not limited to this, and any value indicating the difference between pixels may be used. For example, the sum of the differences between the pixel value B₀(u₂, v₂) of the pixel of interest in the converted background image data corresponding to the reference viewpoint number and the pixel value of the pixel of interest in each of the other converted background image data may be used as the degree of coincidence.
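As a rough sketch of the calculation in step S306, the following Python code derives a reference value M and a degree of coincidence D for one pixel of interest. It assumes that the "intermediate value" is a per-channel median and that D is the sum of squared per-channel differences between B₀ and M; the function names and sample pixel values are hypothetical.

```python
from statistics import median

def reference_value(pixels):
    """Per-channel median of the pixel values B_j taken from all converted
    background images (median is an assumed reading of 'intermediate value')."""
    return tuple(median(channel) for channel in zip(*pixels))

def coincidence(reference_pixel, m):
    """Sum over the RGB channels of the squared difference between B_0 and M."""
    return sum((b - v) ** 2 for b, v in zip(reference_pixel, m))

# RGB values of one pixel of interest in five converted background images.
# Index 0 stands for the reference viewpoint; it shows a red object that the
# other viewpoints do not see at this pixel.
pixels = [(200, 30, 30), (60, 120, 60), (62, 118, 58), (58, 122, 61), (61, 119, 60)]
m = reference_value(pixels)
d = coincidence(pixels[0], m)
print(m, d)
```

A large D means that the reference viewpoint disagrees with the consensus of the other viewpoints at this pixel.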

ステップＳ３０７において、一致度算出部２０４は、変換背景画像データの全画素についてステップＳ３０５〜ステップＳ３０６の処理を行ったかを判定する。ステップＳ３０７の判定の結果が真の場合、一致度算出部２０４は、算出した全画素の一致度を補正判定部２０５に、算出した基準値を補正部２０６に出力し、ステップＳ３０８に進む。一方、ステップＳ３０７の判定の結果が偽の場合、ステップＳ３０５に戻る。 In step S307, the matching degree calculation unit 204 determines whether the processing of steps S305 to S306 has been performed on all the pixels of the converted background image data. If the result of the determination in step S307 is true, the matching degree calculation unit 204 outputs the calculated matching degree of all pixels to the correction determination unit 205 and the calculated reference value to the correction unit 206, and proceeds to step S308. On the other hand, if the result of the determination in step S307 is false, the process returns to step S305.

ステップＳ３０８において、補正判定部２０５は、フラグマップを初期化つまりフラグマップの全画素の画素値を０とする。本ステップで初期化するフラグマップは、ステップＳ３１１で基準視点番号に対応する変換背景画像データの画素を補正する際、補正処理の対象となる画素を判定するために用いられる。このフラグマップでは、補正処理の対象の画素に対応する画素値に１が代入され、補正処理の対象ではない画素に対応する画素値に０が代入される。本ステップでの初期化により、基準視点番号に対応する変換背景画像データの全画素について、補正処理の対象ではないとされることとなる。 In step S308, the correction determination unit 205 initializes the flag map, that is, sets the pixel values of all the pixels of the flag map to 0. The flag map initialized in this step is used to determine the pixel to be corrected when correcting the pixel of the converted background image data corresponding to the reference viewpoint number in step S311. In this flag map, 1 is assigned to the pixel value corresponding to the pixel to be corrected, and 0 is assigned to the pixel value corresponding to the pixel not to be corrected. By the initialization in this step, all the pixels of the converted background image data corresponding to the reference viewpoint number are not subject to the correction processing.

ステップＳ３０９において、補正判定部２０５は、一致度算出部２０４から取得した一致度に基づいてフラグマップを更新する。具体的には、補正判定部２０５は、基準視点番号に対応する変換背景画像データにおいて前景の被写体の画像領域の画素である可能性が高いとみなされた画素に対応する、フラグマップの画素値を１に変更する。本実施例では、算出した一致度Ｄが事前に定めた閾値以上であれば、基準視点番号に対応する変換背景画像データの画素と他の変換背景画像データの画素との一致の度合いが低いため、着目画素が前景の被写体の画像領域の画素である可能性が高いと判定する。一方、一致度Ｄが閾値未満であれば、基準視点番号に対応する変換背景画像データの画素と他の変換背景画像データの画素との一致の度合いが高いため、着目画素が背景の被写体の画像領域の画素である可能性が高いとする。なお、本ステップで用いる閾値は、画素値の最大値などに基づいて決定し、最大値の２０％より小さい値、例えば、最大値の１％〜５％の範囲内の任意の値を用いて閾値を決定する。すなわち、任意の値をａとすると、式（２）では一致度として差分二乗和を用いることから、閾値はａ×ａ×3となる。なお、仮に一致度として差分の総和を用いる場合、閾値はａ×3となる。また、前景の被写体の画像領域の画素であるかの判定は、画素毎に行う。補正判定部２０５は、更新が完了したフラグマップを補正部２０６に出力する。 In step S309, the correction determination unit 205 updates the flag map based on the degree of agreement acquired from the degree of agreement calculation unit 204. Specifically, the correction determination unit 205 is a pixel value of a flag map corresponding to a pixel in the converted background image data corresponding to the reference viewpoint number, which is considered to be a pixel in the image region of the subject in the foreground. Is changed to 1. In this embodiment, if the calculated degree of coincidence D is equal to or greater than a predetermined threshold value, the degree of matching between the pixels of the converted background image data corresponding to the reference viewpoint number and the pixels of the other converted background image data is low. , It is determined that the pixel of interest is likely to be a pixel in the image region of the subject in the foreground. On the other hand, if the degree of coincidence D is less than the threshold value, the degree of matching between the pixels of the converted background image data corresponding to the reference viewpoint number and the pixels of the other converted background image data is high, so that the pixel of interest is the image of the subject in the background. It is highly likely that it is a pixel in the area. 
The threshold used in this step is determined based on, for example, the maximum pixel value. A value smaller than 20% of the maximum value, for example an arbitrary value within the range of 1% to 5% of the maximum value, is used to determine the threshold. That is, if this arbitrary value is a, the threshold is a × a × 3, because equation (2) uses the sum of squared differences over the three RGB channels as the degree of coincidence. If the sum of the differences were used as the degree of coincidence instead, the threshold would be a × 3. Whether a pixel belongs to the image region of a foreground subject is determined for each pixel. The correction determination unit 205 outputs the updated flag map to the correction unit 206.
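The threshold rule and the flag map update of steps S308 and S309 might be sketched as follows. The function names, the 8-bit maximum of 255, the 5% ratio, and the sample coincidence values are assumptions chosen for illustration.

```python
def squared_sum_threshold(max_value, ratio):
    """Threshold a * a * 3 for a sum of squared differences over 3 channels,
    where a is the chosen fraction of the maximum pixel value."""
    a = max_value * ratio
    return a * a * 3

def build_flag_map(coincidence_map, threshold):
    """Start from an all-zero map and set 1 where D is at or above the threshold."""
    flag_map = [[0] * len(row) for row in coincidence_map]  # step S308
    for y, row in enumerate(coincidence_map):               # step S309
        for x, d in enumerate(row):
            if d >= threshold:
                flag_map[y][x] = 1
    return flag_map

threshold = squared_sum_threshold(255, 0.05)  # a = 12.75
flags = build_flag_map([[12.0, 28142.0], [500.0, 300.0]], threshold)
print(threshold, flags)
```

Pixels flagged with 1 are the ones treated as likely belonging to a foreground subject and therefore corrected later.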

ステップＳ３１０において、補正部２０６は、基準視点番号に対応する変換背景画像データにおける着目画素を決定する。本実施例では、まず、基準視点番号に対応する変換背景画像データの左上の画素が着目画素として選択され、その後、未処理の画素が着目画素として順次選択される。なお、基準視点番号に対応する変換背景画像データの全画素についてフラグマップに基づく画素値の更新（ステップＳ３１１）が実行されれば、どのような順番で着目画素を決定しても良い。 In step S310, the correction unit 206 determines the pixel of interest in the converted background image data corresponding to the reference viewpoint number. In this embodiment, first, the upper left pixel of the converted background image data corresponding to the reference viewpoint number is selected as the pixel of interest, and then the unprocessed pixel is sequentially selected as the pixel of interest. If the pixel values are updated (step S311) based on the flag map for all the pixels of the converted background image data corresponding to the reference viewpoint number, the pixels of interest may be determined in any order.

ステップＳ３１１において、補正部２０６は、補正判定部２０５から取得したフラグマップに基づき、基準視点番号に対応する変換背景画像における着目画素の画素を補正する。本実施例では、基準視点番号に対応する変換背景画像における着目画素に対応するフラグマップの画素値が１である場合、該着目画素の画素値を、一致度算出部２０４で算出した基準値で置き換える。一方、基準視点番号に対応する変換背景画像における着目画素に対応するフラグマップの画素値が０である場合、該着目画素の画素値は変更しない。なお、画素値を補正する手法はこれに限られず、基準視点と隣接する視点に対応する背景画像の画素値で置き換えるなど他の手法を用いても良い。 In step S311, the correction unit 206 corrects the pixel of the pixel of interest in the converted background image corresponding to the reference viewpoint number based on the flag map acquired from the correction determination unit 205. In this embodiment, when the pixel value of the flag map corresponding to the pixel of interest in the converted background image corresponding to the reference viewpoint number is 1, the pixel value of the pixel of interest is the reference value calculated by the matching degree calculation unit 204. replace. On the other hand, when the pixel value of the flag map corresponding to the pixel of interest in the converted background image corresponding to the reference viewpoint number is 0, the pixel value of the pixel of interest is not changed. The method for correcting the pixel value is not limited to this, and another method such as replacing with the pixel value of the background image corresponding to the viewpoint adjacent to the reference viewpoint may be used.
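The correction in steps S310 to S312 amounts to a per-pixel replacement driven by the flag map. The sketch below assumes images stored as nested lists of RGB tuples; `correct_background` and the sample data are hypothetical names and values.

```python
def correct_background(reference_image, reference_values, flag_map):
    """Replace flagged pixels of the reference converted background image
    with the reference value M; leave unflagged pixels unchanged."""
    corrected = []
    for img_row, m_row, flag_row in zip(reference_image, reference_values, flag_map):
        corrected.append([
            m if flag == 1 else pixel
            for pixel, m, flag in zip(img_row, m_row, flag_row)
        ])
    return corrected

reference_image = [[(200, 30, 30), (60, 120, 60)]]    # converted background, reference viewpoint
reference_values = [[(61, 119, 60), (61, 119, 60)]]   # reference value M per pixel
flag_map = [[1, 0]]                                   # 1 = pixel to be corrected
corrected = correct_background(reference_image, reference_values, flag_map)
print(corrected)
```

The function returns a new image rather than modifying its input, which keeps the original converted background available for comparison.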

ステップＳ３１２において、補正部２０６は、基準視点番号に対応する変換背景画像データの全画素についてステップＳ３１０〜ステップＳ３１１の処理を行ったかを判定する。ステップＳ３１２の判定の結果が真の場合、補正部２０６は、補正が完了した基準視点番号に対応する変換背景画像データを、前景抽出部２０７に出力して、ステップＳ３１３に進む一方、該判定の結果が偽の場合、ステップＳ３１０に戻る。 In step S312, the correction unit 206 determines whether the processing of steps S310 to S311 has been performed on all the pixels of the converted background image data corresponding to the reference viewpoint number. If the result of the determination in step S312 is true, the correction unit 206 outputs the converted background image data corresponding to the reference viewpoint number for which the correction has been completed to the foreground extraction unit 207, and proceeds to step S313 while proceeding to the determination. If the result is false, the process returns to step S310.

ステップＳ３１３において、前景抽出部２０７は、補正部２０６から取得した補正が完了した基準視点番号に対応する変換背景画像データ（完全な背景画像Ｉ_bとする）を用いて、対象画像データ（Ｉとする）から前景の被写体による領域を抽出する。具体的には、式（３）に示すように、完全な背景画像Ｉ_bと対象画像データＩとの間で画素毎に差分二乗和を算出し、差分二乗和が閾値以上である画素を前景の被写体の画像領域の画素とみなすことで、前景の被写体による領域を抽出した画像Ｉ_fを作成する。画像Ｉ_fは２値画像であり、前景の被写体の画像領域の画素に対応する画素値に１が代入され、背景の被写体の画像領域の画素に対応する画素値に０が代入される。 In step S313, the foreground extraction unit 207 extracts the region of the foreground subject from the target image data (denoted I), using the corrected converted background image data corresponding to the reference viewpoint number acquired from the correction unit 206 (denoted as the complete background image I_b). Specifically, as shown in equation (3), the sum of squared differences between the complete background image I_b and the target image data I is calculated for each pixel, and pixels whose sum of squared differences is equal to or greater than a threshold are regarded as pixels in the image region of a foreground subject. This creates an image I_f in which the region of the foreground subject is extracted. The image I_f is a binary image: 1 is assigned to pixel values corresponding to pixels in the image region of a foreground subject, and 0 is assigned to pixel values corresponding to pixels in the image region of a background subject.
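Equation (3) is not reproduced in this text. Based on the description above (a per-pixel sum of squared differences compared against a threshold Th, producing a binary image), a plausible reconstruction is the following. The coordinate symbols (u, v) and the superscript k for the channel index are notational assumptions.

```latex
I_f(u, v) =
\begin{cases}
1 & \text{if } \sum_{k \in \{R, G, B\}} \left( I^{k}(u, v) - I_b^{k}(u, v) \right)^2 \geq Th \\
0 & \text{otherwise}
\end{cases}
\tag{3}
```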

ここで、Ｔｈは閾値を表し、ｋはＲＧＢ３チャンネルを識別するための添え字を表す。なお、ここで用いる閾値は、画素値の最大値などに基づいて決定し、画素値の最大値の２０％より小さい値、例えば、最大値の１％〜５％の範囲内の任意の値を用いて閾値を求めて良い。この閾値の求め方は、式（２）の場合と同様である。このように、前景抽出部２０７は、前景画像データ作成手段として機能する。前景抽出部２０７は、作成した画像Ｉ_fを二次記憶装置１０４や外部記憶装置１０８や表示装置１０９に出力して、一連の処理は完了する。以上が、本実施例における画像処理装置１００が実行する、前景領域を抽出する処理である。 Here, Th represents the threshold, and k is a subscript identifying each of the three RGB channels. The threshold used here is determined based on, for example, the maximum pixel value; a value smaller than 20% of the maximum pixel value, for example an arbitrary value within the range of 1% to 5% of the maximum value, may be used to obtain the threshold. The threshold is obtained in the same way as for equation (2). In this way, the foreground extraction unit 207 functions as a foreground image data creation means. The foreground extraction unit 207 outputs the created image I_f to the secondary storage device 104, the external storage device 108, or the display device 109, and the series of processes is completed. The above is the process of extracting the foreground region executed by the image processing apparatus 100 in this embodiment.
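The background subtraction of step S313 can be sketched in a few lines of Python. The function name `extract_foreground`, the nested-list image layout, and the sample values (including a threshold corresponding to a = 12.75) are assumptions for illustration only.

```python
def extract_foreground(target, background, threshold):
    """Return a binary mask I_f: 1 where the sum of squared RGB differences
    between the target image I and the complete background I_b is at or
    above the threshold Th, 0 elsewhere."""
    mask = []
    for t_row, b_row in zip(target, background):
        mask.append([
            1 if sum((t - b) ** 2 for t, b in zip(tp, bp)) >= threshold else 0
            for tp, bp in zip(t_row, b_row)
        ])
    return mask

target = [[(60, 120, 60), (200, 30, 30)]]      # I: grass pixel, then a player pixel
background = [[(61, 119, 60), (61, 119, 60)]]  # I_b: complete background
mask = extract_foreground(target, background, threshold=487.6875)
print(mask)
```

The first pixel differs from the background by only 2 in squared sum and stays 0, while the second differs by 28142 and is marked as foreground.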

＜本実施例の効果について＞
以下、本実施例の効果について図６を用いて説明する。図６において、画像データ６０１は、従来手法に従って時系列に沿って連続で撮像した複数の画像に基づき作成した、視点６０２における背景画像データである。背景画像データ６０１には、前景の被写体６０３（ゴールキーパー）や前景の被写体（ゴール）６０４などの前景の被写体が写っている。この理由は、背景画像データを作成するための連続画像を撮像する際に、前景の被写体６０３、６０４が、同一位置に存在し動かなかった結果、背景画像データを作成する際に前景の被写体６０３、６０４が背景の被写体と誤ってみなされたためである。背景画像データ６０１を用いて、対象画像データ６０５から前景領域を抽出した場合、前景画像データ６０６が取得される。前景画像データ６０６では、被写体６０３、６０４以外の、概ね動いている前景の被写体による領域を抽出できている。しかし、停止している前景の被写体６０３、６０４による領域を、抽出できていない。 <About the effect of this example>
The effects of this embodiment are described below with reference to FIG. 6. In FIG. 6, the image data 601 is background image data at the viewpoint 602, created by a conventional method from a plurality of images captured continuously in time series. The background image data 601 shows foreground subjects such as the foreground subject 603 (goalkeeper) and the foreground subject 604 (goal). This is because the foreground subjects 603 and 604 stayed at the same position without moving while the continuous images for creating the background image data were captured, so they were mistakenly regarded as background subjects when the background image data was created. When the foreground region is extracted from the target image data 605 using the background image data 601, the foreground image data 606 is obtained. In the foreground image data 606, the regions of the moving foreground subjects other than the subjects 603 and 604 are mostly extracted. However, the regions of the stationary foreground subjects 603 and 604 are not extracted.

また、画像データ６０７は、従来手法に従って対象画像データ６０５を撮像した時刻と同一時刻に複数の異なる視点から撮像した複数の画像に基づき作成した、視点６０２における背景画像データである。背景画像データ６０７には、前景の被写体６０３（ゴールキーパー）や前景の被写体（ゴール）６０４などの前景の被写体は写っていないものの、背景の被写体の一部が欠けて写っている。この理由は、背景画像データを作成するために撮像したシーン内で前景の被写体が密集しており、前景の被写体の一部が複数の視点から見えなかったためである。背景画像データ６０７を用いて、対象画像データ６０５から前景領域を抽出した場合、前景画像データ６０８が取得される。前景画像データ６０８では、地上面からの高さを持つ前景の被写体による領域を概ね抽出できている。しかし、前景の被写体が密集しているために、複数の視点から見えない前景の被写体による領域６０９を、抽出できていない。 Further, the image data 607 is background image data at the viewpoint 602 created based on a plurality of images captured from a plurality of different viewpoints at the same time as the time when the target image data 605 was imaged according to the conventional method. Although the background image data 607 does not show the foreground subject such as the foreground subject 603 (goalkeeper) or the foreground subject (goal) 604, a part of the background subject is missing. The reason for this is that the subjects in the foreground are densely packed in the scene captured to create the background image data, and a part of the subjects in the foreground cannot be seen from a plurality of viewpoints. When the foreground region is extracted from the target image data 605 using the background image data 607, the foreground image data 608 is acquired. In the foreground image data 608, the area of the subject in the foreground having a height from the ground surface can be roughly extracted. However, because the subjects in the foreground are densely packed, the region 609 due to the subjects in the foreground that cannot be seen from a plurality of viewpoints cannot be extracted.

これに対し、本実施例では、複数の異なる視点における不完全な背景画像（例えば、背景画像データ６０１など）を用いて、完全な背景画像である背景画像データ６１０を作成する。背景画像データ６１０を用いて、対象画像データ６０５から前景領域を抽出した場合、前景画像データ６１１が取得される。前景画像データ６１１では、停止している前景の被写体６０３、６０４による領域や、複数の視点から見えない前景の被写体による領域を、高精度に抽出できている。このように、本実施例によれば、時間の経過に伴う被写体の変化（移動など）の有無や前景の被写体の疎密などの被写体の状態によらず、前景の被写体による領域を高精度に抽出することができる。 On the other hand, in this embodiment, background image data 610, which is a complete background image, is created by using incomplete background images (for example, background image data 601) from a plurality of different viewpoints. When the foreground region is extracted from the target image data 605 using the background image data 610, the foreground image data 611 is acquired. In the foreground image data 611, the area of the foreground subject 603 and 604 that is stopped and the area of the foreground subject that cannot be seen from a plurality of viewpoints can be extracted with high accuracy. As described above, according to the present embodiment, the area due to the subject in the foreground is extracted with high accuracy regardless of the state of the subject such as the presence or absence of the change (movement, etc.) of the subject with the passage of time and the density of the subject in the foreground. can do.

［実施例２］
実施例１では、複数の不完全な背景画像に基づき完全な背景画像を作成する際、視点によって異なる変換背景画像の、着目画素における一致の度合いを示す一致度を用いる。一方、本実施例では、複数の不完全な背景画像に基づき完全な背景画像を作成する際、一致度に加えて、視点によって異なる変換背景画像の、着目画素における画素値の変化の滑らかさ度合い、所謂連続性を用いる。なお、実施例１と同様の構成及び同様の処理については、実施例１と同様の符号を付して説明を省略する。 [Example 2]
In the first embodiment, when a complete background image is created from a plurality of incomplete background images, a degree of coincidence is used that indicates how well the converted background images, which differ depending on the viewpoint, match at the pixel of interest. In this embodiment, in addition to the degree of coincidence, the smoothness of the change in pixel value at the pixel of interest across the converted background images of different viewpoints, so-called continuity, is also used. Configurations and processes that are the same as in the first embodiment are given the same reference numerals as in the first embodiment, and their description is omitted.

＜前景領域を抽出する処理の概要について＞
以下、本実施例における前景領域を抽出する処理の概要について説明する。本実施例では、複数の異なる視点における背景画像データを着目視点から見た場合の画像にそれぞれ変換することで得られる変換背景画像データを用いて、視点間の画素値の連続性を算出する。画素値の連続性とは、着目視点における変換背景画像データと該着目視点に隣接する視点における変換背景画像データとの間における、画素値の変化の滑らかさ度合いである。 <Overview of the process of extracting the foreground area>
Hereinafter, the outline of the process for extracting the foreground region in this embodiment will be described. In this embodiment, the continuity of pixel values between viewpoints is calculated using the converted background image data obtained by converting the background image data from a plurality of different viewpoints into an image when viewed from the viewpoint of interest. The continuity of pixel values is the degree of smoothness of change in pixel values between the converted background image data at the viewpoint of interest and the converted background image data at a viewpoint adjacent to the viewpoint of interest.

具体的には、基準視点番号に対応する変換背景画像データにおける着目画素の画素値と、基準視点に隣接する視点における変換背景画像データにおける着目画素の画素値とを比較し、画素値間の差分の総和を連続性として算出する。続いて、実施例１で説明した一致度と、本実施例で算出した連続性とを用いて、一致の度合いが低く、且つ、画素値の変化が滑らかでない画素を、前景の被写体の画像領域の画素である可能性が高いとみなし補正対象の画素として検出する。そして、検出した補正対象の画素の画素値を更新して変換背景画像データを補正することで、完全な背景画像を作成する。最後に、作成した完全な背景画像と対象画像データとを比較して、前景領域を抽出する。 Specifically, the pixel value of the pixel of interest in the converted background image data corresponding to the reference viewpoint number is compared with the pixel value of the pixel of interest in the converted background image data in the viewpoint adjacent to the reference viewpoint, and the difference between the pixel values is compared. Is calculated as continuity. Subsequently, using the degree of agreement described in Example 1 and the continuity calculated in this example, pixels having a low degree of agreement and a non-smooth change in pixel value are obtained in the image region of the subject in the foreground. It is considered that there is a high possibility that it is a pixel of, and it is detected as a pixel to be corrected. Then, the pixel value of the detected pixel to be corrected is updated and the converted background image data is corrected to create a complete background image. Finally, the foreground area is extracted by comparing the created complete background image with the target image data.

実施例１では、全視点における背景画像データに基づき算出した画素値の一致度のみを用いて、着目画素が前景の被写体の画像領域の画素であるかを判定した。そのため、視点によって色の見え方が変化することにより画素値が異なる背景の被写体の画像領域の画素も、前景の被写体の画像領域の画素である可能性が高いとみなされ、補正対象の画素として検出される。その結果、補正する必要のない画素も補正されてしまうため、補正後の変換背景画像データに誤差が発生し、前景の被写体の画像を含まない完全な背景画像を精度良く作成することができない。視点によって見え方が変化する背景の被写体として、スポーツなどの競技シーンを撮像した画像に存在する、方向性をもって刈られている芝が挙げられる。方向性をもって刈られている芝は、見る方向により芝の色の見え方が異なり、その結果、同一位置の芝であっても視点によって画素値が変化する。このような芝を背景の被写体とするシーンに実施例１を適用した場合、複数の変換背景画像データにおける画素間の一致の度合いは低くなるため、背景の被写体である芝の画像領域の画素が前景の被写体の画像領域の画素であると誤判定される。かかる誤判定を防ぐために、本実施例では、一致度に加えて連続性を用いて、着目画素が前景の被写体の画像領域の画素であるかを判定する。一般的に、視点によって色の見え方が変化する被写体に関しては、離れた視点間で色の見え方に顕著な違いが現れる場合はあるが、近接する視点間での色の見え方の変化は緩やかである。そのため、本実施例では、色の見え方の違いにより画素値が変化した背景の被写体の画像領域の画素と、地上面からの高さを持つために画素値が変化した前景の被写体の画像領域の画素とを区別する。その結果、変換背景画像を精度良く補正して完全な背景画像を作成することができるため、対象画像データから前景の被写体による領域を高精度に抽出することが可能となる。なお、視点によって色の見え方が変化する被写体は上記の芝生の例に限られず、体育館の床など様々なものが存在する。 In the first embodiment, it was determined whether or not the pixel of interest is a pixel in the image region of the subject in the foreground by using only the degree of coincidence of the pixel values calculated based on the background image data in all viewpoints. Therefore, it is highly likely that the pixels in the image area of the subject in the background, which have different pixel values due to the change in the appearance of colors depending on the viewpoint, are also the pixels in the image area of the subject in the foreground, and are used as the pixels to be corrected. Detected. As a result, pixels that do not need to be corrected are also corrected, so that an error occurs in the converted background image data after correction, and it is not possible to accurately create a complete background image that does not include the image of the subject in the foreground. As a background subject whose appearance changes depending on the viewpoint, there is a turf that is mowed in a direction, which is present in an image of a competition scene such as sports. 
Turf mowed with a directional pattern looks different in color depending on the viewing direction, so the pixel value changes with the viewpoint even for turf at the same position. When the first embodiment is applied to a scene with such turf as the background subject, the degree of matching between pixels in the plurality of converted background image data becomes low, and pixels in the image region of the turf, a background subject, are erroneously determined to be pixels in the image region of a foreground subject. To prevent this erroneous determination, this embodiment uses continuity in addition to the degree of coincidence to determine whether the pixel of interest is a pixel in the image region of a foreground subject. In general, for a subject whose color appearance changes with the viewpoint, the appearance may differ noticeably between distant viewpoints, but it changes only gradually between neighboring viewpoints. This embodiment therefore distinguishes pixels in the image region of a background subject whose pixel values changed because of a difference in color appearance from pixels in the image region of a foreground subject whose pixel values changed because the subject has height above the ground surface. As a result, the converted background image can be corrected accurately to create a complete background image, which makes it possible to extract the region of the foreground subject from the target image data with high accuracy. Subjects whose color appearance changes with the viewpoint are not limited to the turf example above; there are various others, such as a gymnasium floor.

＜前景領域を抽出する処理について＞
以下、本実施例における画像処理装置１００が実行する前景領域を抽出する処理について、図７及び図８を用いて説明する。図７は、本実施例における画像処理装置１００の機能構成を示すブロック図であり、図８は、本実施例における前景領域を抽出する処理の流れを示すフローチャートである。画像処理装置１００のＣＰＵ１０１は、ＲＡＭ１０２をワークメモリとして用いてＲＯＭ１０３に格納されたプログラムを実行することで、図７に示す各構成要素として機能し、図８に示す一連の処理を実行する。なお、以下に示す処理の全てがＣＰＵ１０１によって実行される必要はなく、処理の一部または全部が、ＣＰＵ１０１以外の１つ又は複数の処理回路によって行われるように画像処理装置１００を構成しても良い。 <About the process of extracting the foreground area>
The process of extracting the foreground region executed by the image processing apparatus 100 in this embodiment is described below with reference to FIGS. 7 and 8. FIG. 7 is a block diagram showing the functional configuration of the image processing apparatus 100 in this embodiment, and FIG. 8 is a flowchart showing the flow of the process of extracting the foreground region in this embodiment. The CPU 101 of the image processing apparatus 100 functions as each component shown in FIG. 7 by executing a program stored in the ROM 103 using the RAM 102 as a work memory, and executes the series of processes shown in FIG. 8. Not all of the processes described below need to be executed by the CPU 101; the image processing apparatus 100 may be configured so that some or all of the processes are performed by one or more processing circuits other than the CPU 101.

ステップＳ８０１において、連続性算出部７０１は、連続性を算出する対象となる、変換背景画像データにおける着目画素を決定する。本実施例では、まず、変換背景画像の左上の画素が着目画素として選択され、その後、未処理の画素が着目画素として順次選択される。なお、変換背景画像データの全画素について連続性の算出が実行されれば、どのような順番で着目画素を決定しても良い。 In step S801, the continuity calculation unit 701 determines the pixel of interest in the converted background image data to be calculated for continuity. In this embodiment, first, the upper left pixel of the converted background image is selected as the pixel of interest, and then the unprocessed pixel is sequentially selected as the pixel of interest. As long as the continuity is calculated for all the pixels of the converted background image data, the pixels of interest may be determined in any order.

ステップＳ８０２において、連続性算出部７０１は、画像変換部２０３から取得した変換背景画像データを用いて、基準視点とその周辺の視点とに対応する変換背景画像における、着目画素の画素値の連続性を算出する。ここで、本ステップにおける連続性の算出手法を、図９を用いて説明する。 In step S802, the continuity calculation unit 701 uses the conversion background image data acquired from the image conversion unit 203 to maintain the continuity of the pixel values of the pixels of interest in the conversion background image corresponding to the reference viewpoint and the viewpoints around the reference viewpoint. Is calculated. Here, the method of calculating the continuity in this step will be described with reference to FIG.

まず、画像変換部２０３が定めた基準視点番号に対応するカメラ９０１と隣接する、カメラ９０２、９０３を検出し、これらのカメラに対応する視点番号を取得する。以下、取得した視点番号を隣接視点番号と呼ぶ。ここで、基準視点番号に対応するカメラ９０１と隣接するカメラは、カメラの３次元空間中の座標から算出した、カメラ９０１までの距離に基づいて決定される。本実施例では、カメラ９０１の左側に存在するカメラの中でカメラ９０１までの距離が最も短いカメラ９０２と、カメラ９０１の右側に存在するカメラの中でカメラ９０１までの距離が最も短いカメラ９０３とが、カメラ９０１に隣接するカメラとして検出される。 First, the cameras 902 and 903 adjacent to the cameras 901 corresponding to the reference viewpoint numbers determined by the image conversion unit 203 are detected, and the viewpoint numbers corresponding to these cameras are acquired. Hereinafter, the acquired viewpoint number will be referred to as an adjacent viewpoint number. Here, the camera adjacent to the camera 901 corresponding to the reference viewpoint number is determined based on the distance to the camera 901 calculated from the coordinates in the three-dimensional space of the camera. In this embodiment, the camera 902 having the shortest distance to the camera 901 among the cameras on the left side of the camera 901 and the camera 903 having the shortest distance to the camera 901 among the cameras on the right side of the camera 901. Is detected as a camera adjacent to the camera 901.

次に、基準視点番号に対応する変換背景画像９０４と隣接視点番号に対応する変換背景画像９０５、９０６とから、着目画素の座標（ｕ₂、ｖ₂）の画素９０７、９０８、９０９の画素値を取得し、該取得した画素値を用いて、式（４）により連続性を算出する。 Next, the pixel values of the pixels 907, 908, and 909 at the coordinates (u₂, v₂) of the pixel of interest are acquired from the converted background image 904 corresponding to the reference viewpoint number and the converted background images 905 and 906 corresponding to the adjacent viewpoint numbers, and the continuity is calculated by equation (4) using the acquired pixel values.

ここで、Ｂ₉₀₁（ｕ₂、ｖ₂）、Ｂ₉₀₂（ｕ₂、ｖ₂）、Ｂ₉₀₃（ｕ₂、ｖ₂）はそれぞれ、カメラ９０１、９０２、９０３に対応する変換背景画像９０４、９０５、９０６における着目画素９０７、９０８、９０９の画素値を表す。またｋは、ＲＧＢ３チャンネルを識別するための添え字を表す。式（４）により算出するＣの値は、視点間の画素値の変化が滑らかであるほど小さくなる。なお、用いる連続性は、式（４）により算出されるＣに限られず、離散値からの二階微分など、視点間の画素値の連続性を示す任意の値を用いて良い。また、本実施例では、基準視点番号に対応するカメラ９０１と隣接するカメラ９０２、９０３を用いる場合について説明しているが、用いるカメラはこれらに限られず、被写体の見え方によっては他のカメラを用いても良い。例えば、基準視点番号に対応するカメラ９０１の左側で、カメラ９０２の代わりに、カメラ９０２の次にカメラ９０１までの距離が近いカメラを用いても良い。カメラ９０１の右側で用いるカメラについても同様である。 Here, B ₉₀₁ (u ₂ , v ₂ ), B ₉₀₂ (u ₂ , v ₂ ), and B ₉₀₃ (u ₂ , v ₂ ) are converted background images 904 and 905 corresponding to the cameras 901, 902, and 903, respectively. , 906 represents the pixel values of the pixels of interest 907, 908, 909. Further, k represents a subscript for identifying the RGB3 channel. The value of C calculated by the equation (4) becomes smaller as the change of the pixel value between the viewpoints becomes smoother. The continuity used is not limited to C calculated by the equation (4), and any value indicating the continuity of the pixel values between viewpoints, such as the second derivative from the discrete value, may be used. Further, in this embodiment, the case where the cameras 901 corresponding to the reference viewpoint number and the adjacent cameras 902 and 903 are used is described, but the cameras used are not limited to these, and other cameras may be used depending on how the subject is viewed. You may use it. For example, on the left side of the camera 901 corresponding to the reference viewpoint number, instead of the camera 902, a camera having a short distance to the camera 901 next to the camera 902 may be used. The same applies to the camera used on the right side of the camera 901.
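Equation (4) is not reproduced in this text, so its exact form is unknown here. The following Python sketch assumes one plausible reading of the overview ("the sum of the differences between pixel values"): C is the sum, over the RGB channels, of the absolute differences between the reference-viewpoint pixel and each of its two adjacent-viewpoint pixels. The function name and sample values are hypothetical.

```python
def continuity(center, left, right):
    """Assumed form of C: sum over RGB channels of |B_901 - B_902| + |B_901 - B_903|.
    Smaller values mean a smoother change of pixel value between viewpoints."""
    return sum(abs(c - l) + abs(c - r) for c, l, r in zip(center, left, right))

# Turf whose color shifts gradually between neighboring viewpoints.
c_turf = continuity(center=(60, 120, 60), left=(55, 115, 58), right=(66, 126, 63))

# A foreground object visible at this pixel only from the reference viewpoint.
c_object = continuity(center=(200, 30, 30), left=(60, 120, 60), right=(62, 118, 58))

print(c_turf, c_object)
```

Under this assumed form, the gradual turf case yields a small C and the foreground case a large one, which matches the stated property that C shrinks as the change between viewpoints becomes smoother.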

ステップＳ８０３において、連続性算出部７０１は、変換背景画像データの全画素についてステップＳ８０１〜ステップＳ８０２の処理を行ったかを判定する。ステップＳ８０３の判定の結果が真の場合、連続性算出部７０１は、算出した全画素の連続性を補正判定部７０２に出力し、ステップＳ３０８に進む一方、該判定の結果が偽の場合、ステップＳ８０１に戻る。 In step S803, the continuity calculation unit 701 determines whether the processes of steps S801 to S802 have been performed on all the pixels of the converted background image data. If the result of the determination in step S803 is true, the continuity calculation unit 701 outputs the calculated continuity of all pixels to the correction determination unit 702 and proceeds to step S308, while if the result of the determination is false, the step Return to S801.

In step S804, the correction determination unit 702 updates the flag map based on the degree of coincidence acquired from the matching degree calculation unit 204 and the continuity acquired from the continuity calculation unit 701. Specifically, in the converted background image data corresponding to the reference viewpoint number, the correction determination unit 702 changes to 1 the flag-map value of each pixel judged likely to belong to the image region of a foreground subject. In this embodiment, if the calculated degree of coincidence D is equal to or greater than a predetermined threshold and the calculated continuity C is equal to or greater than a predetermined threshold, both the degree of matching and the smoothness of change at the pixel of interest between the converted background image for the reference viewpoint number and the other converted background images are regarded as low. In other words, the pixel of interest is judged likely to be a pixel in the image region of a foreground subject. If these conditions are not satisfied, the pixel of interest is judged likely to be a pixel in the image region of a background subject. The thresholds used in this step are determined based on, for example, the maximum pixel value; a value smaller than 20% of the maximum, for example any value in the range of 1% to 5% of the maximum, may be used. The thresholds are obtained in the same way as in Embodiment 1. Whether a pixel belongs to the image region of a foreground subject is determined pixel by pixel. The correction determination unit 702 outputs the updated flag map to the correction unit 206. This completes the foreground region extraction process executed by the image processing apparatus 100 in this embodiment.
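The flag-map update of step S804 can be sketched as follows. This is a minimal illustration under stated assumptions: the maps are nested lists of per-pixel values, the names `update_flag_map`, `d_ratio`, and `c_ratio` are hypothetical, and both thresholds are taken as 5% of the maximum pixel value, which is just one choice within the 1% to 5% range mentioned in the text.

```python
def update_flag_map(flag_map, coincidence, continuity,
                    max_value=255, d_ratio=0.05, c_ratio=0.05):
    """Set flag_map[y][x] to 1 where both D and C reach their thresholds.

    coincidence (D) and continuity (C) are per-pixel maps with the same shape
    as flag_map. The ratios are illustrative assumptions, not source values.
    """
    d_threshold = max_value * d_ratio
    c_threshold = max_value * c_ratio
    for y, row in enumerate(flag_map):
        for x in range(len(row)):
            # Both conditions must hold for a "likely foreground" decision.
            if coincidence[y][x] >= d_threshold and continuity[y][x] >= c_threshold:
                row[x] = 1
    return flag_map
```

Pixels that fail either condition keep their existing flag, which matches the per-pixel decision described above.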

<Effects of this embodiment>
The effects of this embodiment are described below with reference to FIG. 10. The image data 1002 is converted background image data obtained by converting, for each viewpoint, the background image data into an image as viewed from the viewpoint 1001 with the ground surface as a reference. Here, the viewpoint 1001 is both the viewpoint of interest and the reference viewpoint. The subject 1003 is a background subject whose color appearance changes depending on the viewpoint (for example, a lawn), and the subject 1005 is a foreground subject.

If Embodiment 1 is applied to the scene shown in FIG. 10 and a complete background image is created from the incomplete multi-viewpoint background images, the background image data 1004 is obtained. In the background image data 1004, the image of the foreground subject 1005 has been removed, but the background subject 1003 is not rendered correctly. This is because, when the complete background image is created, whether the pixel of interest belongs to the image region of a foreground subject is determined from the degree of coincidence alone, so pixels in the image region of the background subject 1003 are misjudged as pixels of a foreground subject. As a result, the background image data 1004 is created with the pixels in the image region of the background subject 1003 corrected. Even if the background image data 1004 is used to extract the foreground region from the target image data, the foreground region cannot be extracted accurately.

In contrast, in this embodiment, when a complete background image is created from the incomplete multi-viewpoint background images, whether the pixel of interest belongs to the image region of a foreground subject is determined from both the degree of coincidence and the continuity. As a result, pixels in the image region of the background subject 1003 are not judged to be pixels of a foreground subject, and the background image data 1006 is created with those pixels left uncorrected. In the background image data 1006, the image of the foreground subject 1005 is removed while the background subject 1003 is rendered correctly. Extracting the foreground region from the target image data using the background image data 1006 makes it possible to extract the foreground region with high accuracy. Thus, according to this embodiment, the region of a foreground subject can be extracted with high accuracy even when the background subject is one whose color appearance changes depending on the viewpoint.

[Embodiment 3]
In Embodiments 1 and 2, a complete background image is created from incomplete background images at a plurality of different viewpoints, and the region of a foreground subject is extracted by comparing the created complete background image with the target image data. In this embodiment, by contrast, incomplete foreground images at a plurality of different viewpoints are used to extract the foreground region so that it does not include regions caused by shadows. Here, an incomplete foreground image means an image in which both the region of a foreground subject and the region of the shadow accompanying that subject have been extracted as the foreground region.

In this embodiment, the incomplete foreground image for each viewpoint is converted into an image as viewed from the viewpoint of interest with the ground surface as a reference, yielding a plurality of pieces of converted foreground image data, and the degree of coincidence between pixels is calculated across them. As described in Embodiment 1, a foreground subject has height above the ground surface, whereas the shadow accompanying it does not. In this embodiment, therefore, pixels with a high degree of matching across the plurality of pieces of converted foreground image data are detected and corrected on the grounds that they are likely to be pixels of a shadow region, which has no height. As a result, a foreground image can be created in which only the region of a foreground subject with height is extracted as the foreground region, without the shadow region. Hereinafter, such an image is called a complete foreground image. Configurations and processes that are the same as in the embodiments described above are given the same reference numerals, and their description is omitted.

<Process of extracting the foreground region>
The foreground region extraction process executed by the image processing apparatus 100 in this embodiment is described below with reference to FIGS. 11 and 12. FIG. 11 is a block diagram showing the functional configuration of the image processing apparatus 100 in this embodiment, and FIG. 12 is a flowchart showing the flow of the foreground region extraction process in this embodiment. The CPU 101 of the image processing apparatus 100 executes a program stored in the ROM 103 using the RAM 102 as work memory, thereby functioning as each component shown in FIG. 11 and executing the series of processes shown in FIG. 12. Not all of the processes described below need to be executed by the CPU 101; the image processing apparatus 100 may be configured so that some or all of them are performed by one or more processing circuits other than the CPU 101.

In step S1201, the camera parameter acquisition unit 1101 acquires the camera parameters of the camera that captured the target image data, either from the external storage device 108 via the input interface 105 or from the secondary storage device 104. The camera parameter acquisition unit 1101 also sets the viewpoint of that camera as the viewpoint of interest. The camera parameters acquired in this step are the same as those described in Embodiment 1. The camera parameter acquisition unit 1101 outputs the camera parameters to the image conversion unit 1103.

In step S1202, the foreground image data acquisition unit 1102 acquires a plurality of pieces of foreground image data at a plurality of different viewpoints, either from the external storage device 108 via the input interface 105 or from the secondary storage device 104. The foreground image data acquired in this step is an image in which the region of a foreground subject has been extracted, and the extracted region is assumed to include the region of a shadow. In this embodiment, the foreground image data is created from a captured image and a background image taken in advance, as follows. The captured image is an image of the foreground subject and the background subject of the target image data, taken in substantially the same environment as when the target image data was captured. The background image is an image of the background subject of the target image data, taken in substantially the same environment as when the target image data was captured. For each viewpoint, the pixel values of the captured image and the background image are compared pixel by pixel; pixels at coordinates where the two values are identical are set to 0 and all other pixels are set to 1, producing a binary image for each viewpoint. This binary image is the foreground image data. The method of creating the foreground image data is not limited to this, and the foreground image data need not be a binary image; it may be a multi-valued image. The foreground image data acquisition unit 1102 also acquires the camera parameters corresponding to each piece of foreground image data together with the foreground image data. In addition, to distinguish the pieces of foreground image data from one another, the foreground image data acquisition unit 1102 stores each piece in association with the viewpoint number of its camera. The foreground image data acquisition unit 1102 outputs the foreground image data and the camera parameters to the image conversion unit 1103.
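The per-viewpoint binarization described above can be sketched as follows. This is a minimal illustration that assumes images are nested lists of (R, G, B) tuples; the function name `make_foreground_mask` is hypothetical.

```python
def make_foreground_mask(captured, background):
    """Return a binary image: 0 where the two pixels are identical, else 1.

    captured and background are same-sized nested lists of (R, G, B) tuples
    for one viewpoint. Exact equality follows the description in the text;
    a practical system would likely need a tolerance (an assumption here).
    """
    return [
        [0 if c == b else 1 for c, b in zip(captured_row, background_row)]
        for captured_row, background_row in zip(captured, background)
    ]
```

Because a shadow also changes pixel values relative to the background image, shadow pixels end up as 1 in this mask, which is why the result is called an incomplete foreground image.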

In step S1203, the image conversion unit 1103 uses the camera parameters obtained from the camera parameter acquisition unit 1101 and the foreground image data acquisition unit 1102 to convert the foreground image data obtained from the foreground image data acquisition unit 1102 into images as viewed from the viewpoint of interest. This conversion is the same as in step S303 of Embodiment 1: for each viewpoint, the foreground image data is projectively transformed with the ground surface as a reference to obtain an image as viewed from the viewpoint of interest. The foreground image (data) obtained by this conversion is called a converted foreground image (data). The image conversion unit 1103 thus functions as a converted foreground image data creation means. The image conversion unit 1103 outputs the converted foreground image data to the matching degree calculation unit 1104.
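A projective transformation that uses the ground surface as its reference can be expressed as a plane-induced homography. The sketch below is an assumption about how such a conversion could be set up, not the source's step S303: it assumes each camera is given as an intrinsic matrix K, a rotation R, and a translation t, and that the ground is the world plane Z = 0. All function names are hypothetical.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse3(m):
    """Invert a 3x3 matrix via its adjugate; raises if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("singular matrix")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def ground_projection(K, R, t):
    """Matrix K [r1 r2 t] mapping ground coordinates (X, Y, 1) to pixels."""
    return matmul3(K, [[R[i][0], R[i][1], t[i]] for i in range(3)])


def ground_homography(src_camera, dst_camera):
    """Homography taking source-view pixels of ground points to the target view."""
    return matmul3(ground_projection(*dst_camera),
                   inverse3(ground_projection(*src_camera)))


def warp_point(H, u, v):
    """Apply homography H to pixel (u, v) and dehomogenize."""
    x, y, w = (H[r][0] * u + H[r][1] * v + H[r][2] for r in range(3))
    return x / w, y / w
```

Points that actually lie on the ground map to the same place from every source view, while points with height land at view-dependent positions; the later matching step relies on exactly this difference.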

In step S1204, the image conversion unit 1103 selects, from the foreground image data acquired from the foreground image data acquisition unit 1102, the image corresponding to the viewpoint closest to the position of the camera that captured the target image data (the viewpoint of interest) as the reference foreground image. Specifically, the distance between the coordinates of the viewpoint of interest and the coordinates of the viewpoint corresponding to each piece of foreground image data is calculated for each viewpoint. The foreground image (data) corresponding to the viewpoint with the smallest calculated distance (the reference viewpoint) is then taken as the reference foreground image (data). The image conversion unit 1103 outputs the viewpoint number corresponding to the reference foreground image to the correction unit 1105. In this embodiment, the viewpoint number corresponding to the reference foreground image is called the reference viewpoint number.
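The selection of the reference viewpoint in step S1204 amounts to a nearest-neighbor search over camera positions. A minimal sketch, assuming viewpoint positions are available as coordinate tuples keyed by viewpoint number (the function name and data layout are hypothetical):

```python
import math


def select_reference_viewpoint(interest_position, viewpoint_positions):
    """Return the viewpoint number whose position is closest to the viewpoint of interest.

    viewpoint_positions maps a viewpoint number to an (x, y, z) tuple.
    Euclidean distance is assumed as the distance measure.
    """
    return min(
        viewpoint_positions,
        key=lambda number: math.dist(interest_position, viewpoint_positions[number]),
    )
```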

In step S1205, the matching degree calculation unit 1104 determines the pixel of interest in the converted foreground image data, that is, the pixel for which it will be determined whether the pixels match across the plurality of pieces of converted foreground image data. In this embodiment, the upper-left pixel of the converted foreground image data is selected first, and unprocessed pixels are then selected in turn. The pixels of interest may be chosen in any order as long as the determination is eventually performed for all pixels of the converted foreground image data.

In step S1206, the matching degree calculation unit 1104 uses the plurality of pieces of converted foreground image data acquired from the image conversion unit 1103 to calculate the degree of coincidence at the pixel of interest between the converted foreground image data corresponding to the reference viewpoint number and the other converted foreground image data. The calculation method is described below.

First, the matching degree calculation unit 1104 acquires the pixel value F_l(u₂, v₂) of the converted foreground image data at the coordinates (u₂, v₂) of the determined pixel of interest. Here, l is a subscript that distinguishes the pieces of converted foreground image data, and the matching degree calculation unit 1104 acquires as many pixel values as there are pieces of converted foreground image data. Next, the matching degree calculation unit 1104 calculates the average of all the acquired pixel values. In this embodiment, this average is used as the degree of coincidence. The degree of coincidence is not limited to this; any value that reflects the statistical properties of the plurality of pixel values may be used.
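The averaging described above can be sketched for a whole image at once. This is a minimal illustration that assumes each converted foreground image is a same-sized nested list of 0/1 values; the function name `coincidence_map` is hypothetical.

```python
def coincidence_map(converted_foregrounds):
    """Per-pixel mean of F_l(u, v) over all converted foreground images.

    converted_foregrounds is a list of same-sized binary images
    (nested lists). A mean close to 1 means nearly every viewpoint marks
    the pixel as foreground after conversion.
    """
    count = len(converted_foregrounds)
    height = len(converted_foregrounds[0])
    width = len(converted_foregrounds[0][0])
    return [
        [sum(image[y][x] for image in converted_foregrounds) / count
         for x in range(width)]
        for y in range(height)
    ]
```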

In step S1207, the matching degree calculation unit 1104 determines whether steps S1205 to S1206 have been performed for all pixels of the converted foreground image data. If the determination is true, the matching degree calculation unit 1104 outputs the calculated degrees of coincidence for all pixels to the correction unit 1105, and the process proceeds to step S1208. If the determination is false, the process returns to step S1205.

In step S1208, the correction unit 1105 determines the pixel of interest in the converted foreground image data corresponding to the reference viewpoint number. In this embodiment, the upper-left pixel of that converted foreground image data is selected first, and unprocessed pixels are then selected in turn. The pixels of interest may be chosen in any order as long as the update of pixel values based on the degree of coincidence (step S1209) is eventually performed for all pixels of the converted foreground image data corresponding to the reference viewpoint number.

In step S1209, the correction unit 1105 uses the degree of coincidence acquired from the matching degree calculation unit 1104 to detect pixels in the converted foreground image corresponding to the reference viewpoint number that are likely to be pixels of a shadow region. The correction unit 1105 then changes the pixel values of the detected pixels to 0, thereby removing the shadow region from the incomplete foreground image. In this embodiment, if the calculated degree of coincidence is equal to or greater than a predetermined threshold, the degree of matching between the pixels of interest across all viewpoints is high, so the pixel of interest is judged likely to be a pixel of a shadow region, which has no height, and its pixel value in the converted foreground image data corresponding to the reference viewpoint number is changed to 0. If the calculated degree of coincidence is less than the threshold, the degree of matching between the pixels of interest across all viewpoints is low, so the pixel of interest is judged likely to be a pixel of the region of a foreground subject with height. In this case, the pixel value of the pixel of interest in the converted foreground image corresponding to the reference viewpoint number is left unchanged. Although 0.8 is used as the threshold in this embodiment, the threshold is not limited to this value.
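The correction of step S1209 can be sketched as a per-pixel threshold test. This is a minimal illustration with the same nested-list image layout assumed elsewhere; the function name `remove_shadow` is hypothetical, and the default of 0.8 is the threshold value given in the text.

```python
def remove_shadow(reference_foreground, coincidence, threshold=0.8):
    """Set pixels to 0 where the degree of coincidence reaches the threshold.

    reference_foreground is the converted foreground image for the reference
    viewpoint, and coincidence is the per-pixel degree of coincidence.
    Pixels below the threshold keep their original value.
    """
    return [
        [0 if degree >= threshold else pixel
         for pixel, degree in zip(pixel_row, degree_row)]
        for pixel_row, degree_row in zip(reference_foreground, coincidence)
    ]
```

The function returns a new image rather than modifying its input, a design choice made here only to keep the example easy to test.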

In step S1210, the correction unit 1105 determines whether steps S1208 to S1209 have been performed for all pixels of the converted foreground image data corresponding to the reference viewpoint number. If the determination is true, the correction unit 1105 outputs the corrected converted foreground image data corresponding to the reference viewpoint number to the secondary storage device 104, the external storage device 108, or the display device 109, and the series of processes ends. If the determination is false, the process returns to step S1208. This completes the foreground region extraction process executed by the image processing apparatus 100 in this embodiment.

<Effects of this embodiment>
The effects of this embodiment are described below with reference to FIG. 13. The subject 1301 is a foreground subject whose own shadow 1302 falls on the ground surface 1303. The image data 1304 are foreground images at a plurality of different viewpoints in which the region of the subject 1301 and its accompanying shadow 1302 has been extracted as the foreground region. In this embodiment, pixels of the shadow region, which has no height above the ground surface, are detected based on the degree of matching between the pixels of interest in the plurality of pieces of converted foreground image data obtained by converting the image data 1304 into images as viewed from the viewpoint of interest 1305. The foreground image 1306 is then created by correcting the detected pixels. In the foreground image 1306, the region of the shadow 1302 accompanying the foreground subject 1301, which has height, has been removed, and only the region of the foreground subject 1301 is extracted. Thus, according to this embodiment, even when a foreground subject with height is accompanied by a shadow, only the region of the foreground subject can be extracted with high accuracy, without extracting the shadow region.

In this embodiment, foreground images created from a captured image and a background image taken in advance are used as the incomplete foreground images, but foreground images created by Embodiment 1 or Embodiment 2 may be used instead. In that case, the region of a foreground subject can be extracted with higher accuracy than when Embodiment 1 or 2 and Embodiment 3 are each executed on their own.

[Other Embodiments]
Embodiments of the present invention are not limited to those described above and can take various forms. For example, the embodiments above describe the case where the background image data, which are incomplete background images, and the target image data have the same size, but the sizes need not be the same. In that case, a viewpoint looking down on the ground surface from above is used as the reference viewpoint, and the background images are converted into images as viewed from that reference viewpoint. The background image is then corrected using the converted images, and the corrected background image is converted into an image as viewed from the viewpoint of interest to create background image data corresponding to the target image data.

In addition, the embodiments above use pixel values in the RGB space to calculate the degree of coincidence and to extract the foreground, but the information used is not limited to this. For example, pixel values in a different color space such as HSV or Lab may be used to calculate the degree of coincidence and to extract the foreground.

Furthermore, in the embodiments above, only the single plane of the ground surface is used as the reference for the projective transformation of images, but a plurality of planes parallel to the ground surface may also be used. For example, a plurality of planes may be set at equal intervals of height from 0 to 1 cm above the ground surface, and the degree of coincidence may be calculated using all of the converted images obtained by projective transformation with each of those planes as a reference. This improves robustness against errors in the camera parameters.
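The multi-plane variation can be sketched by generating equally spaced heights and building one plane-to-image projection per height. This is a minimal illustration under several assumptions that are not in the source: cameras are modeled as K, R, t; world units are meters (so 1 cm is 0.01); the world Z axis is the height direction; and five planes are used. All names are hypothetical.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def plane_heights(max_height=0.01, count=5):
    """Equally spaced heights from 0 to max_height inclusive (count >= 2)."""
    step = max_height / (count - 1)
    return [i * step for i in range(count)]


def plane_projection(K, R, t, height):
    """Matrix K [r1 r2 (height*r3 + t)] mapping (X, Y, 1) on the plane
    Z = height to image pixels."""
    columns = [[R[i][0], R[i][1], height * R[i][2] + t[i]] for i in range(3)]
    return matmul3(K, columns)
```

With a height of 0 this reduces to the ordinary ground-plane projection, so the single-plane case is the first entry of the list.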

The present invention can also be realized by supplying a program that implements one or more functions of the embodiments described above to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

201 ... Target image data acquisition unit
202 ... Background image data acquisition unit
203 ... Image conversion unit
204 ... Matching degree calculation unit
206 ... Correction unit
207 ... Foreground extraction unit

The present invention is an image processing apparatus comprising: a first acquisition means for acquiring an image of interest that includes a subject and was captured from a viewpoint of interest; a second acquisition means for acquiring a plurality of reference images captured from a plurality of viewpoints different from the viewpoint of interest; a generation means for converting the plurality of reference images acquired by the second acquisition means to generate a plurality of converted images as viewed from the viewpoint of interest; and a determination means for determining the image region of the subject in the image of interest based on an index relating to the difference between the pixel value of a pixel of interest in the image of interest acquired by the first acquisition means and the pixel value of the pixel corresponding to the pixel of interest in each of the plurality of converted images generated by the generation means.

Claims

An image processing apparatus comprising:
target image data acquisition means for acquiring target image data, which is an image obtained by capturing a foreground subject and a background subject from a viewpoint of interest;
background image data acquisition means for acquiring background image data, which are images of the background subject captured from a plurality of different viewpoints;
converted background image data creation means for creating a plurality of pieces of converted background image data by converting each of the acquired plurality of pieces of background image data into an image as seen from the viewpoint of interest;
calculation means for calculating a degree of coincidence between pixels of interest in the plurality of pieces of converted background image data;
correction means for correcting, based on the degree of coincidence, the converted background image data for a viewpoint determined according to its distance from the viewpoint of interest; and
foreground image data creation means for creating foreground image data, which is an image in which a region of the foreground subject is extracted, based on the target image data and the corrected converted background image data.
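
The sequence of means recited in this claim can be sketched end to end as follows. This is only an illustration under stated assumptions: the degree of coincidence is taken to be the spread (maximum minus minimum) of the pixel values across the converted background images, the correction replaces a poorly matching pixel with the median, and the final extraction is a thresholded absolute difference. None of these specific choices, nor the names `extract_foreground`, `match_tol`, and `fg_threshold`, come from the claim itself.

```python
from statistics import median


def extract_foreground(target, converted_bgs, ref_index, match_tol, fg_threshold):
    """Correct one converted background image and subtract it from the target.

    target:        2-D list, image from the viewpoint of interest
    converted_bgs: list of 2-D lists, backgrounds already converted to the
                   viewpoint of interest
    ref_index:     which converted background to correct (for example the
                   one whose viewpoint is nearest the viewpoint of interest)
    """
    height, width = len(target), len(target[0])
    corrected = [row[:] for row in converted_bgs[ref_index]]
    for y in range(height):
        for x in range(width):
            values = [bg[y][x] for bg in converted_bgs]
            # Assumed coincidence measure: a small spread means the
            # converted backgrounds agree at this pixel.
            spread = max(values) - min(values)
            if spread > match_tol:
                # Assumed correction: replace with the median value.
                corrected[y][x] = median(values)
    # Background subtraction against the corrected background.
    mask = [
        [abs(target[y][x] - corrected[y][x]) > fg_threshold for x in range(width)]
        for y in range(height)
    ]
    return corrected, mask
```

In the usage below, the middle pixel of the reference background is contaminated (value 180) and is repaired before subtraction, so only the last pixel is reported as foreground:

```python
target = [[50, 50, 200]]
converted_bgs = [[[50, 180, 50]], [[52, 51, 49]], [[49, 50, 51]]]
corrected, mask = extract_foreground(target, converted_bgs, 0, 20, 30)
```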

The image processing apparatus according to claim 1, wherein the converted background image data creation means projectively transforms each of the acquired plurality of pieces of background image data into an image as seen from the viewpoint of interest, using the ground surface as a reference.
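
A projective transformation that uses the ground surface as a reference can be expressed as a 3x3 homography between two images. The sketch below assumes that the homography from output (viewpoint-of-interest) pixel coordinates to source pixel coordinates is already known, for example from camera calibration, and uses nearest-neighbour sampling; the function name and parameters are hypothetical.

```python
def warp_to_viewpoint(source, h_target_to_source, out_height, out_width, fill=0):
    """Warp `source` into the viewpoint of interest by inverse mapping.

    h_target_to_source: 3x3 homography (nested lists) that maps a pixel
    (x, y) of the output image to the matching pixel of the source image.
    """
    (a, b, c), (d, e, f), (g, h, i) = h_target_to_source
    out = [[fill] * out_width for _ in range(out_height)]
    for y in range(out_height):
        for x in range(out_width):
            w = g * x + h * y + i  # homogeneous scale factor
            if w == 0:
                continue  # point maps to infinity; leave the fill value
            sx = round((a * x + b * y + c) / w)
            sy = round((d * x + e * y + f) / w)
            if 0 <= sy < len(source) and 0 <= sx < len(source[0]):
                out[y][x] = source[sy][sx]
    return out
```

Only scene points lying on the reference plane are mapped to consistent positions by such a homography. Points above the ground land at different positions for each source viewpoint, which is why comparing the converted images pixel by pixel is informative.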

The image processing apparatus according to claim 2, wherein the converted background image data creation means projectively transforms each of the acquired plurality of pieces of background image data into an image as seen from the viewpoint of interest, using as references a plurality of planes parallel to the ground surface in addition to the ground surface.
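
For reference, the homography associated with a plane parallel to the ground can be written down under a standard pinhole model. The symbols below are assumptions introduced for illustration and do not appear in the claims: camera $i$ has intrinsic matrix $K_i$, rotation $R_i = [\mathbf{r}_{i1}\ \mathbf{r}_{i2}\ \mathbf{r}_{i3}]$ and translation $\mathbf{t}_i$; the world $Z$ axis is perpendicular to the ground; the plane lies at height $Z = h$; and index $0$ denotes the viewpoint of interest.

```latex
\[
\mathbf{x}_i \sim
K_i \begin{bmatrix} \mathbf{r}_{i1} & \mathbf{r}_{i2} & \mathbf{r}_{i3} & \mathbf{t}_i \end{bmatrix}
\begin{bmatrix} X \\ Y \\ h \\ 1 \end{bmatrix}
=
\underbrace{K_i \begin{bmatrix} \mathbf{r}_{i1} & \mathbf{r}_{i2} & h\,\mathbf{r}_{i3} + \mathbf{t}_i \end{bmatrix}}_{G_i(h)}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
\]
\[
\mathbf{x}_0 \sim H_{i \to 0}(h)\,\mathbf{x}_i,
\qquad
H_{i \to 0}(h) = G_0(h)\,G_i(h)^{-1}
\]
```

Setting $h = 0$ gives the ground-surface case; each additional height $h$ yields one more converted image per source viewpoint. $G_i(h)$ is invertible as long as the centre of camera $i$ does not lie on the plane.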

The image processing apparatus according to any one of claims 1 to 3, further comprising a detection means for detecting a pixel to be corrected by the correction means based on the degree of coincidence.

The image processing apparatus according to claim 4, wherein the detection means detects the pixel of interest as a correction target when the calculated degree of coincidence indicates that the degree of coincidence between the pixels of interest in the plurality of pieces of converted background image data is low.

The image processing apparatus according to any one of claims 1 to 3, further comprising: continuity calculation means for calculating a continuity indicating how smoothly the pixel value changes between the pixels of interest in the plurality of pieces of converted background image data; and
detection means for detecting, based on the degree of coincidence and the continuity, a pixel to be corrected by the correction means.

The image processing apparatus according to claim 6, wherein the continuity calculation means calculates the continuity based on the converted background image data for the viewpoint closest to the viewpoint of interest and the converted background image data for a viewpoint adjacent to that viewpoint.

The image processing apparatus according to claim 6 or 7, wherein the detection means detects the pixel of interest as the correction target when the calculated degree of coincidence indicates that the degree of coincidence between the pixels of interest in the plurality of pieces of converted background image data is low and the calculated continuity indicates that the smoothness of the change in pixel value between those pixels of interest is low.
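
The combined test on coincidence and continuity can be sketched as below. Both measures are assumptions made for illustration: coincidence is judged from the spread of values across all converted backgrounds, and continuity from the jump between the converted background nearest the viewpoint of interest and the one at the adjacent viewpoint. The function and parameter names are hypothetical.

```python
def detect_correction_targets(converted_bgs, nearest, adjacent, match_tol, jump_tol):
    """Return (y, x) positions that show both low coincidence and low continuity.

    nearest / adjacent: indices into converted_bgs for the viewpoint closest
    to the viewpoint of interest and for its neighbouring viewpoint.
    """
    height, width = len(converted_bgs[0]), len(converted_bgs[0][0])
    targets = []
    for y in range(height):
        for x in range(width):
            values = [bg[y][x] for bg in converted_bgs]
            # Low coincidence (assumed): values disagree across viewpoints.
            low_coincidence = max(values) - min(values) > match_tol
            # Low continuity (assumed): abrupt jump between the nearest
            # viewpoint and its neighbour.
            jump = abs(converted_bgs[nearest][y][x] - converted_bgs[adjacent][y][x])
            low_continuity = jump > jump_tol
            if low_coincidence and low_continuity:
                targets.append((y, x))
    return targets
```

In this sketch a pixel whose value drifts gradually from one viewpoint to the next is left uncorrected even when its overall spread is large; only pixels that both disagree and jump abruptly are flagged.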

The image processing apparatus according to any one of claims 1 to 8, wherein the correction means replaces the pixel value of the pixel of interest in the converted background image data for the viewpoint closest to the viewpoint of interest with an intermediate value of the pixel values of the pixel of interest in the plurality of pieces of converted background image data.

The image processing apparatus according to any one of claims 1 to 9, wherein the background image data includes an image of the subject in the foreground.

The image processing apparatus according to any one of claims 1 to 10, wherein the background image data is either an image created, for each of the plurality of different viewpoints, from a plurality of images captured successively in time series at a plurality of different times, or an image captured at each of the plurality of different viewpoints while the foreground subject is absent and only the background subject is present.
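
One common way to build a background image from frames captured successively at a single viewpoint is a per-pixel temporal median. The claim does not specify how the image is created, so the sketch below is an assumed example rather than the claimed method, and the function name is hypothetical.

```python
from statistics import median


def temporal_background(frames):
    """Estimate a background image as the per-pixel median over time.

    frames: list of 2-D lists of equal size, captured at one viewpoint at
    successive times.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]
```

A foreground subject that stays at the same pixel for most of the frames survives such a filter, which is consistent with claim 10 allowing the background image data to contain an image of the foreground subject.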

An image processing apparatus comprising:
target image data acquisition means for acquiring target image data, which is an image obtained by capturing a foreground subject and a background subject from a viewpoint of interest;
foreground image data acquisition means for acquiring, for each of a plurality of different viewpoints, foreground image data in which a region formed by the foreground subject and a shadow accompanying that subject is extracted;
converted foreground image data creation means for creating a plurality of pieces of converted foreground image data by converting each of the acquired plurality of pieces of foreground image data into an image as seen from the viewpoint of interest;
calculation means for calculating a degree of coincidence between pixels of interest in the plurality of pieces of converted foreground image data;
correction means for correcting, based on the degree of coincidence, the converted foreground image data for a viewpoint determined according to its distance from the viewpoint of interest; and
foreground image data creation means for creating foreground image data in which a region of the foreground subject is extracted, based on the target image data and the corrected converted foreground image data.
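
One plausible reading of this claim, offered here only as an assumption, is that pixels lying on the ground (such as shadows) coincide across the converted foreground images when the conversion uses the ground surface as a reference, whereas the subject itself does not. The sketch below removes pixels that are set in every converted foreground mask; the binary-mask representation and the function name are hypothetical.

```python
def remove_coinciding_pixels(converted_fgs, ref_index):
    """Clear pixels of the reference mask that are set in every converted mask.

    converted_fgs: list of 2-D lists of 0/1 values, foreground masks already
    converted to the viewpoint of interest.
    ref_index:     index of the mask to correct.
    """
    ref = converted_fgs[ref_index]
    height, width = len(ref), len(ref[0])
    corrected = [row[:] for row in ref]
    for y in range(height):
        for x in range(width):
            # Assumed rule: full agreement across viewpoints marks a
            # ground-level region, which is dropped from the mask.
            if all(fg[y][x] for fg in converted_fgs):
                corrected[y][x] = 0
    return corrected
```

This sketch covers only the correction step; how the corrected mask is then combined with the target image data is not shown.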

An image processing method comprising:
a step of acquiring target image data, which is an image obtained by capturing a foreground subject and a background subject from a viewpoint of interest;
a step of acquiring background image data, which are images of the background subject captured from a plurality of different viewpoints;
a step of creating a plurality of pieces of converted background image data by converting each of the acquired plurality of pieces of background image data into an image as seen from the viewpoint of interest;
a step of calculating a degree of coincidence between pixels of interest in the plurality of pieces of converted background image data;
a step of correcting, based on the degree of coincidence, the converted background image data for a viewpoint determined according to its distance from the viewpoint of interest; and
a step of creating foreground image data, which is an image in which a region of the foreground subject is extracted, based on the target image data and the corrected converted background image data.

An image processing method comprising:
a step of acquiring target image data, which is an image obtained by capturing a foreground subject and a background subject from a viewpoint of interest;
a step of acquiring, for each of a plurality of different viewpoints, foreground image data in which a region formed by the foreground subject and a shadow accompanying that subject is extracted;
a step of creating a plurality of pieces of converted foreground image data by converting each of the acquired plurality of pieces of foreground image data into an image as seen from the viewpoint of interest;
a step of calculating a degree of coincidence between pixels of interest in the plurality of pieces of converted foreground image data;
a step of correcting, based on the degree of coincidence, the converted foreground image data for a viewpoint determined according to its distance from the viewpoint of interest; and
a step of creating foreground image data in which a region of the foreground subject is extracted, based on the target image data and the corrected converted foreground image data.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 12.