JP2018055367A

JP2018055367A - Image processing device, image processing method, and program

Info

Publication number: JP2018055367A
Application number: JP2016190052A
Authority: JP
Inventors: 希名板倉; Kina Itakura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-09-28
Filing date: 2016-09-28
Publication date: 2018-04-05
Anticipated expiration: 2036-09-28
Also published as: JP6873644B2; JP2021108193A; JP7159384B2

Abstract

PROBLEM TO BE SOLVED: To extract the region of a foreground subject with high accuracy regardless of the state of the subjects, such as whether a subject changes (moves, etc.) over time or whether the foreground subjects are sparsely or densely arranged.

SOLUTION: An image processing device according to the present invention comprises: means for acquiring target image data, which is an image of a foreground subject and a background subject captured from a viewpoint of interest; means for acquiring background image data, which are images of the background subject at a plurality of different viewpoints; means for generating a plurality of pieces of converted background image data by converting the acquired pieces of background image data; calculation means for calculating a degree of coincidence between pixels of interest in the plurality of pieces of converted background image data; means for correcting the converted background image data at a reference viewpoint based on the degree of coincidence; and means for generating foreground image data, which is an image in which the region of the foreground subject is extracted, based on the target image data and the corrected converted background image data.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a technique for extracting the region of a foreground subject from a captured image.

Conventionally, the background subtraction method has been used to extract the region of a foreground subject from a captured image obtained by imaging subjects (including a foreground subject and a background subject). The background subtraction method creates a foreground image, in which the region of the foreground subject is extracted, based on the per-pixel difference between the pixel values of a captured image showing both the foreground and background subjects and the pixel values of a background image showing only the background subject. When an image showing only the background, captured in advance under specific conditions, is used as the background image, the accuracy of extracting the foreground region decreases if the background changes, for example because of changes in sunlight over time.
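The per-pixel difference underlying background subtraction can be sketched as follows. This is a minimal illustration, not the method of any cited document: it assumes grayscale images stored as nested lists, and the function name `background_subtraction` and the fixed `threshold` are hypothetical choices made only for this example.

```python
def background_subtraction(captured, background, threshold):
    """Return a binary mask: 1 where |captured - background| > threshold.

    Both images are assumed to be grayscale, same size, as lists of rows.
    The thresholding rule is an assumption made for illustration.
    """
    return [
        [1 if abs(c - b) > threshold else 0 for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(captured, background)
    ]


# Toy 2x3 images: two pixels differ strongly from the background.
captured = [[10, 10, 200], [10, 180, 10]]
background = [[10, 12, 10], [11, 10, 10]]
mask = background_subtraction(captured, background, threshold=30)
```

If the stored background no longer matches the scene (for example after a lighting change), many background pixels would exceed the threshold, which is the accuracy problem described above.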

To solve this problem, Patent Document 1 discloses a technique that extracts the region of a foreground subject regardless of background changes by using a background image created from a plurality of images captured at different times.

Patent Document 2 discloses a technique that extracts the region of a foreground subject regardless of changes in the subject over time by using a background image created from a plurality of images captured from different viewpoints at the same time.

JP 2012-104053 A
JP 2014-230180 A

However, in Patent Document 1, when a foreground subject remains stationary, its region is erroneously determined to be a region of the background subject, so the background image cannot be created accurately. As a result, the accuracy of extracting the foreground region decreases.

In Patent Document 2, a background image is created by supplementing information on background subjects that cannot be seen from a single viewpoint with information from other viewpoints. However, the background image cannot be created accurately in regions where the foreground subjects in the scene are densely packed and overlap one another. As a result, the accuracy of extracting the foreground region decreases.

In view of the above problems, an object of the present invention is to extract the region of a foreground subject with high accuracy regardless of the state of the subjects, such as whether a subject changes (moves, etc.) over time or whether the foreground subjects are sparse or dense.

The present invention is an image processing apparatus comprising: target image data acquisition means for acquiring target image data, which is an image of a foreground subject and a background subject captured from a viewpoint of interest; background image data acquisition means for acquiring background image data, which are images of the background subject at a plurality of different viewpoints; converted background image data creation means for creating a plurality of pieces of converted background image data by converting each of the acquired pieces of background image data into an image as viewed from the viewpoint of interest; calculation means for calculating a degree of coincidence indicating how well the pixels of interest match across the plurality of pieces of converted background image data; correction means for correcting, based on the degree of coincidence, the converted background image data at a viewpoint determined according to its distance from the viewpoint of interest; and foreground image data creation means for creating foreground image data, which is an image in which the region of the foreground subject is extracted, based on the target image data and the corrected converted background image data.

According to the present invention, the region of a foreground subject can be extracted with high accuracy regardless of the state of the subjects, such as whether a subject changes (moves, etc.) over time or whether the foreground subjects are sparse or dense.

Block diagram showing the hardware configuration of the image processing apparatus in Examples 1 to 3
Block diagram showing the functional configuration of the image processing apparatus in Example 1
Flowchart showing the flow of the foreground region extraction process in Example 1
Diagram explaining the outline of the foreground region extraction process in Example 1
Diagram explaining the image conversion in Example 1
Diagram explaining the effect of Example 1
Block diagram showing the functional configuration of the image processing apparatus in Example 2
Flowchart showing the flow of the foreground region extraction process in Example 2
Diagram explaining the continuity calculation method in Example 2
Diagram explaining the effect of Example 2
Block diagram showing the functional configuration of the image processing apparatus in Example 3
Flowchart showing the flow of the foreground region extraction process in Example 3
Diagram explaining the effect of Example 3

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in them are essential to the solution provided by the invention. Identical components are denoted by the same reference numerals.

[Example 1]
In Example 1, a background image at the viewpoint of interest that contains no image of a foreground subject (hereinafter, a complete background image) is created from multi-viewpoint images, specifically from background images at a plurality of different viewpoints that partially contain images of foreground subjects (hereinafter, incomplete background images). The complete background image is then used to extract the region of the foreground subject from the image to be processed.

<Overview of the foreground region extraction process>
An overview of the foreground region extraction process in this example is given below with reference to FIG. 4. First, background image data 401 at a plurality of different viewpoints is acquired. Background image data is an image of the background subject, a so-called background image. The background image data acquired here need not be a complete background image containing no image of a foreground subject (hereinafter, a foreground image), but it is desirably an image captured at a time close to the time at which the image subject to foreground extraction was captured. The acquired pieces of background image data 401 are assumed to include an image at the same viewpoint as the viewpoint 402 from which the image subject to foreground extraction was captured. Hereinafter, the image from which the foreground region is extracted is called the target image (data), and the viewpoint from which the target image (data) was captured is called the viewpoint of interest.

Next, the acquired background image data 401 is converted, viewpoint by viewpoint, into an image as viewed from the viewpoint of interest 402 with the ground surface as the reference, thereby creating background image data 403 at the viewpoint of interest. The number of pieces of background image data 403 created is the same as the number of pieces of background image data 401. Hereinafter, the background image data 403 obtained by converting the background image data 401 is called converted background image data 403.

Here, a foreground subject is a subject that, among the subjects in the captured image, is located close to the imaging device. For example, when the target image data captures a competitive scene such as a sporting event, people such as players and referees and equipment such as goals and balls are foreground subjects; foreground subjects include things that generally keep moving across a plurality of images captured consecutively in time series. A background subject, on the other hand, is a subject that is located far from the imaging device and therefore lies behind the foreground subjects. For example, when the target image data captures a sporting event, a field of grass or soil or a gymnasium floor is a background subject; background subjects are mostly stationary across a plurality of images captured consecutively in time series.

Such foreground subjects have height above the ground surface, whereas background subjects do not. Therefore, the plurality of pieces of converted background image data 403 is used to detect the image of a subject having height above the ground surface, that is, a foreground subject (a foreground image), and the detected foreground image is removed from the incomplete background image to create a complete background image at the viewpoint of interest 402. Specifically, for the plurality of pieces of converted background image data 403, including the image at the viewpoint of interest 402, the degree of coincidence between pixels of interest is calculated for each pixel, and pixels with a low degree of coincidence are detected as pixels in the image region of a foreground subject. As described above, the converted background image data 403 is obtained by converting the background image data 401 into an image as viewed from the viewpoint of interest 402 with the ground surface as the reference plane. Consequently, the coordinates of the pixels in regions 405 to 407 of the background image data 401, which correspond to a subject 404 that lies on the ground surface and has no height, are converted into the coordinates of the pixels in a region 408 located at the same position in all pieces of converted background image data 403. In contrast, the coordinates of the pixels in regions 410 to 412 of the background image data 401, which correspond to a subject 409 that has height, are converted into the coordinates of the pixels in regions 413 to 415, whose positions differ depending on the viewpoint. Accordingly, in the plurality of pieces of converted background image data 403, pixels with a high degree of coincidence are regarded as belonging to the image region of a background subject without height, and pixels with a low degree of coincidence are regarded as belonging to the image region of a foreground subject with height. A complete background image is created in this way. Finally, the foreground region is extracted by comparing the created complete background image at the viewpoint of interest 402 with the target image data.
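The idea that ground-level pixels agree across the converted views while pixels of tall subjects do not can be illustrated with a toy sketch. The agreement measure used here (the range of values across views compared against a `tolerance`) is an assumption chosen for brevity and is not the degree of coincidence defined in the document; the function name `inconsistent_pixels` is likewise hypothetical.

```python
def inconsistent_pixels(warped_views, tolerance):
    """Mark pixels whose values differ across views by more than tolerance.

    warped_views: list of same-sized grayscale images (lists of rows), all
    assumed to be already converted to the viewpoint of interest.
    Returns a mask with 1 for low agreement (candidate foreground pixel).
    """
    height = len(warped_views[0])
    width = len(warped_views[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [view[y][x] for view in warped_views]
            row.append(1 if max(values) - min(values) > tolerance else 0)
        mask.append(row)
    return mask


# A bright tall subject lands in a different column in each converted view,
# while the ground (value 50) lands in the same place in all of them.
views = [
    [[50, 50, 200, 50]],
    [[50, 200, 50, 50]],
    [[50, 50, 50, 200]],
]
mask = inconsistent_pixels(views, tolerance=20)
```

Only the first column, where every view shows the ground, is treated as consistent in this toy input.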

The above is an overview of the processing performed in this example. The target image data is not limited to the above example; various image data, such as data captured by a surveillance camera, can be used. Although the case where the background image data 401 includes an image at the viewpoint of interest has been described here, this example is also applicable when the background image data does not include such an image; the specific processing method is described later.

<Hardware configuration of the image processing apparatus>
The hardware configuration of the image processing apparatus of this example is described below. FIG. 1 is a block diagram showing an example of the hardware configuration. The image processing apparatus 100 of this example includes a CPU 101, a RAM 102, a ROM 103, a secondary storage device 104, an input interface 105, and an output interface 106, which are interconnected by a system bus 107. The image processing apparatus 100 is connected to an external storage device 108 via the input interface 105, and to the external storage device 108 and a display device 109 via the output interface 106.

The CPU 101 executes programs stored in the ROM 103 using the RAM 102 as work memory, and centrally controls the components of the image processing apparatus 100 via the system bus 107. The various processes described later are executed in this way.

The secondary storage device 104 stores the various data handled by the image processing apparatus 100; an HDD is used in this example. The CPU 101 can write data to and read data from the secondary storage device 104 via the system bus 107. Besides an HDD, various storage devices such as an optical disk drive or flash memory can be used as the secondary storage device 104.

The input interface 105 is a serial bus interface such as USB or IEEE 1394, through which data, commands, and the like are input from external devices to the image processing apparatus 100. The image processing apparatus 100 acquires data from the external storage device 108 (for example, a storage medium such as a hard disk, memory card, CF card, SD card, or USB memory) via the input interface 105. Input devices (not shown) such as a mouse and keyboard can also be connected to the input interface 105. The output interface 106 includes, in addition to a serial bus interface such as USB or IEEE 1394 like the input interface 105, a video output terminal such as DVI or HDMI (registered trademark). Data is output from the image processing apparatus 100 to external devices via the output interface 106. The image processing apparatus 100 displays images by outputting processed images and the like to the display device 109 (any of various image display devices such as a liquid crystal display) via the output interface 106. The image processing apparatus 100 has components other than those described above, but they are not the focus of the present invention and their description is omitted.

<Foreground region extraction process>
The foreground region extraction process executed by the image processing apparatus 100 in this example is described below with reference to FIGS. 2 and 3. FIG. 2 is a block diagram showing the functional configuration of the image processing apparatus 100, and FIG. 3 is a flowchart showing the flow of the foreground region extraction process. The CPU 101 of the image processing apparatus 100 executes programs stored in the ROM 103 using the RAM 102 as work memory, thereby functioning as each component shown in FIG. 2 and executing the series of processes shown in FIG. 3. Not all of the processing described below needs to be executed by the CPU 101; the image processing apparatus 100 may be configured so that part or all of the processing is performed by one or more processing circuits other than the CPU 101.

The flow of processing performed by each component is described below. In step S301, the target image data acquisition unit 201 acquires target image data from the external storage device 108 via the input interface 105, or from the secondary storage device 104. As described above, the target image data is the image from which the foreground region is to be extracted. The target image data acquisition unit 201 also sets the viewpoint of the camera that captured the target image data as the viewpoint of interest. Although the case where the target image data is a single image is described here, this example is also applicable when the target image data consists of multiple images. In addition, the target image data acquisition unit 201 acquires the parameters of the camera that captured the target image data (hereinafter, camera parameters) together with the target image data. Camera parameters are parameters that make it possible to compute the projection of a point in three-dimensional space onto the image captured by the camera; they include extrinsic parameters representing the position and orientation of the camera and intrinsic parameters representing the focal length and optical center. Measured values or design values stored in memory in advance may be used as the camera parameters. The target image data acquisition unit 201 outputs the target image data to the foreground extraction unit 207 and the camera parameters to the image conversion unit 203.
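How camera parameters allow a three-dimensional point to be projected onto an image can be sketched with the standard pinhole model. This is an illustration under stated assumptions rather than the document's own formulation: the intrinsic matrix `K`, rotation `R`, translation `t`, and the function name `project_point` are notation introduced here, and lens distortion is ignored.

```python
def project_point(K, R, t, X):
    """Project world point X to pixel coordinates using x ~ K (R X + t).

    K: 3x3 intrinsic matrix (focal length, optical center).
    R, t: extrinsic rotation (3x3) and translation (3-vector).
    Assumes an ideal pinhole camera without lens distortion.
    """
    # World coordinates -> camera coordinates.
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Camera coordinates -> homogeneous image coordinates.
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return x[0] / x[2], x[1] / x[2]


# Focal length 100 px, optical center (50, 50), camera at the world origin.
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
u, v = project_point(K, R, t, [1.0, 2.0, 10.0])
```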

In step S302, the background image data acquisition unit 202 acquires a plurality of pieces of background image data at a plurality of different viewpoints from the external storage device 108 via the input interface 105, or from the secondary storage device 104. Here, background image data is an image of the background subject in an environment (weather, time of day, etc.) substantially the same as that in which the target image data was captured. As described above, the background image data acquired in this step need not be a background image containing no foreground image at all (a complete background image).

In this example, the background image data at each viewpoint is created by applying a median filter to a plurality of images, corresponding to a plurality of different times, obtained by continuously capturing the scene from the same viewpoint in time series. However, the method of creating the background image data is not limited to this. For example, the background image data may be created using another filter such as a mean filter, or by performing clustering on the plurality of images. Alternatively, background image data obtained for each viewpoint by capturing the scene in advance with no foreground subject present may be used.
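A temporal median filter of the kind described above can be sketched as follows. This is a minimal illustration assuming grayscale frames of equal size stored as nested lists; the function name `temporal_median_background` is hypothetical.

```python
from statistics import median


def temporal_median_background(frames):
    """Per-pixel median over frames captured from one viewpoint.

    frames: list of same-sized grayscale images (lists of rows).
    A subject that covers a pixel in fewer than half of the frames does
    not affect that pixel's median.
    """
    height = len(frames[0])
    width = len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# A bright moving subject covers the middle pixel in one frame only.
frames = [
    [[10, 10, 10]],
    [[10, 200, 10]],
    [[12, 10, 10]],
]
background = temporal_median_background(frames)
```

A subject that stays still in most frames would survive this filter, which is why the resulting image may be an incomplete background image.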

The background image data acquisition unit 202 also acquires the camera parameters corresponding to each piece of background image data together with the background image data. Furthermore, to distinguish the pieces of background image data from one another, the background image data acquisition unit 202 stores each piece in association with a number identifying the camera viewpoint (hereinafter, the camera viewpoint number). The background image data acquisition unit 202 outputs the background image data and the camera parameters to the image conversion unit 203, and outputs only the background image data to the correction unit 206.

In step S303, the image conversion unit 203 uses the camera parameters acquired from the target image data acquisition unit 201 and the background image data acquisition unit 202 to convert the background image data acquired from the background image data acquisition unit 202 into images as viewed from the viewpoint of interest. Specifically, each piece of background image data is projectively transformed with the ground surface as the reference to obtain an image as viewed from the viewpoint of interest. The background image (data) obtained by the image conversion in this step is called a converted background image (data). The image conversion unit 203 thus functions as the converted background image data creation means. The image conversion method used in this step is described below with reference to FIG. 5.

As shown in FIG. 5, when a point 501 in three-dimensional space is projected onto the image of a camera 502, the point 504 at which the straight line connecting the point 501 and the camera 502 intersects the image plane 503 is the projection of the point 501 onto the image plane 503. Similarly, for a camera 505 located at a different position from the camera 502 (a camera at another viewpoint), the point 507 at which the straight line connecting the point 501 and the camera 505 intersects the image plane 506 is the projection of the point 501 onto the image plane 506. Now consider the case where all points in three-dimensional space that are projected onto the image planes 503 and 506, including the point 501, lie on the same plane, namely the ground surface. In this case, using the 3×3 homography matrix H₀₁ calculated from the camera parameters of the cameras 502 and 505, the coordinates (u₀, v₀) of any pixel on the image plane 503 are converted into the coordinates (u₁, v₁) on the image plane 506 by equation (1).
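Equation (1) itself is not reproduced in this text. Under the usual homogeneous-coordinate convention, a planar homography of this kind is commonly written as below; the scale factor s is notation assumed here and may differ from the notation of the original equation.

```latex
s \begin{pmatrix} u_1 \\ v_1 \\ 1 \end{pmatrix}
  = H_{01} \begin{pmatrix} u_0 \\ v_0 \\ 1 \end{pmatrix}
```

Dividing the first two components of the right-hand side by the third yields (u₁, v₁).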

In step S303, this projective transformation is executed for each piece of background image data, with the camera at the viewpoint corresponding to the background image data acquired from the background image data acquisition unit 202 taken as the camera 502 above, and the camera at the viewpoint of interest determined by the target image data acquisition unit 201 taken as the camera 505. The number of pieces of converted background image data obtained in this step is therefore the same as the number of pieces of background image data acquired by the background image data acquisition unit 202. Each piece of converted background image data is stored in association with the viewpoint number of the corresponding background image data acquired by the background image data acquisition unit 202. The image conversion unit 203 outputs the converted background image data to the coincidence degree calculation unit 204 and the correction unit 206.
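Mapping a pixel coordinate through a 3×3 homography can be sketched as follows. This is a minimal illustration assuming the homography matrix is already known; how it is derived from the camera parameters is not shown, and the function name `apply_homography` is hypothetical.

```python
def apply_homography(H, u, v):
    """Map pixel (u, v) through the 3x3 homography H.

    The point is treated as the homogeneous vector (u, v, 1); the result
    is divided by its third component to return to pixel coordinates.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return x / w, y / w


# A pure translation by (+5, -3) written as a homography.
H_shift = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
u1, v1 = apply_homography(H_shift, 10.0, 10.0)

# Homographies are defined up to scale: 2 * identity maps a point to itself.
H_scaled = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
u2, v2 = apply_homography(H_scaled, 10.0, 10.0)
```

Converting a whole image would apply such a mapping to every pixel, typically via the inverse homography with interpolation; that part is omitted here.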

In step S304, the image conversion unit 203 selects, from the background image data acquired from the background image data acquisition unit 202, the image corresponding to the viewpoint closest to the position of the camera that captured the target image data (the viewpoint of interest) as the reference background image. Specifically, the distance between the coordinates (Xo, Yo, Zo) of the viewpoint of interest and the coordinates (Xi, Yi, Zi) of the viewpoint corresponding to each piece of background image data acquired from the background image data acquisition unit 202 is calculated for each viewpoint. Here, i is the viewpoint number, with 1 ≤ i < (number of viewpoints) + 1. The viewpoint with the smallest calculated distance (the reference viewpoint) is then detected, and the background image (data) corresponding to the reference viewpoint is taken as the reference background image (data). The image conversion unit 203 outputs the viewpoint number corresponding to the reference background image to the coincidence degree calculation unit 204 and the correction unit 206. In this example, the viewpoint number corresponding to the reference background image is called the reference viewpoint number.
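Selecting the reference viewpoint by minimum distance can be sketched as follows. This is a minimal illustration assuming Euclidean distance between camera positions and a dictionary keyed by viewpoint number; the function name `nearest_viewpoint` and the sample coordinates are hypothetical.

```python
import math


def nearest_viewpoint(target_position, viewpoint_positions):
    """Return the viewpoint number whose camera is closest to the target.

    target_position: (Xo, Yo, Zo) of the viewpoint of interest.
    viewpoint_positions: dict mapping viewpoint number i to (Xi, Yi, Zi).
    Euclidean distance is an assumption made for this sketch.
    """
    return min(
        viewpoint_positions,
        key=lambda i: math.dist(target_position, viewpoint_positions[i]),
    )


positions = {
    1: (0.0, 0.0, 10.0),
    2: (5.0, 0.0, 10.0),
    3: (20.0, 0.0, 10.0),
}
reference_viewpoint = nearest_viewpoint((4.0, 0.0, 10.0), positions)
```

When the background image data includes an image at the viewpoint of interest itself, the distance is zero and that viewpoint is selected.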

In step S305, the coincidence degree calculation unit 204 determines the pixel of interest in the converted background image data, that is, the pixel for which it will be judged whether the pixels match across the plurality of pieces of converted background image data. In this embodiment, the upper-left pixel of the converted background image data is selected first, and unprocessed pixels are then selected in turn. The pixels of interest may be chosen in any order, as long as the match judgment is eventually performed for every pixel of the converted background image data.

In step S306, the coincidence degree calculation unit 204 uses the plurality of pieces of converted background image data acquired from the image conversion unit 203 to calculate the degree of coincidence, at the pixel of interest, between the converted background image data corresponding to the reference viewpoint number and the other converted background image data. The calculation method is described in detail below.

First, the coincidence degree calculation unit 204 acquires the pixel value B_j(u₂, v₂) of the converted background image data at the coordinates (u₂, v₂) of the determined pixel of interest. Here, j is a subscript that distinguishes the individual pieces of converted background image data, so the unit acquires as many pixel values as there are pieces of converted background image data. Next, the unit calculates the median of all the acquired pixel values. This median is used as the reference value M when calculating the degree of coincidence. The reference value is not limited to the median; any value that reflects the statistical properties of the plurality of pixel values, such as the mean, may be used.
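
A minimal sketch of the reference value calculation is shown below. It assumes that the median is taken independently for each RGB channel, which the description does not state explicitly, and the function name `reference_value` is hypothetical.

```python
from statistics import median


def reference_value(pixels):
    """Return the per-channel median M of the pixel values B_j(u2, v2).

    pixels -- list of (R, G, B) tuples, one per converted background image,
              all sampled at the same pixel of interest.
    Taking the median channel by channel is an assumption of this sketch.
    """
    return tuple(median(p[k] for p in pixels) for k in range(3))
```

Because the median ignores outliers, a single viewpoint in which a foreground subject covers the pixel has little influence on M as long as most viewpoints see the background.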

Next, the coincidence degree calculation unit 204 calculates the degree of coincidence at the pixel of interest by Equation (2), using the pixel value B₀(u₂, v₂) of the pixel of interest in the converted background image data corresponding to the reference viewpoint number and the calculated reference value M(u₂, v₂).
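
Equation (2) is not reproduced in this text. Based on the surrounding description (a sum of squared differences over the three RGB channels), it can be reconstructed as follows; writing the channel index k as a superscript is a notational assumption.

```latex
D(u_2, v_2) = \sum_{k \in \{R, G, B\}} \left( B_0^{k}(u_2, v_2) - M^{k}(u_2, v_2) \right)^{2} \tag{2}
```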

Here, k is a subscript that identifies the three RGB channels. The degree of coincidence D calculated by Equation (2) becomes smaller as the variation among the pixel values in the plurality of pieces of converted background image data becomes smaller. The degree of coincidence is not limited to this; any value that indicates the difference between pixels may be used. For example, the sum of the differences between the pixel value B₀(u₂, v₂) of the pixel of interest in the converted background image data corresponding to the reference viewpoint number and the pixel value of the pixel of interest in each of the other pieces of converted background image data may be used as the degree of coincidence.
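
The two measures mentioned above can be sketched as follows. The function names are hypothetical, and the use of absolute differences in the second measure is an assumption, since the description only says "sum of the differences".

```python
def coincidence_ssd(b0, m):
    """Sum of squared differences between the reference-viewpoint pixel b0
    and the reference value m, over the three RGB channels."""
    return sum((b0[k] - m[k]) ** 2 for k in range(3))


def coincidence_sad(b0, others):
    """Alternative measure: sum of absolute differences between b0 and the
    pixel of interest in each of the other converted background images."""
    return sum(abs(b0[k] - o[k]) for o in others for k in range(3))
```

In both cases a small value means that the reference-viewpoint pixel agrees well with the other viewpoints, and a large value means that it agrees poorly.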

In step S307, the coincidence degree calculation unit 204 determines whether steps S305 and S306 have been performed for all pixels of the converted background image data. If so, the unit outputs the calculated degrees of coincidence of all pixels to the correction determination unit 205 and the calculated reference values to the correction unit 206, and the process proceeds to step S308. Otherwise, the process returns to step S305.

In step S308, the correction determination unit 205 initializes the flag map, that is, sets the pixel values of all pixels of the flag map to 0. The flag map initialized in this step is used in step S311, when the pixels of the converted background image data corresponding to the reference viewpoint number are corrected, to determine which pixels are subject to correction. In the flag map, 1 is assigned to pixels subject to correction and 0 to pixels not subject to correction. After the initialization in this step, therefore, no pixel of the converted background image data corresponding to the reference viewpoint number is marked for correction.

In step S309, the correction determination unit 205 updates the flag map based on the degrees of coincidence acquired from the coincidence degree calculation unit 204. Specifically, the unit changes to 1 the flag-map value of each pixel that, in the converted background image data corresponding to the reference viewpoint number, is judged likely to belong to the image region of a foreground subject. In this embodiment, if the calculated degree of coincidence D is equal to or greater than a predetermined threshold, the pixel of the reference-viewpoint converted background image data agrees poorly with the pixels of the other converted background image data, so the pixel of interest is judged likely to belong to the image region of a foreground subject. If D is less than the threshold, the agreement is good, so the pixel of interest is judged likely to belong to the image region of a background subject. The threshold used in this step is determined based on, for example, the maximum pixel value, starting from a value smaller than 20% of the maximum, for example any value in the range of 1% to 5% of the maximum. That is, letting this value be a, the threshold is a × a × 3, because Equation (2) uses the sum of squared differences as the degree of coincidence. If the sum of differences were used as the degree of coincidence instead, the threshold would be a × 3. The judgment of whether a pixel belongs to the image region of a foreground subject is made pixel by pixel. The correction determination unit 205 outputs the updated flag map to the correction unit 206.
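
The threshold rule and the flag-map update of step S309 can be sketched as follows. The function names and the nested-list representation of the per-pixel map of D are assumptions made for this sketch.

```python
def threshold_from_margin(a, squared=True, channels=3):
    """Threshold for the degree of coincidence given a per-channel margin a.

    a * a * 3 when the measure is a sum of squared differences,
    a * 3 when it is a sum of differences.
    """
    return a * a * channels if squared else a * channels


def flag_foreground_candidates(d_map, threshold):
    """Return a flag map with 1 where D >= threshold (likely foreground,
    to be corrected) and 0 elsewhere (likely background)."""
    return [[1 if d >= threshold else 0 for d in row] for row in d_map]
```

For instance, with a margin of a = 10 the threshold for the sum of squared differences is 300, so a pixel with D = 300 is flagged while a pixel with D = 299 is not.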

In step S310, the correction unit 206 determines the pixel of interest in the converted background image data corresponding to the reference viewpoint number. In this embodiment, the upper-left pixel of that converted background image data is selected first, and unprocessed pixels are then selected in turn. The pixels of interest may be chosen in any order, as long as the flag-map-based update of pixel values (step S311) is eventually performed for every pixel of the converted background image data corresponding to the reference viewpoint number.

In step S311, the correction unit 206 corrects the pixel value of the pixel of interest in the converted background image corresponding to the reference viewpoint number, based on the flag map acquired from the correction determination unit 205. In this embodiment, if the flag-map value corresponding to the pixel of interest is 1, the pixel value of the pixel of interest is replaced with the reference value calculated by the coincidence degree calculation unit 204. If the flag-map value is 0, the pixel value is left unchanged. The correction method is not limited to this; other methods may be used, such as replacing the pixel value with that of the background image corresponding to a viewpoint adjacent to the reference viewpoint.
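
The correction of step S311 amounts to a masked replacement, which can be sketched as follows. The function name and the nested-list image representation are assumptions made for this sketch.

```python
def correct_reference_image(image, flag_map, reference_values):
    """Return a corrected copy of the reference-viewpoint converted
    background image.

    Where the flag map is 1 the pixel is replaced with the reference value
    M for that position; where it is 0 the pixel is kept unchanged.
    All three arguments are nested lists with the same height and width.
    """
    corrected = []
    for row, flags, refs in zip(image, flag_map, reference_values):
        corrected.append([ref if flag == 1 else pixel
                          for pixel, flag, ref in zip(row, flags, refs)])
    return corrected
```

The function builds a new image rather than modifying its input, so the uncorrected converted background image remains available for comparison.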

In step S312, the correction unit 206 determines whether steps S310 and S311 have been performed for all pixels of the converted background image data corresponding to the reference viewpoint number. If so, the correction unit 206 outputs the corrected converted background image data corresponding to the reference viewpoint number to the foreground extraction unit 207, and the process proceeds to step S313. Otherwise, the process returns to step S310.

In step S313, the foreground extraction unit 207 extracts the region of the foreground subject from the target image data (denoted I) using the corrected converted background image data corresponding to the reference viewpoint number acquired from the correction unit 206 (denoted the complete background image I_b). Specifically, as shown in Equation (3), the sum of squared differences between the complete background image I_b and the target image data I is calculated for each pixel, and pixels for which this sum is equal to or greater than a threshold are regarded as belonging to the image region of a foreground subject. This produces an image I_f in which the region of the foreground subject is extracted. The image I_f is a binary image: 1 is assigned to pixels in the image region of a foreground subject, and 0 to pixels in the image region of a background subject.
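
Equation (3) is not reproduced in this text. From the description above, it can be reconstructed as follows; the coordinate symbols (u, v) and the superscript channel index k are notational assumptions.

```latex
I_f(u, v) =
\begin{cases}
1 & \text{if } \displaystyle\sum_{k \in \{R, G, B\}} \left( I^{k}(u, v) - I_b^{k}(u, v) \right)^{2} \ge Th \\
0 & \text{otherwise}
\end{cases} \tag{3}
```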

Here, Th denotes the threshold, and k is a subscript that identifies the three RGB channels. The threshold may be determined based on, for example, the maximum pixel value, starting from a value smaller than 20% of the maximum, for example any value in the range of 1% to 5% of the maximum. The threshold is obtained in the same way as for Equation (2). The foreground extraction unit 207 thus functions as foreground image data creation means. The foreground extraction unit 207 outputs the created image I_f to the secondary storage device 104, the external storage device 108, or the display device 109, and the series of processes ends. This completes the foreground region extraction process executed by the image processing apparatus 100 in this embodiment.
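
The background subtraction of step S313 can be sketched as follows. The function name `extract_foreground` and the nested-list image representation are assumptions made for this sketch.

```python
def extract_foreground(target, background, th):
    """Return the binary image I_f for target image I and complete
    background image I_b.

    A pixel is set to 1 when the sum of squared RGB differences between
    the target and the background is at least th, and to 0 otherwise.
    """
    mask = []
    for t_row, b_row in zip(target, background):
        mask.append([
            1 if sum((t[k] - b[k]) ** 2 for k in range(3)) >= th else 0
            for t, b in zip(t_row, b_row)
        ])
    return mask
```

With th = 300, a pixel that differs from the background by 2 in one channel is classified as background, while a pixel that differs by 190 in one channel is classified as foreground.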

<Effects of this embodiment>
The effects of this embodiment are described below with reference to FIG. 6. In FIG. 6, image data 601 is background image data at the viewpoint 602 created, according to a conventional method, from a plurality of images captured continuously in time series. The background image data 601 contains foreground subjects such as the foreground subject 603 (goalkeeper) and the foreground subject 604 (goal). This is because the foreground subjects 603 and 604 stayed at the same positions without moving while the continuous images used to create the background image data were captured, and were therefore mistaken for background subjects. When the foreground region is extracted from the target image data 605 using the background image data 601, foreground image data 606 is obtained. In the foreground image data 606, the regions of the moving foreground subjects other than the subjects 603 and 604 are largely extracted, but the regions of the stationary foreground subjects 603 and 604 are not.

Image data 607 is background image data at the viewpoint 602 created, according to a conventional method, from a plurality of images captured from a plurality of different viewpoints at the same time as the target image data 605. Foreground subjects such as the foreground subject 603 (goalkeeper) and the foreground subject 604 (goal) do not appear in the background image data 607, but part of the background subject is missing. This is because the foreground subjects were densely clustered in the scene captured to create the background image data, and some of the foreground subjects could not be seen from a plurality of viewpoints. When the foreground region is extracted from the target image data 605 using the background image data 607, foreground image data 608 is obtained. In the foreground image data 608, the regions of foreground subjects that have height above the ground surface are largely extracted. However, because the foreground subjects are densely clustered, the region 609 of foreground subjects that cannot be seen from a plurality of viewpoints is not extracted.

In contrast, this embodiment creates background image data 610, a complete background image, from incomplete background images (for example, the background image data 601) at a plurality of different viewpoints. When the foreground region is extracted from the target image data 605 using the background image data 610, foreground image data 611 is obtained. In the foreground image data 611, the regions of the stationary foreground subjects 603 and 604 and the regions of foreground subjects that cannot be seen from a plurality of viewpoints are extracted with high accuracy. Thus, according to this embodiment, the region of a foreground subject can be extracted with high accuracy regardless of the state of the subjects, such as whether they change (for example, move) over time or whether the foreground subjects are densely or sparsely arranged.

[Embodiment 2]
In Embodiment 1, a complete background image is created from a plurality of incomplete background images using the degree of coincidence, which indicates how well the converted background images of the different viewpoints agree at the pixel of interest. In this embodiment, the complete background image is created using, in addition to the degree of coincidence, the smoothness of the change in pixel value at the pixel of interest across the converted background images of the different viewpoints, that is, the continuity. Configurations and processes that are the same as in Embodiment 1 are given the same reference numerals, and their description is omitted.

<Overview of the foreground region extraction process>
An overview of the foreground region extraction process in this embodiment is given below. In this embodiment, the continuity of pixel values between viewpoints is calculated using the converted background image data obtained by converting the background image data at a plurality of different viewpoints into images as seen from the viewpoint of interest. The continuity of pixel values is the smoothness of the change in pixel value between the converted background image data at the viewpoint of interest and the converted background image data at viewpoints adjacent to it.

Specifically, the pixel value of the pixel of interest in the converted background image data corresponding to the reference viewpoint number is compared with the pixel value of the pixel of interest in the converted background image data of the viewpoints adjacent to the reference viewpoint, and the sum of the differences between the pixel values is calculated as the continuity. Then, using the degree of coincidence described in Embodiment 1 and the continuity calculated in this embodiment, pixels for which the agreement is poor and the change in pixel value is not smooth are regarded as likely to belong to the image region of a foreground subject and are detected as pixels to be corrected. The pixel values of the detected pixels are then updated to correct the converted background image data, producing a complete background image. Finally, the complete background image is compared with the target image data to extract the foreground region.

In Embodiment 1, whether the pixel of interest belongs to the image region of a foreground subject is judged using only the degree of coincidence of pixel values calculated from the background image data at all viewpoints. Consequently, pixels in the image region of a background subject whose pixel values differ because its color appears different depending on the viewpoint are also regarded as likely to belong to a foreground subject and are detected as pixels to be corrected. Pixels that need no correction are thus corrected as well, which introduces errors into the corrected converted background image data and prevents a complete background image, free of foreground subjects, from being created accurately. An example of a background subject whose appearance changes with the viewpoint is grass mown in a directional pattern, as seen in images of sports scenes. The color of such grass looks different depending on the viewing direction, so the pixel value changes with the viewpoint even for grass at the same position. When Embodiment 1 is applied to a scene with such grass as the background subject, the agreement between pixels across the plurality of pieces of converted background image data is poor, and pixels in the image region of the grass, a background subject, are misjudged as belonging to a foreground subject. To prevent this misjudgment, this embodiment uses continuity in addition to the degree of coincidence to judge whether the pixel of interest belongs to the image region of a foreground subject. In general, for a subject whose color appearance depends on the viewpoint, the appearance may differ markedly between distant viewpoints, but it changes only gradually between neighboring viewpoints. This embodiment therefore distinguishes pixels of a background subject whose pixel values changed because of a difference in color appearance from pixels of a foreground subject whose pixel values changed because the subject has height above the ground surface. As a result, the converted background image can be corrected accurately to create a complete background image, and the region of the foreground subject can be extracted from the target image data with high accuracy. Subjects whose color appearance changes with the viewpoint are not limited to the grass described above; various others exist, such as a gymnasium floor.

<Foreground region extraction process>
The foreground region extraction process executed by the image processing apparatus 100 in this embodiment is described below with reference to FIGS. 7 and 8. FIG. 7 is a block diagram showing the functional configuration of the image processing apparatus 100 in this embodiment, and FIG. 8 is a flowchart showing the flow of the foreground region extraction process in this embodiment. The CPU 101 of the image processing apparatus 100 executes the program stored in the ROM 103 using the RAM 102 as work memory, thereby functioning as each component shown in FIG. 7 and executing the series of processes shown in FIG. 8. Not all of the processing described below needs to be executed by the CPU 101; the image processing apparatus 100 may be configured so that part or all of the processing is performed by one or more processing circuits other than the CPU 101.

In step S801, the continuity calculation unit 701 determines the pixel of interest in the converted background image data for which the continuity is to be calculated. In this embodiment, the upper-left pixel of the converted background image is selected first, and unprocessed pixels are then selected in turn. The pixels of interest may be chosen in any order, as long as the continuity is eventually calculated for every pixel of the converted background image data.

In step S802, the continuity calculation unit 701 uses the converted background image data acquired from the image conversion unit 203 to calculate the continuity of the pixel value of the pixel of interest across the converted background images corresponding to the reference viewpoint and its neighboring viewpoints. The calculation method used in this step is described with reference to FIG. 9.

First, the cameras 902 and 903 adjacent to the camera 901, which corresponds to the reference viewpoint number determined by the image conversion unit 203, are detected, and the viewpoint numbers corresponding to these cameras are acquired. The acquired viewpoint numbers are hereinafter called adjacent viewpoint numbers. The cameras adjacent to the camera 901 are determined based on their distances to the camera 901, calculated from the camera coordinates in three-dimensional space. In this embodiment, the camera 902, which has the shortest distance to the camera 901 among the cameras to its left, and the camera 903, which has the shortest distance to the camera 901 among the cameras to its right, are detected as the cameras adjacent to the camera 901.

Next, the pixel values of the pixels 907, 908, and 909 at the coordinates (u₂, v₂) of the pixel of interest are acquired from the converted background image 904 corresponding to the reference viewpoint number and the converted background images 905 and 906 corresponding to the adjacent viewpoint numbers, and the continuity is calculated from the acquired pixel values by Equation (4).

Here, B₉₀₁(u₂, v₂), B₉₀₂(u₂, v₂), and B₉₀₃(u₂, v₂) denote the pixel values of the pixels of interest 907, 908, and 909 in the converted background images 904, 905, and 906 corresponding to the cameras 901, 902, and 903, respectively, and k is a subscript that identifies the three RGB channels. The value of C calculated by Equation (4) becomes smaller as the change in pixel value between viewpoints becomes smoother. The continuity is not limited to C calculated by Equation (4); any value that indicates the continuity of pixel values between viewpoints, such as a second derivative computed from the discrete values, may be used. This embodiment describes the case of using the cameras 902 and 903 adjacent to the camera 901 corresponding to the reference viewpoint number, but the cameras are not limited to these, and other cameras may be used depending on how the subject appears. For example, on the left side of the camera 901, the camera next closest to the camera 901 after the camera 902 may be used instead of the camera 902. The same applies to the camera used on the right side of the camera 901.
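
Equation (4) itself is not reproduced in this text. The following sketch shows one form consistent with the description, in which the continuity is the sum of the differences between the reference-viewpoint pixel and each adjacent-viewpoint pixel. The use of absolute differences and the function name `continuity` are assumptions of this sketch, and the exact expression in the original may differ.

```python
def continuity(b_ref, b_left, b_right):
    """Continuity measure C at one pixel of interest.

    b_ref   -- (R, G, B) pixel from the reference viewpoint (camera 901)
    b_left  -- (R, G, B) pixel from the left adjacent viewpoint (camera 902)
    b_right -- (R, G, B) pixel from the right adjacent viewpoint (camera 903)
    Absolute differences are assumed; a smaller C means a smoother change.
    """
    return sum(abs(b_ref[k] - b_left[k]) + abs(b_ref[k] - b_right[k])
               for k in range(3))
```

Under this assumed form, a background pixel whose color drifts slightly between neighboring cameras yields a small C, whereas a pixel covered by a foreground subject in only one of the three views yields a large C.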

In step S803, the continuity calculation unit 701 determines whether steps S801 and S802 have been performed for all pixels of the converted background image data. If so, the continuity calculation unit 701 outputs the calculated continuity of all pixels to the correction determination unit 702, and the process proceeds to step S308. Otherwise, the process returns to step S801.

In step S804, the correction determination unit 702 updates the flag map based on the degree of coincidence acquired from the coincidence calculation unit 204 and the continuity acquired from the continuity calculation unit 701. Specifically, in the converted background image data corresponding to the reference viewpoint number, the correction determination unit 702 changes to 1 the flag map pixel value of each pixel deemed likely to belong to the image area of a foreground subject. In this embodiment, if the calculated degree of coincidence D is equal to or greater than a predetermined threshold and the calculated continuity C is also equal to or greater than a predetermined threshold, both the degree of agreement and the smoothness of change at the pixel of interest, between the converted background image corresponding to the reference viewpoint number and the other converted background images, are regarded as low. In that case, the pixel of interest is determined to be likely a pixel in the image area of a foreground subject. If these conditions are not satisfied, the pixel of interest is determined to be likely a pixel in the image area of a background subject. The thresholds used in this step are determined based on, for example, the maximum pixel value; a value smaller than 20% of the maximum, for example any value in the range of 1% to 5% of the maximum, may be used to obtain them. The thresholds are obtained in the same manner as in Embodiment 1. The determination of whether a pixel belongs to the image area of a foreground subject is made pixel by pixel. The correction determination unit 702 outputs the updated flag map to the correction unit 206. This concludes the foreground region extraction processing executed by the image processing apparatus 100 in this embodiment.
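The flag map update of step S804 can be sketched as follows. This is only an illustration under stated assumptions: the function name `update_flag_map`, the nested-list image layout, and the default ratio of 5% of the maximum pixel value are hypothetical choices, not taken from the specification. D and C are treated as the text describes them, that is, larger values indicate lower agreement and lower smoothness.

```python
def update_flag_map(coincidence, continuity, flag_map,
                    max_value=255, d_ratio=0.05, c_ratio=0.05):
    """Set flag_map[v][u] to 1 where both D and C reach their thresholds.

    coincidence, continuity, flag_map: 2D lists of equal size (assumed layout).
    d_ratio, c_ratio: assumed fractions of max_value (the text suggests 1% to 5%).
    """
    d_threshold = max_value * d_ratio
    c_threshold = max_value * c_ratio
    for v, row in enumerate(coincidence):
        for u, d in enumerate(row):
            # Both conditions must hold for the pixel to be flagged
            # as likely belonging to a foreground subject.
            if d >= d_threshold and continuity[v][u] >= c_threshold:
                flag_map[v][u] = 1
    return flag_map
```

Pixels that fail either condition keep their existing flag, which corresponds to treating them as likely background pixels.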

<Effects of this embodiment>
The effects of this embodiment are described below with reference to FIG. 10. Image data 1002 is converted background image data obtained by converting, for each viewpoint, the background image data into an image as viewed from viewpoint 1001 with the ground surface as the reference. Here, viewpoint 1001 is both the viewpoint of interest and the reference viewpoint. Subject 1003 is a background subject whose apparent color changes with the viewpoint (for example, a lawn), and subject 1005 is a foreground subject.

If Embodiment 1 is applied to the scene shown in FIG. 10 and a complete background image is created from the incomplete multi-viewpoint background images, background image data 1004 is obtained. In background image data 1004, the image of the foreground subject 1005 has been removed, but the background subject 1003 is not rendered correctly. This is because, when the complete background image is created, whether the pixel of interest belongs to the image area of a foreground subject is determined from the degree of coincidence alone, so pixels in the image area of the background subject 1003 are mistakenly determined to be pixels in the image area of a foreground subject. As a result, background image data 1004 is created with the pixels in the image area of the background subject 1003 corrected. Even if background image data 1004 is used, the foreground region cannot be extracted accurately from the target image data.

In this embodiment, by contrast, when a complete background image is created from the incomplete multi-viewpoint background images, whether the pixel of interest belongs to the image area of a foreground subject is determined based on both the degree of coincidence and the continuity. As a result, pixels in the image area of the background subject 1003 are not determined to be pixels in the image area of a foreground subject, and background image data 1006 is created in which those pixels are left uncorrected. In background image data 1006, the image of the foreground subject 1005 is removed while the background subject 1003 is rendered correctly. Extracting the foreground region from the target image data using background image data 1006 therefore yields a highly accurate result. Thus, according to this embodiment, the region of a foreground subject can be extracted with high accuracy even when the background subject is one whose apparent color changes with the viewpoint.

[Embodiment 3]
In Embodiments 1 and 2, a complete background image is created from incomplete background images at a plurality of different viewpoints, and the region of a foreground subject is extracted by comparing the created complete background image with the target image data. In this embodiment, by contrast, incomplete foreground images at a plurality of different viewpoints are used to extract the foreground region so that it does not include shadow regions. Here, an incomplete foreground image means an image in which both the region of a foreground subject and the region of the shadow accompanying that subject have been extracted as the foreground region.

In this embodiment, the incomplete foreground image for each viewpoint is converted into an image as viewed from the viewpoint of interest with the ground surface as the reference, which yields a plurality of pieces of converted foreground image data, and the degree of coincidence between pixels is calculated across them. As described in Embodiment 1, a foreground subject has height above the ground surface, whereas the shadow accompanying it does not. This embodiment therefore detects pixels with a high degree of agreement across the plurality of pieces of converted foreground image data and corrects them, on the grounds that such pixels are likely to belong to a shadow region, which has no height. As a result, a foreground image can be created in which only the region of a foreground subject having height is extracted as the foreground region, without the shadow region. Hereinafter, such an image is called a complete foreground image. Configurations and processes that are the same as in the embodiments described above are given the same reference signs, and their description is omitted.

<Foreground region extraction processing>
The foreground region extraction processing executed by the image processing apparatus 100 in this embodiment is described below with reference to FIGS. 11 and 12. FIG. 11 is a block diagram showing the functional configuration of the image processing apparatus 100 in this embodiment, and FIG. 12 is a flowchart showing the flow of the foreground region extraction processing in this embodiment. The CPU 101 of the image processing apparatus 100 executes a program stored in the ROM 103, using the RAM 102 as work memory, and thereby functions as each component shown in FIG. 11 and executes the series of processes shown in FIG. 12. Not all of the processing described below needs to be executed by the CPU 101; the image processing apparatus 100 may be configured so that part or all of the processing is performed by one or more processing circuits other than the CPU 101.

In step S1201, the camera parameter acquisition unit 1101 acquires the camera parameters of the camera that captured the target image data, from the external storage device 108 via the input interface 105 or from the secondary storage device 104. The camera parameter acquisition unit 1101 also sets the viewpoint of the camera that captured the target image data as the viewpoint of interest. The camera parameters acquired in this step are the same as those described in Embodiment 1. The camera parameter acquisition unit 1101 outputs the camera parameters to the image conversion unit 1103.

In step S1202, the foreground image data acquisition unit 1102 acquires a plurality of pieces of foreground image data at a plurality of different viewpoints, from the external storage device 108 via the input interface 105 or from the secondary storage device 104. The foreground image data acquired in this step is an image in which the region of a foreground subject has been extracted, and the extracted region is assumed to include a shadow region. In this embodiment, the foreground image data is created from a captured image and a background image that were captured in advance. The creation method is as follows. The captured image used here is an image of the foreground and background subjects of the target image data, captured in substantially the same environment as the target image data. The background image is an image of the background subject of the target image data, captured in substantially the same environment as the target image data. For each viewpoint, the pixel values of the captured image and of the background image are compared pixel by pixel; a pixel whose values are identical at the same coordinates is set to 0, and every other pixel is set to 1, which yields a binary image for each viewpoint. This binary image is the foreground image data. The method of creating the foreground image data is not limited to this, and the foreground image data need not be a binary image; it may be a multi-valued image. The foreground image data acquisition unit 1102 also acquires, together with the foreground image data, the camera parameters corresponding to each piece of foreground image data. Further, to distinguish the pieces of foreground image data from one another, the foreground image data acquisition unit 1102 stores each piece in association with the viewpoint number of its camera. The foreground image data acquisition unit 1102 outputs the foreground image data and the camera parameters to the image conversion unit 1103.
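The binary foreground image described for step S1202 can be sketched in a few lines. The function name `make_foreground_mask` and the nested-list image layout are assumptions made for illustration; pixel values may be scalars or RGB tuples, since only equality is tested.

```python
def make_foreground_mask(captured, background):
    """Return a binary image: 0 where the captured and background pixel
    values are identical at the same coordinates, 1 otherwise.

    captured, background: 2D lists of equal size (assumed layout).
    """
    return [
        [0 if c == b else 1 for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(captured, background)
    ]
```

Because a shadow also changes the pixel value relative to the background image, shadow pixels are set to 1 as well, which is why the resulting foreground image is called incomplete.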

In step S1203, the image conversion unit 1103 uses the camera parameters obtained from the camera parameter acquisition unit 1101 and the foreground image data acquisition unit 1102 to convert the foreground image data obtained from the foreground image data acquisition unit 1102 into images as viewed from the viewpoint of interest. The conversion in this step is the same as in step S303 of Embodiment 1: for each viewpoint, the foreground image data is projectively transformed with the ground surface as the reference to obtain an image as viewed from the viewpoint of interest. The foreground image (data) obtained by the image conversion in this step is called the converted foreground image (data). In this way, the image conversion unit 1103 functions as converted foreground image data creation means. The image conversion unit 1103 outputs the converted foreground image data to the coincidence calculation unit 1104.
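As a rough illustration of the projective transformation in step S1203, the sketch below maps one pixel coordinate through a 3x3 homography matrix. It assumes that a matrix `H` relating a source viewpoint to the viewpoint of interest via the ground plane has already been derived from the camera parameters (that derivation belongs to step S303 and is not reproduced here). The function name `apply_homography` is hypothetical.

```python
def apply_homography(H, u, v):
    """Map pixel (u, v) through the 3x3 homography H (list of 3 rows).

    Works in homogeneous coordinates: (x, y, w) = H * (u, v, 1),
    and the mapped pixel is (x / w, y / w).
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return x / w, y / w
```

Only points that actually lie on the reference plane land at the same converted coordinates from every viewpoint, which is the property the later coincidence calculation relies on.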

In step S1204, the image conversion unit 1103 selects, from the foreground image data acquired from the foreground image data acquisition unit 1102, the image corresponding to the viewpoint closest to the camera position that captured the target image data (the viewpoint of interest) as the reference foreground image. Specifically, the distance between the coordinates of the viewpoint of interest and the coordinates of the viewpoint corresponding to each piece of foreground image data is calculated for each viewpoint. The foreground image (data) corresponding to the viewpoint with the smallest calculated distance (the reference viewpoint) is then set as the reference foreground image (data). The image conversion unit 1103 outputs the viewpoint number corresponding to the reference foreground image to the correction unit 1105. In this embodiment, the viewpoint number corresponding to the reference foreground image is called the reference viewpoint number.
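The reference viewpoint selection of step S1204 amounts to a nearest-neighbor search over camera positions. The sketch below assumes Euclidean distance and a dictionary that maps viewpoint numbers to camera coordinates; both the function name `select_reference_viewpoint` and that data layout are illustrative assumptions.

```python
import math


def select_reference_viewpoint(target_position, viewpoint_positions):
    """Return the viewpoint number whose camera position is closest
    to the viewpoint of interest.

    target_position: coordinates of the viewpoint of interest.
    viewpoint_positions: dict of viewpoint number -> coordinates (assumed layout).
    """
    return min(
        viewpoint_positions,
        key=lambda number: math.dist(target_position, viewpoint_positions[number]),
    )
```

For example, with cameras at `{0: (0, 0, 0), 1: (5, 0, 0), 2: (1, 1, 0)}` and a viewpoint of interest at `(4, 0, 0)`, viewpoint 1 is selected.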

In step S1205, the coincidence calculation unit 1104 determines the pixel of interest in the converted foreground image data, that is, the pixel for which agreement across the plurality of pieces of converted foreground image data is to be evaluated. In this embodiment, the upper-left pixel of the converted foreground image data is selected first, and unprocessed pixels are then selected in turn as the pixel of interest. The pixels of interest may be chosen in any order, as long as the agreement evaluation across the plurality of pieces of converted foreground image data is performed for all pixels of the converted foreground image data.

In step S1206, the coincidence calculation unit 1104 uses the plurality of pieces of converted foreground image data acquired from the image conversion unit 1103 to calculate the degree of coincidence at the pixel of interest between the converted foreground image data corresponding to the reference viewpoint number and the other converted foreground image data. The calculation method is as follows.

First, the coincidence calculation unit 1104 acquires the pixel value F_l(u₂, v₂) of the converted foreground image data at the coordinates (u₂, v₂) of the determined pixel of interest. Here, l is an index that distinguishes the pieces of converted foreground image data, so the coincidence calculation unit 1104 acquires as many pixel values as there are pieces of converted foreground image data. Next, the coincidence calculation unit 1104 calculates the average of all the acquired pixel values. In this embodiment, this average is used as the degree of coincidence. The degree of coincidence is not limited to this; any value that reflects the statistical properties of the plurality of pixel values may be used.
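The per-pixel average used as the degree of coincidence can be sketched as follows. The function name `coincidence_map` and the representation of each converted foreground image as a 2D list of 0/1 values are assumptions made for illustration.

```python
def coincidence_map(converted_foregrounds):
    """Average the pixel values F_l(u, v) over all converted foreground images.

    converted_foregrounds: non-empty list of 2D lists of equal size (assumed layout).
    Returns a 2D list holding the per-pixel mean.
    """
    count = len(converted_foregrounds)
    height = len(converted_foregrounds[0])
    width = len(converted_foregrounds[0][0])
    return [
        [sum(image[v][u] for image in converted_foregrounds) / count
         for u in range(width)]
        for v in range(height)
    ]
```

With binary inputs, a value of 1.0 means every viewpoint marks the pixel as foreground after conversion, which is the signature of a region lying on the ground surface.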

In step S1207, the coincidence calculation unit 1104 determines whether the processing of steps S1205 to S1206 has been performed for all pixels of the converted foreground image data. If the determination in step S1207 is true, the coincidence calculation unit 1104 outputs the calculated degree of coincidence of all pixels to the correction unit 1105, and the process proceeds to step S1208. If the determination is false, the process returns to step S1205.

In step S1208, the correction unit 1105 determines the pixel of interest in the converted foreground image data corresponding to the reference viewpoint number. In this embodiment, the upper-left pixel of that converted foreground image data is selected first, and unprocessed pixels are then selected in turn as the pixel of interest. The pixels of interest may be chosen in any order, as long as the pixel value update based on the degree of coincidence (step S1209) is performed for all pixels of the converted foreground image data corresponding to the reference viewpoint number.

In step S1209, the correction unit 1105 uses the degree of coincidence acquired from the coincidence calculation unit 1104 to detect pixels in the converted foreground image corresponding to the reference viewpoint number that are likely to belong to a shadow region. The correction unit 1105 then removes the shadow region from the incomplete foreground image by changing the pixel values of the detected pixels to 0. In this embodiment, if the calculated degree of coincidence is equal to or greater than a predetermined threshold, the agreement between the pixels of interest across all viewpoints is high, so the pixel of interest is determined to be likely a pixel in a shadow region, which has no height. The pixel value of the pixel of interest in the converted foreground image data corresponding to the reference viewpoint number is then changed to 0. If the calculated degree of coincidence is less than the threshold, the agreement between the pixels of interest across all viewpoints is low, and the pixel of interest is determined to be likely a pixel in the region of a foreground subject having height. In this case, the pixel value of the pixel of interest in the converted foreground image corresponding to the reference viewpoint number is left unchanged. In this embodiment, 0.8 is used as the threshold, but the threshold is not limited to this value.
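The correction of step S1209 can be sketched as a threshold test on the degree of coincidence. The function name `remove_shadow` and the nested-list layout are illustrative assumptions; the default threshold of 0.8 is the value given in the text.

```python
def remove_shadow(reference_foreground, coincidence, threshold=0.8):
    """Return a copy of the reference converted foreground image in which
    pixels whose degree of coincidence reaches the threshold are set to 0.

    reference_foreground, coincidence: 2D lists of equal size (assumed layout).
    """
    return [
        [0 if degree >= threshold else pixel
         for pixel, degree in zip(pixel_row, degree_row)]
        for pixel_row, degree_row in zip(reference_foreground, coincidence)
    ]
```

Pixels below the threshold keep their original value, so regions of foreground subjects with height remain in the output.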

In step S1210, the correction unit 1105 determines whether the processing of steps S1208 to S1209 has been performed for all pixels of the converted foreground image data corresponding to the reference viewpoint number. If the determination in step S1210 is true, the correction unit 1105 outputs the corrected converted foreground image data corresponding to the reference viewpoint number to the secondary storage device 104, the external storage device 108, the display device 109, or the like, and the series of processes ends. If the determination is false, the process returns to step S1208. This concludes the foreground region extraction processing executed by the image processing apparatus 100 in this embodiment.

<Effects of this embodiment>
The effects of this embodiment are described below with reference to FIG. 13. Subject 1301 is a foreground subject whose own shadow 1302 lies on the ground surface 1303. Image data 1304 consists of foreground images, at a plurality of different viewpoints, in which the region of subject 1301 together with its accompanying shadow 1302 has been extracted as the foreground region. In this embodiment, image data 1304 is converted into images as viewed from the viewpoint of interest 1305, and pixels in the shadow region, which has no height above the ground surface, are detected based on the degree of agreement between the pixels of interest across the resulting plurality of pieces of converted foreground image data. Foreground image 1306 is then created by correcting the detected pixels. In foreground image 1306, the region of shadow 1302 accompanying the foreground subject 1301, which has height, has been removed, and only the region of the foreground subject 1301 is extracted. Thus, according to this embodiment, even when a foreground subject having height is accompanied by a shadow, only the region of the foreground subject can be extracted with high accuracy, without extracting the shadow region.

In this embodiment, foreground images created from a captured image and a background image captured in advance are used as the incomplete foreground images, but foreground images created by Embodiment 1 or Embodiment 2 may be used instead. In that case, the region of a foreground subject can be extracted with higher accuracy than when Embodiment 1 or 2 and Embodiment 3 are each executed on their own.

[Other Embodiments]
Embodiments of the present invention are not limited to those described above, and various forms are possible. For example, the embodiments above describe the case where the size of the background image data, which is an incomplete background image, is the same as the size of the target image data, but these sizes need not be the same. In that case, a viewpoint looking down on the ground surface from above is used as the reference viewpoint, and the background images are converted into images as viewed from that reference viewpoint. The background image is then corrected using the converted images, and the corrected background image is converted into an image as viewed from the viewpoint of interest, which creates the background image data corresponding to the target image data.

In the embodiments described above, pixel values in RGB space are used for calculating the degree of coincidence and extracting the foreground, but the information used is not limited to this. For example, pixel values in a different color space such as HSV or Lab may be used for calculating the degree of coincidence and extracting the foreground.
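As a small illustration of working in a different color space, the sketch below converts one RGB pixel to HSV with Python's standard `colorsys` module before any coincidence calculation. The wrapper name `rgb_pixel_to_hsv` and the assumption of 8-bit input values are illustrative choices, not part of the specification.

```python
import colorsys


def rgb_pixel_to_hsv(r, g, b, max_value=255):
    """Convert one RGB pixel to an (h, s, v) tuple, each component in [0, 1].

    max_value is the assumed maximum channel value (255 for 8-bit images).
    """
    return colorsys.rgb_to_hsv(r / max_value, g / max_value, b / max_value)
```

The converted components could then be compared across viewpoints in place of the RGB values, for instance to reduce sensitivity to brightness by giving less weight to the V component.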

Furthermore, in the embodiments described above, only the single plane of the ground surface is used as the reference for the projective transformation of images, but a plurality of planes parallel to the ground surface may be used as references. For example, a plurality of planes may be set at equal intervals of height from 0 to 1 centimeter above the ground surface, and the degree of coincidence may be calculated using all of the converted images obtained by projective transformation with each of these planes as the reference. Doing so improves robustness against errors in the camera parameters.

The present invention can also be realized by processing in which a program that implements one or more functions of the embodiments described above is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

201 ... Target image data acquisition unit
202 ... Background image data acquisition unit
203 ... Image conversion unit
204 ... Coincidence calculation unit
206 ... Correction unit
207 ... Foreground extraction unit

Claims

An image processing apparatus comprising:
target image data acquisition means for acquiring target image data that is an image of a foreground subject and a background subject captured from a viewpoint of interest;
background image data acquisition means for acquiring background image data that is an image of the background subject at a plurality of different viewpoints;
converted background image data creating means for creating a plurality of pieces of converted background image data by converting each of the acquired plurality of pieces of background image data into an image as viewed from the viewpoint of interest;
calculating means for calculating a degree of coincidence indicating the degree of agreement between pixels of interest in the plurality of pieces of converted background image data;
correction means for correcting, based on the degree of coincidence, the converted background image data at a viewpoint determined according to its distance from the viewpoint of interest; and
foreground image data creating means for creating foreground image data, which is an image in which a region of the foreground subject is extracted, based on the target image data and the corrected converted background image data.

The image processing apparatus according to claim 1, wherein the converted background image data creating means projectively transforms each of the acquired plurality of pieces of background image data into an image as viewed from the viewpoint of interest, with the ground surface as the reference.

The converted background image data creation means performs projective conversion of the acquired plurality of background image data into images when viewed from the viewpoint of interest on the basis of a plurality of planes parallel to the ground surface in addition to the ground surface. The image processing apparatus according to claim 2, wherein:

The image processing apparatus according to claim 1, further comprising a detection unit that detects a pixel to be corrected by the correction unit based on the degree of coincidence.

When the calculated degree of coincidence indicates that the degree of coincidence between the target pixels in the plurality of converted background image data is low, the detection unit detects the target pixel as the correction target. The image processing apparatus according to claim 4.

The image processing apparatus according to claim 1, further comprising:
continuity calculating means for calculating continuity indicating the degree of smoothness of the change in pixel value between pixels of interest in the plurality of pieces of converted background image data; and
detection means for detecting a pixel to be corrected by the correction means based on the degree of coincidence and the continuity.

7. The continuity calculation unit calculates the continuity based on converted background image data at a viewpoint closest to the viewpoint of interest and converted background image data at a viewpoint adjacent to the viewpoint. An image processing apparatus according to 1.

The calculated degree of matching indicates that the degree of matching between the target pixels in the plurality of converted background image data is low, and the calculated continuity is a pixel between the target pixels in the plurality of converted background image data. The image processing apparatus according to claim 6 or 7, wherein when the degree of smoothness of the value change is low, the detection unit detects the target pixel as the correction target.

The correcting means replaces a pixel value of the pixel of interest in the converted background image data at a viewpoint closest to the viewpoint of interest with an intermediate value of pixel values of the pixel of interest in the plurality of converted background images. The image processing apparatus according to claim 1.

The image processing apparatus according to claim 1, wherein the background image data includes an image of the foreground subject.

The background image data is an image created based on a plurality of images corresponding to a plurality of different times taken continuously in time series at each of the plurality of different viewpoints, or at each of the plurality of different viewpoints. The image processing apparatus according to claim 1, wherein the image processing device is an image captured in a state where the foreground subject does not exist and only the background subject exists.

An image processing apparatus comprising:
target image data acquisition means for acquiring target image data, which is an image obtained by imaging a foreground subject and a background subject from a viewpoint of interest;
foreground image data acquisition means for acquiring foreground image data, at a plurality of different viewpoints, in which regions of the foreground subject and of shadows associated with that subject are extracted;
converted foreground image data creating means for creating a plurality of converted foreground image data by converting each of the acquired plurality of foreground image data into an image as viewed from the viewpoint of interest;
calculating means for calculating a degree of coincidence between pixels of interest in the plurality of converted foreground image data;
correction means for correcting, based on the degree of coincidence, the converted foreground image data at a viewpoint determined according to its distance from the viewpoint of interest; and
foreground image data creating means for creating foreground image data in which a region of the foreground subject is extracted, based on the target image data and the corrected converted foreground image data.

An image processing method comprising:
obtaining target image data, which is an image obtained by imaging a foreground subject and a background subject from a viewpoint of interest;
obtaining background image data, which are images of the background subject at a plurality of different viewpoints;
creating a plurality of converted background image data by converting each of the acquired plurality of background image data into an image as viewed from the viewpoint of interest;
calculating a degree of coincidence between pixels of interest in the plurality of converted background image data;
correcting, based on the degree of coincidence, the converted background image data at a viewpoint determined according to its distance from the viewpoint of interest; and
generating foreground image data, which is an image in which a region of the foreground subject is extracted, based on the target image data and the corrected converted background image data.
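
The steps of this method can be sketched end to end as follows. This is an illustration under stated assumptions rather than the claimed implementation: the background images are assumed to be already converted to the viewpoint of interest (the viewpoint conversion itself is omitted), coincidence is measured by population variance, the correction uses the median, and the foreground is obtained by thresholded background subtraction. The name `extract_foreground` and both thresholds are hypothetical.

```python
from statistics import median, pvariance


def extract_foreground(target, converted_backgrounds, closest=0,
                       var_thresh=100.0, diff_thresh=30):
    """Return a binary mask (1 = foreground) for a grayscale target image.

    target                -- image captured from the viewpoint of interest
    converted_backgrounds -- background images already converted to the
                             viewpoint of interest, one per viewpoint
    closest               -- index of the viewpoint nearest the viewpoint
                             of interest (the image that gets corrected)
    """
    height, width = len(target), len(target[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Pixel of interest across all converted background images.
            values = [bg[y][x] for bg in converted_backgrounds]
            background = values[closest]
            # Low coincidence (assumed: high variance) triggers correction.
            if pvariance(values) > var_thresh:
                background = median(values)
            # Background subtraction against the corrected background.
            if abs(target[y][x] - background) > diff_thresh:
                mask[y][x] = 1
    return mask


target = [[50, 200, 50]]
backgrounds = [
    [[50, 50, 180]],  # closest viewpoint, with a stale value at x=2
    [[50, 50, 50]],
    [[52, 48, 50]],
]
print(extract_foreground(target, backgrounds))
```

In this example the stale value at x=2 in the closest viewpoint's background is replaced before subtraction, so only the pixel at x=1 is marked as foreground.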

An image processing method comprising:
obtaining target image data, which is an image obtained by imaging a foreground subject and a background subject from a viewpoint of interest;
obtaining foreground image data, at a plurality of different viewpoints, in which regions of the foreground subject and of shadows associated with that subject are extracted;
creating a plurality of converted foreground image data by converting each of the acquired plurality of foreground image data into an image as viewed from the viewpoint of interest;
calculating a degree of coincidence between pixels of interest in the plurality of converted foreground image data;
correcting, based on the degree of coincidence, the converted foreground image data at a viewpoint determined according to its distance from the viewpoint of interest; and
creating foreground image data in which a region of the foreground subject is extracted, based on the target image data and the corrected converted foreground image data.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 12.