JP2021056936A

JP2021056936A - Image processing device, image processing method, and program

Info

Publication number: JP2021056936A
Application number: JP2019181532A
Authority: JP
Inventors: Kina Itakura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-10-01
Filing date: 2019-10-01
Publication date: 2021-04-08
Anticipated expiration: 2039-10-01
Also published as: JP7412947B2

Abstract

To make it possible to obtain a high-precision foreground image even when edge information is difficult to extract accurately because the boundary between a foreground object and the background is ambiguous. SOLUTION: In a provisional foreground image representing the foreground region in a captured image, a contour region that is likely to contain errors is detected. The pixels constituting the detected contour region are then corrected based on whether the colors of the corresponding pixels in the captured image are similar to the background color. SELECTED DRAWING: Figure 4

Description

The present invention relates to a technique for generating, from a captured image, a foreground image representing the shape of an object.

Background subtraction is a conventional method for extracting, from a captured image, the foreground region corresponding to an object (subject). In background subtraction, the foreground region corresponding to the object is extracted based on the difference between each pixel value of a captured image in which the foreground object appears and the corresponding pixel value of an image showing only the background without the object (a background image). Near the boundary between the object and the background, however, the difference between the pixel value in the captured image and that in the background image can become small. In that case, a part that should belong to the background may be erroneously extracted as part of the foreground region corresponding to the object. That is, conventional background subtraction cannot always extract the foreground region accurately along the contour of the object. To address this, Patent Document 1 discloses a technique that extracts a foreground region from a captured image and then shapes the extracted foreground region based on edge information of the object.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H10-23452

However, the method of Patent Document 1 has the problem that, if the edge information itself contains errors, the shape of the object cannot be corrected properly.

Accordingly, an object of the technique according to the present disclosure is to obtain an accurate foreground image even when the boundary between the foreground object and the background is ambiguous and edge information is therefore difficult to extract accurately.

An image processing apparatus according to the present disclosure generates, from a captured image, a foreground image representing the shape of an object, and includes: an acquisition unit that acquires the captured image; an extraction unit that extracts, from the captured image, a foreground region corresponding to the shape of the object to generate a foreground image; a detection unit that detects a contour region of the object in the foreground image; and a correction unit that corrects the pixels constituting the contour region in the foreground image based on whether the colors of the corresponding pixels in the captured image are similar to the background color.

According to the technique of the present disclosure, an accurate foreground image can be obtained even when the boundary between the foreground object and the background is ambiguous and edge information is difficult to extract accurately.

FIG. 1 is a diagram explaining the outline of the process of generating a foreground image.
FIG. 2 is a diagram showing an example configuration of an image processing system that generates a virtual viewpoint video.
FIG. 3 is a functional block diagram showing the internal configuration of a camera adapter.
FIG. 4 is a functional block diagram showing details of the image processing unit according to Embodiment 1.
FIG. 5 is a flowchart showing the flow of the foreground image generation process according to Embodiment 1.
FIG. 6 is an explanatory diagram of the process of determining whether a pixel of interest belongs to a contour region.
FIG. 7 is a diagram explaining the effect of Embodiment 1.
FIG. 8 is a functional block diagram showing details of the image processing unit according to Embodiment 2.
FIG. 9 is a flowchart showing the flow of the foreground image generation process according to Embodiment 2.
FIG. 10 is a diagram explaining the effect of Embodiment 2.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of features described in the embodiments are necessarily essential to the solution of the present invention. Identical components are denoted by the same reference numerals.

[Embodiment 1]
In this embodiment, after the foreground region corresponding to an object is extracted from a captured image, the contour region of the object (the portion bordering the background) is detected. The detected contour region is then corrected based on whether its colors differ from the background color, producing a foreground image showing the shape of the object. In this embodiment, the foreground image is a binary image in which the foreground region is represented by "1" and the remaining background region by "0"; it may instead be a multi-valued image that additionally includes values indicating the likelihood that each pixel belongs to the foreground region.

This embodiment is described using, as an example, the case where the foreground image of an object generated from captured images is used to generate a virtual viewpoint video. That is, a use case is assumed in which three-dimensional shape data of the object is generated from its foreground images, and a virtual viewpoint video including the object is generated based on virtual viewpoint information.

(Outline of this embodiment)
An outline of the foreground image generation process in this embodiment is described with reference to FIG. 1. First, an image from which the foreground region is to be extracted (hereinafter, "target image") 10 is captured by a camera 11, which is an imaging device. The target image 10 shows a person object 12 as the foreground and a field 13 as the background. In addition, an image (hereinafter, "background image") 14 showing the background field 13 without the person object 12 is captured from the same viewpoint as the target image 10. Next, the target image 10 and the background image 14 are compared pixel by pixel, and pixels with a large difference in pixel value are regarded as pixels of the person object 12, so that only the foreground region is extracted from the target image 10. This yields a foreground image representing the silhouette of the person object 12. A foreground image showing the shape of an object is also called a silhouette image or a mask image; this specification uses the term "foreground image". Image 15 in FIG. 1 shows an ideal foreground image. In practice, however, the contour of the foreground region may become unclear due to lens aberration, limited image resolution, and the like. This is because, near the boundary between the object and the background in the target image 10, pixels with a range of values, from close to the background color to close to the object color, are mixed together. When the pixels near the boundary are in this state, object pixels and background pixels cannot be distinguished correctly, and the result is an inaccurate foreground region that does not follow the contour of the object. Image 15' in FIG. 1 shows such an inaccurate foreground image containing errors.
Therefore, in this embodiment, a contour portion 16 that is likely to contain errors is detected in the foreground image generated from the captured image. Then, for each pixel constituting the detected contour portion 16, if the color of the corresponding pixel in the captured image is similar to the background color, its pixel value is changed to the value indicating the background. The foreground image in which the errors in the contour portion have been corrected in this way is output as the final foreground image. This concludes the outline of the foreground image generation process performed in this embodiment. A specific configuration of this embodiment is described below.
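The correction rule in the outline can be sketched as a pixel-wise combination of three binary maps. This is a minimal illustration, assuming the provisional foreground image, the contour map, and the background-similarity map are equally sized 2-D lists of 0/1 values; the function name `correct_foreground` is hypothetical and not taken from the source.

```python
def correct_foreground(provisional, contour_map, similarity_map):
    """Return a corrected copy of the provisional foreground mask.

    All three arguments are equally sized 2-D lists of 0/1 values:
    provisional    -- 1 = foreground, 0 = background
    contour_map    -- 1 = pixel lies in the detected contour region
    similarity_map -- 1 = captured-image color judged similar to background
    A pixel is reset to background only when it is both a contour pixel and
    background-colored; every other pixel keeps its provisional value.
    """
    return [
        [0 if (c == 1 and s == 1) else f for f, c, s in zip(f_row, c_row, s_row)]
        for f_row, c_row, s_row in zip(provisional, contour_map, similarity_map)
    ]


# A 1 x 4 strip: only the third pixel is both on the contour and background-colored.
provisional = [[1, 1, 1, 0]]
contour = [[0, 1, 1, 0]]
similar = [[1, 0, 1, 1]]
print(correct_foreground(provisional, contour, similar))  # [[1, 1, 0, 0]]
```

Note that the first pixel is background-colored but not on the contour, so it is left untouched; restricting the correction to the contour region is what keeps object interiors that happen to resemble the background from being erased.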

(System configuration)
FIG. 2 shows an example configuration of an image processing system that generates a virtual viewpoint video. The image processing system 100 includes imaging modules 110a to 110z, a database (DB) 250, a server 270, a control device 300, a switching hub 180, and an end user terminal 190. The image processing system 100 thus has three functional domains: a video collection domain, a data storage domain, and a video generation domain. The video collection domain includes the imaging modules 110a to 110z, the data storage domain includes the DB 250 and the server 270, and the video generation domain includes the control device 300 and the end user terminal 190.

The control device 300 manages the operating state of, and controls parameter settings for, each block constituting the image processing system 100 via a network. The network may be GbE (Gigabit Ethernet) or 10GbE conforming to the IEEE Ethernet (registered trademark) standards, or may be configured by combining an InfiniBand interconnect, an industrial local area network, and the like. The network is not limited to these, and other types of network may be used.

First, the operation of transmitting the 26 sets of images captured by the imaging modules 110a to 110z from the imaging module 110z to the server 270 is described. Each of the imaging modules 110a to 110z has one camera, 112a to 112z respectively. In the following, the 26 imaging modules 110a to 110z may be referred to collectively as "imaging module 110" when no distinction is needed; likewise, the devices in each imaging module 110 may be referred to as "camera 112" and "camera adapter 120". The number of imaging modules 110, 26, is only an example and is not limiting.

The imaging modules 110a to 110z are connected in a daisy chain. As captured images move to higher resolutions such as 4K or 8K and to higher frame rates, the volume of image data grows, and this connection topology reduces the number of connection cables and the wiring labor. The topology is arbitrary, however. For example, a star network may be used, in which each of the imaging modules 110a to 110z is connected to the switching hub 180 and data is exchanged between the imaging modules 110 via the switching hub 180.

In this embodiment, each imaging module 110 consists of a camera 112 and a camera adapter 120, but it is not limited to this; for example, it may also include a microphone, a pan head, or an external sensor. Also, although the camera 112 and the camera adapter 120 are separate in this embodiment, they may be integrated in a single housing. The image captured by the camera 112a in the imaging module 110a undergoes the image processing described later in the camera adapter 120a and is then transmitted to the camera adapter 120b of the imaging module 110b. Similarly, the imaging module 110b transmits the image captured by the camera 112b, together with the captured image received from the imaging module 110a, to the imaging module 110c. By repeating this operation, the 26 sets of captured images are passed from the imaging module 110z to the switching hub 180 and then transmitted to the server 270.
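The relay just described can be simulated to show how the data volume accumulates along the chain: the link leaving module k carries k images, so the links nearest the switching hub carry the most traffic. This is a sketch only, assuming each captured image is treated as an opaque item; the function name and the string labels are hypothetical.

```python
import string


def relay_through_daisy_chain(module_ids):
    """Simulate the daisy-chain relay.

    Each module appends its own captured image to everything received from its
    upstream neighbor and forwards the result downstream. Returns the number of
    images each module sends on its outgoing link, plus the images that the
    last module delivers to the switching hub.
    """
    carried = []      # images travelling down the chain
    link_loads = []   # (sending module, images on its outgoing link)
    for module_id in module_ids:
        carried.append(f"image from {module_id}")
        link_loads.append((module_id, len(carried)))
    return link_loads, carried


modules = ["110" + letter for letter in string.ascii_lowercase]  # 110a ... 110z
loads, delivered = relay_through_daisy_chain(modules)
print(loads[0], loads[-1], len(delivered))  # ('110a', 1) ('110z', 26) 26
```

This growth along the chain is one reason the per-adapter processing described later (compressing data and generating foreground images inside each camera adapter) matters in a daisy-chained layout.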

In this embodiment, foreground images are generated within the individual camera adapters 120. However, the configuration is not limited to this; the server 270 may instead generate, after receiving the 26 sets of captured images, the foreground image corresponding to each captured image.

(Camera adapter configuration)
Next, the camera adapter 120 is described in detail. FIG. 3 is a functional block diagram showing the internal configuration of the camera adapter 120. The camera adapter 120 consists of a network adapter 121, a transmission unit 122, an image processing unit 123, and a camera control unit 124.

The network adapter 121 performs data communication with other camera adapters 120, the server 270, and the control device 300. It also complies with, for example, the Ordinary Clock of the IEEE 1588 standard, stores time stamps of data exchanged with the server 270, and synchronizes its time with the server 270. Time synchronization with a time server may instead be achieved using another standard such as EtherAVB or a proprietary protocol. In this embodiment, a NIC (Network Interface Card) is used as the network adapter 121, but the network adapter is not limited to this.
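For reference, an IEEE 1588 ordinary clock in the delay request-response mode estimates its offset from four timestamps. The sketch below shows that standard calculation, assuming the server 270 acts as the master and the camera adapter as the slave (the source does not state the roles) and that the network delay is symmetric.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """IEEE 1588 delay request-response calculation for an ordinary clock.

    t1 -- master sends Sync            (master time base)
    t2 -- slave receives Sync          (slave time base)
    t3 -- slave sends Delay_Req        (slave time base)
    t4 -- master receives Delay_Req    (master time base)
    Assumes the network delay is the same in both directions.
    Returns (offset of slave clock from master, mean one-way path delay).
    """
    master_to_slave = t2 - t1
    slave_to_master = t4 - t3
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Slave clock runs 5 units ahead and the one-way delay is 3 units:
# Sync leaves at master 100 -> arrives at master 103 = slave 108;
# Delay_Req leaves at slave 120 = master 115 -> arrives at master 118.
print(ptp_offset_and_delay(100, 108, 120, 118))  # (5.0, 3.0)
```

The slave then subtracts the computed offset from its clock; any asymmetry between the two path directions shows up directly as an offset error, which is why the symmetric-delay assumption matters.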

The transmission unit 122 controls data transmission to the switching hub 180 and other destinations via the network adapter 121. The transmission unit 122 has a function of compressing transmitted and received data by applying a predetermined compression method, compression ratio, and frame rate, and a function of decompressing compressed data. It also has a function of determining the routing destination of received data and of data processed by the image processing unit 123, and a function of transmitting data to the determined destination. It further has a function of creating messages for transferring image data to another camera adapter 120 or to the server 270. A message contains meta information of the image data, including the time code or sequence number at the time the image was sampled, the data type, and the identifier of the camera 112. The transmitted image data may be compressed. The transmission unit 122 also receives messages from other camera adapters 120 and, according to the data type included in each message, restores image data from data fragmented into packets of the size specified by the transmission protocol.
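The message handling above, with meta information attached to each message and fragmentation to a protocol packet size followed by reassembly, can be sketched as follows. The `Fragment` layout, field names, and the `"foreground"` label are hypothetical; the source does not specify a wire format. The sketch assumes fragments have already been grouped per message (for example by camera identifier, time code, and data type), and a real receiver would additionally dispatch on `data_type`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """One packet-sized piece of a message; every piece repeats the meta information."""
    time_code: int   # time code or sequence number at capture
    data_type: str   # hypothetical label, e.g. "foreground"
    camera_id: str   # identifier of camera 112
    index: int       # position of this piece within the message
    total: int       # number of pieces in the message
    chunk: bytes


def fragment(time_code: int, data_type: str, camera_id: str,
             payload: bytes, packet_size: int) -> List[Fragment]:
    """Split one image payload into fragments no larger than packet_size."""
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    chunks = [payload[i:i + packet_size]
              for i in range(0, len(payload), packet_size)] or [b""]
    return [Fragment(time_code, data_type, camera_id, i, len(chunks), c)
            for i, c in enumerate(chunks)]


def reassemble(fragments: List[Fragment]) -> bytes:
    """Restore the payload from fragments that may arrive in any order."""
    if not fragments:
        raise ValueError("no fragments to reassemble")
    ordered = sorted(fragments, key=lambda f: f.index)
    if [f.index for f in ordered] != list(range(ordered[0].total)):
        raise ValueError("missing or duplicate fragments")
    return b"".join(f.chunk for f in ordered)


pieces = fragment(42, "foreground", "112a", b"abcdefghij", packet_size=4)
print(len(pieces), reassemble(list(reversed(pieces))))  # 3 b'abcdefghij'
```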

The image processing unit 123 generates a foreground image showing the shape of the object based on image data captured by the camera 112 under the control of the camera control unit 124. It also performs processing such as dynamic calibration. Because each of the plurality of camera adapters 120 generates its own foreground images, the processing load in the image processing system 100 is distributed. Dynamic calibration is calibration performed during shooting, and includes color correction to suppress color variation between cameras and blur correction (electronic image stabilization) to stabilize the image position against shake caused by camera vibration.

The camera control unit 124 is connected to the camera 112 and controls the camera 112, acquires captured images, provides synchronization signals, sets the time, and so on. Control of the camera 112 includes, for example, setting and referring to shooting parameters (number of pixels, color depth, frame rate, white balance, and so on), acquiring state information of the camera 112 (shooting, stopped, synchronizing, error, and so on), starting and stopping shooting, and adjusting focus.

(Details of foreground image generation processing)
Next, the foreground image generation process performed in the camera adapter 120 according to this embodiment is described in detail with reference to the functional block diagram of FIG. 4 and the flowchart of FIG. 5. As shown in FIG. 4, the image processing unit 123 in the camera adapter 120 has five functional units involved in foreground image generation: an image acquisition unit 401, a foreground region extraction unit 402, a contour region detection unit 403, a similarity determination unit 404, and a pixel value correction unit 405. The series of processes shown in the flowchart of FIG. 5 is realized by a CPU (not shown) in the camera adapter 120 loading a predetermined program into a work memory (RAM, not shown) and executing it. However, not all of the processes below need to be executed by a single CPU; some or all of them may be performed by one or more processing circuits other than the CPU. The captured image input to the camera adapter 120 is a moving image consisting of multiple frames. For example, when 10 seconds of moving image data captured at 60 fps is input, a stream of 600 sequentially numbered frames is input, and the processing below is performed frame by frame in order. The input captured image is not limited to a moving image, however, and may be a still image. In the following description, the symbol "S" denotes a step.

In S501, the image acquisition unit 401 acquires the data of the frame from which the foreground region is to be extracted (hereinafter, "target image") and of the background image. If the captured scene is, for example, a soccer match, the background image data can be captured under the same conditions as the target image in the stadium with no objects such as players or the ball present, for example just before kickoff, stored in an HDD or the like (not shown), and then read in. Here, "the same conditions" include not only physical conditions of the camera 112 such as position, orientation, focal length, and optical center, but also environmental conditions such as weather and time of day. Instead of reading background image data captured in advance from an HDD or the like, the background image may be created by applying a median filter or a mean filter across the multiple frames constituting the input moving image. Alternatively, it may be created by applying clustering to multiple frames. The target image and background image data acquired in this step are output to the foreground region extraction unit 402 and the similarity determination unit 404.
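One of the alternatives mentioned above, building the background image by filtering across frames, can be sketched as a per-pixel temporal median. This is a minimal illustration assuming grayscale frames stored as equally sized 2-D lists; for color images the same operation would be applied per channel. The function name is hypothetical.

```python
from statistics import median


def background_from_frames(frames):
    """Estimate a background image as the per-pixel temporal median.

    frames -- list of equally sized 2-D lists of grayscale values.
    A pixel that is covered by a moving object in only a minority of frames
    keeps its background value, because the median ignores those outliers.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(frame[y][x] for frame in frames) for x in range(width)]
            for y in range(height)]


# A player passes over the left pixel in one of three frames.
frames = [[[10, 10]], [[200, 10]], [[10, 12]]]
print(background_from_frames(frames))  # [[10, 10]]
```

A mean filter would give about 73.3 for the left pixel here, pulling the passing player into the background estimate; the median is more robust as long as each pixel shows the bare background in most frames.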

In S502, the foreground region extraction unit 402 extracts the foreground region from the target image using the target image and background image acquired in S501. Specifically, it compares the pixel values of corresponding pixels in the target image and the background image and identifies the pixel positions whose values differ. Setting the pixel value to "0" at coordinates where the values are the same and to "1" at coordinates where they differ yields a binary image showing the foreground region in the target image. The pixel values need not be exactly identical; values within a predetermined threshold (for example, about 5% of the maximum pixel value) may be regarded as the same. The binary image obtained in this way is output as a provisional foreground image to the contour region detection unit 403, the similarity determination unit 404, and the pixel value correction unit 405. Although this embodiment assumes that the foreground region is extracted by background subtraction, an inter-frame difference method, for example, may be used instead.
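Step S502 can be sketched as a thresholded per-pixel difference. This sketch assumes grayscale images as 2-D lists with an 8-bit range and uses the example threshold of 5% of the maximum pixel value; how color pixels would be compared (for instance, per channel) is not specified in the source. The function name is hypothetical.

```python
def provisional_foreground(target, background, max_value=255, ratio=0.05):
    """Background subtraction producing a binary provisional foreground image.

    target, background -- equally sized 2-D lists of grayscale values.
    Pixels whose absolute difference is within ratio * max_value are treated
    as identical (0 = background); all others are marked 1 = foreground.
    """
    threshold = ratio * max_value
    return [[1 if abs(t - b) > threshold else 0 for t, b in zip(t_row, b_row)]
            for t_row, b_row in zip(target, background)]


target = [[100, 110, 200]]
background = [[100, 100, 100]]
print(provisional_foreground(target, background))  # [[0, 0, 1]]
```

With an 8-bit range the threshold is 12.75, so the middle pixel (difference 10) is still treated as background. Near a blurred object boundary, mixed pixels fall on either side of this threshold somewhat arbitrarily, which is the error the later contour correction targets.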

In S503, the contour region detection unit 403 initializes a map (hereinafter, "contour map") for detecting the contour region of the object in the provisional foreground image obtained in S502. Specifically, the pixel values of all pixels of the contour map are set to "0". The contour map is a binary image with the same number of pixels and image size as the provisional foreground image. Pixels corresponding to pixels that constitute (or are likely to constitute) the contour region, that is, the vicinity of the boundary between the object and the background in the provisional foreground image, are given the value "1", and all other pixels are given "0". The initialization in this step sets every pixel of the contour map to the initial value "0", indicating that it is not part of the contour region. After initialization, the process proceeds to S504.

In S504, the contour region detection unit 403 determines, for a pixel of interest among the pixels of the provisional foreground image obtained in S502, whether it belongs to the contour region indicating the vicinity of the boundary between the object and the background. The pixel of interest is selected in order, for example starting from the upper-left pixel of the foreground image. The details of this determination are described with reference to FIG. 6.

First, a block 600 of a predetermined size (here, 5 × 5) centered on the coordinates (u₀, v₀) of the pixel of interest is set. The predetermined size may be determined in advance according to the size of the target image (4K, full HD, and so on) and the size of the object (the proportion of the target image it occupies). Next, whether the pixel of interest 601 belongs to the contour region is determined based on the pixel value of the pixel of interest 601 in the provisional foreground image and the pixel values of the pixels in the block 600. Specifically, if the pixel value of the pixel of interest 601 in the provisional foreground image is "1" (object) and the block 600 contains at least one pixel with the value "0" (background), the pixel of interest 601 is determined to be a contour-region pixel. Suppose the block 600 is in one of the states 602 to 604. Each cell in blocks 602 to 604 represents one pixel; pixels with the value "0" are shown in black and pixels with the value "1" in white. In the example of FIG. 6, the pixel of interest 601 is determined to be a contour-region pixel in the case of block 604. When the above condition is not satisfied, as in block 602, where the pixel of interest 601 is background, or block 603, where the block contains no background pixel, the pixel of interest 601 is determined not to be a contour-region pixel.
The method of determining whether a pixel belongs to the contour region is not limited to the above. For example, an eroded image may be generated by applying morphological erosion to the provisional foreground image, and the pixel of interest may be determined to be a contour-region pixel when its value in the provisional foreground image differs from the value of the corresponding pixel in the eroded image.
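Steps S503 to S506 and the erosion-based alternative can be sketched side by side. For a binary mask and a square block, the two tests mark exactly the same pixels: a foreground pixel with any background pixel in its block erodes to 0, while a background pixel always erodes to 0 and therefore stays unchanged. The sketch assumes masks are 2-D lists indexed as `mask[v][u]` and that blocks are clipped at the image border; the source does not say how border pixels are handled. All function names are hypothetical.

```python
def _window(h, w, v, u, r):
    """Coordinates of the (2r+1) x (2r+1) block around (u, v), clipped to the image."""
    return [(y, x) for y in range(max(0, v - r), min(h, v + r + 1))
                   for x in range(max(0, u - r), min(w, u + r + 1))]


def contour_map_by_block(mask, block=5):
    """S503-S506: mark foreground pixels that have a background pixel in their block."""
    h, w, r = len(mask), len(mask[0]), block // 2
    contour = [[0] * w for _ in range(h)]         # S503: every pixel starts at 0
    for v in range(h):                            # S504/S506: raster scan from top-left
        for u in range(w):
            if mask[v][u] == 1 and any(mask[y][x] == 0 for y, x in _window(h, w, v, u, r)):
                contour[v][u] = 1                 # S505: E(u0, v0) = 1
    return contour


def contour_map_by_erosion(mask, block=5):
    """Alternative: a pixel is contour if its value differs from the eroded image."""
    h, w, r = len(mask), len(mask[0]), block // 2
    eroded = [[min(mask[y][x] for y, x in _window(h, w, v, u, r)) for u in range(w)]
              for v in range(h)]
    return [[1 if m != e else 0 for m, e in zip(m_row, e_row)]
            for m_row, e_row in zip(mask, eroded)]


# 7 x 7 image containing a 5 x 5 foreground square.
mask = [[1 if 1 <= y <= 5 and 1 <= x <= 5 else 0 for x in range(7)] for y in range(7)]
by_block = contour_map_by_block(mask, block=3)
print(sum(map(sum, by_block)))                            # 16
print(by_block == contour_map_by_erosion(mask, block=3))  # True
```

With a 3 × 3 block only the outer ring of the square (25 − 9 = 16 pixels) is marked; a larger block widens the band of pixels treated as possibly erroneous, which is why the block size is tied to image resolution and object size.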

In S505, the contour region detection unit 403 updates the contour map value for the pixel of interest according to the determination result of S504. That is, if the pixel of interest is determined to be a contour-region pixel, the value of the contour map pixel E(u₀, v₀) at the same coordinates as the pixel of interest (u₀, v₀) is changed to "1". If the pixel of interest is determined not to be a contour-region pixel, the value of E(u₀, v₀) is kept at "0".

In S506, it is determined whether all pixels in the provisional foreground image have been processed as the pixel of interest. If an unprocessed pixel remains, the process returns to S504, the next pixel of interest is selected, and processing continues. When all pixels have been processed, the contour area detection unit 403 outputs the updated contour map to the pixel value correction unit 405, and the process proceeds to S507.

In S507, the similarity determination unit 404 initializes a map (hereinafter, the "similarity map") indicating whether the color of each pixel of the target image is similar to the background color. Specifically, all pixel values of the similarity map are set to "0". The similarity map is a binary image with the same number of pixels (image size) as the target image. A pixel corresponding to a target-image pixel determined to have a color similar to the background color is given the value "1", and all other pixels are given "0". Through this initialization, every pixel of the similarity map starts with the value "0", which indicates that its color is not similar to the background color. After initialization, the process proceeds to S508.

In S508, the similarity determination unit 404 determines the degree of similarity between the color of the pixel of interest in the target image and the background color indicated by the background image, using a predetermined evaluation value relating to color. "Color" here includes information on the three attributes of color (hue, lightness, and saturation). The meaning and selection method of the pixel of interest are the same as in S504, so their description is omitted. The similarity determination in this step is described in detail below.

First, a block centered on the coordinates (u₀, v₀) of the pixel of interest is set. The block size may be the same as or different from that used in S504. Next, two evaluation values C_F and C_B are obtained for the pixel of interest; they serve as indices for determining whether the pixel has a color similar to the background color. The mean squared error (MSE), for example, is used for C_F and C_B, which are expressed by the following equations (1) and (2), respectively.
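Equations (1) and (2) were rendered as images in the source and did not survive extraction. The following is a reconstruction consistent with the symbol definitions in the next paragraph, offered as a plausible reading rather than the published formulas; in particular, normalizing only by n_F and n_B (and not also by the three channels) is an assumption. Here I_k(x, y) denotes channel k of I(x, y).

```latex
\begin{align}
C_F &= \frac{1}{n_F} \sum_{\substack{(x,y) \in B \\ I_F(x,y) = 1}} \sum_{k=1}^{3} \bigl( I_k(u_0, v_0) - I_k(x, y) \bigr)^2 \tag{1} \\
C_B &= \frac{1}{n_B} \sum_{\substack{(x,y) \in B \\ I_F(x,y) = 0}} \sum_{k=1}^{3} \bigl( I_k(u_0, v_0) - I_k(x, y) \bigr)^2 \tag{2}
\end{align}
```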

In equations (1) and (2), I(x, y) denotes the pixel value at coordinates (x, y) in the target image, and I_F(x, y) denotes the pixel value at coordinates (x, y) in the provisional foreground image. The subscript k identifies one of the three RGB channels, and B denotes the block. n_F is the number of pixels in the block that belong to the foreground region corresponding to the object in the provisional foreground image (pixel value = 1), and n_B is the number of pixels in the block that belong to the background region not corresponding to the object (pixel value = 0). The evaluation value C_F represents the difference between the pixel value of the pixel of interest and the pixel values of the foreground region in the block; the more similar they are, the smaller the value. Likewise, C_B represents the difference between the pixel value of the pixel of interest and the pixel values of the background region in the block, and becomes smaller as they become more similar. Although this embodiment evaluates color differences by MSE (Mean Squared Error), the evaluation index is not limited to this; MAE (Mean Absolute Error), RMSE (Root Mean Square Error), or the like may be used instead. Next, based on the two evaluation values C_F and C_B, it is determined whether the color of the pixel of interest is similar to the background color. Specifically, if C_F is greater than or equal to C_B, the color is determined to be similar to the background color; conversely, if C_F is less than C_B, the color is determined not to be similar to the background color. In other words, in this embodiment the color of the pixel of interest is determined to be similar to the background color when its pixel value is closer to the pixel values of the background region in the block centered on it than to those of the foreground region. However, the similarity determination method is not limited to this. For example, the color of the pixel of interest may be determined to be similar to the background color when the difference between its pixel value in the captured image and the pixel value at the same coordinates in the background image is small (equal to or less than a predetermined threshold). In this case, a relatively large threshold is set to account for the mixing of colors in contour-region pixels described above.
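S507 to S510 can be sketched as below, computing C_F and C_B as mean squared errors over the foreground and background pixels of the block. This is an illustration under stated assumptions, not the patent's implementation: images are lists of rows of (R, G, B) tuples, the block is clipped at the border, the squared difference is summed over channels and averaged over pixels, a block with no foreground (or no background) pixels is treated as infinitely far from that region (the text does not cover this case), and all names are hypothetical. The threshold-based alternative is included as a separate function.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]
Image = List[List[RGB]]
Mask = List[List[int]]


def _mse(p: Sequence[float], samples: List[Sequence[float]]) -> float:
    """Squared colour difference summed over channels k, averaged over the
    samples; +inf when there are no samples (assumed handling)."""
    if not samples:
        return math.inf
    return sum(
        sum((pk - qk) ** 2 for pk, qk in zip(p, q)) for q in samples
    ) / len(samples)


def evaluation_values(img: Image, fg: Mask, u: int, v: int,
                      half: int = 2) -> Tuple[float, float]:
    """Return (C_F, C_B) for the pixel of interest at column u, row v."""
    h, w = len(img), len(img[0])
    fg_pixels, bg_pixels = [], []
    for y in range(max(0, v - half), min(h, v + half + 1)):
        for x in range(max(0, u - half), min(w, u + half + 1)):
            # split the block by the provisional foreground value I_F(x, y)
            (fg_pixels if fg[y][x] == 1 else bg_pixels).append(img[y][x])
    p = img[v][u]
    return _mse(p, fg_pixels), _mse(p, bg_pixels)


def similarity_map(img: Image, fg: Mask, half: int = 2) -> Mask:
    """M[v][u] = 1 when C_F >= C_B, i.e. the pixel is at least as close to
    the local background colours as to the local foreground colours."""
    M = [[0] * len(img[0]) for _ in img]  # S507: initialise to 0
    for v in range(len(img)):
        for u in range(len(img[0])):
            c_f, c_b = evaluation_values(img, fg, u, v, half)
            if c_f >= c_b:
                M[v][u] = 1  # S509: similar to the background colour
    return M


def similar_by_threshold(pixel: RGB, background_pixel: RGB,
                         threshold: float) -> bool:
    """Alternative: compare with the same-coordinate background pixel.
    Using the summed squared difference as the 'difference' is an
    assumption; the text only asks for a relatively large threshold."""
    return sum((a - b) ** 2 for a, b in zip(pixel, background_pixel)) <= threshold
```

For a one-row image `[(0, 200, 0), (30, 190, 10), (200, 50, 50)]` with provisional mask `[0, 1, 1]` and `half=1`, the middle pixel gets C_F = 25050 and C_B = 1100, so it is marked as similar to the background even though the provisional mask calls it foreground.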

In S509, the similarity determination unit 404 updates the value of the similarity map for the pixel of interest according to the determination result of S508. That is, when the color of the pixel of interest is determined to be similar to the background color, the value of the similarity-map pixel M(x, y), located at the same coordinates (x, y) as the pixel of interest, is changed to "1". When the color of the pixel of interest is determined not to be similar to the background color, the value of M(x, y) is kept at "0".

In S510, it is determined whether all pixels in the target image have been processed as the pixel of interest. If an unprocessed pixel remains, the process returns to S508, the next pixel of interest is selected, and processing continues. When all pixels have been processed, the similarity determination unit 404 outputs the updated similarity map to the pixel value correction unit 405, and the process proceeds to S511.

In S511, the pixel value correction unit 405 corrects the pixel value of the pixel of interest (u₁, v₁) in the provisional foreground image based on the contour map and the similarity map. The meaning and selection method of the pixel of interest are the same as in S504 and S508, so their description is omitted. Specifically, when all three of the following conditions are satisfied, the pixel of interest is determined not to be a pixel of the foreground region (that is, to be a pixel of the background region), and its pixel value in the provisional foreground image is changed to "0". When any of the three conditions is not satisfied, the pixel of interest is determined to be a pixel of the foreground region, and its pixel value is not changed.
First condition: I_F(u₁, v₁) = 1 (the pixel value in the provisional foreground image is "1").
Second condition: E(u₁, v₁) = 1 (the pixel value in the contour map is "1").
Third condition: M(u₁, v₁) = 1 (the pixel value in the similarity map is "1").

As described above, each pixel in the contour region of the provisional foreground image has a pixel value in which the background color and the object color are mixed, and is therefore likely to contain an error. The first and second conditions are used to determine whether the pixel of interest belongs to both the foreground region and the contour region. If it belongs to both and also satisfies the third condition, the pixel of interest is determined to belong to the background, and its pixel value is changed to "0". This correction process fixes the erroneous portions of the provisional foreground image.
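Because each decision in S511 reads only the pixel's own provisional value and the two maps at the same coordinates, the correction is a pure per-pixel rule. A minimal sketch, assuming the three images are equally sized lists of 0/1 rows; the function name is hypothetical.

```python
from typing import List

Mask = List[List[int]]


def correct_foreground(fg: Mask, E: Mask, M: Mask) -> Mask:
    """Return a corrected copy of the provisional foreground image fg.

    A pixel is reset to 0 (background) only when all three conditions hold:
    fg == 1 (first), E == 1 (contour, second) and M == 1 (similar to the
    background colour, third). Every other pixel keeps its original value."""
    return [
        [0 if (f == 1 and e == 1 and m == 1) else f
         for f, e, m in zip(fg_row, e_row, m_row)]
        for fg_row, e_row, m_row in zip(fg, E, M)
    ]
```

For example, with `fg = [[1, 1, 1, 0]]`, `E = [[1, 1, 0, 0]]` and `M = [[1, 0, 1, 1]]`, only the first pixel meets all three conditions, giving `[[0, 1, 1, 0]]`.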

In S512, it is determined whether all pixels in the provisional foreground image have been processed as the pixel of interest. If an unprocessed pixel remains, the process returns to S511, the next pixel of interest is selected, and processing continues. When all pixels have been processed, the pixel value correction unit 405 outputs the corrected provisional foreground image as the final foreground image.

This concludes the description of the foreground image generation process according to this embodiment. The effect of this embodiment is now described with reference to FIG. 7. As in FIG. 1 described above, the target image 10 captured by the camera 11 shows a person object 12 as the foreground and a field 13 as the background. A foreground image representing the silhouette of the person object 12 is generated using a background image in which only the field 13 is captured from the same viewpoint as the target image 10. FIG. 7 shows three foreground images 701 to 703. In foreground image 701, pixels near the boundary between the person object 12 and the field 13 that should belong to the background region have been erroneously judged to be foreground pixels, so the silhouette of the person object 12 appears expanded beyond its actual contour. In foreground image 702, the part corresponding to the silhouette of the person object 12 is extracted accurately, but a noise portion in the target image 10 has been erroneously extracted as foreground. In contrast to these foreground images obtained by conventional methods, foreground image 703, obtained by the method of this embodiment, accurately extracts only the silhouette of the person object without erroneously extracting the noise portion as foreground.

<Modification>
In S508 described above, RGB values are used to determine the similarity between the color of the pixel of interest in the target image and the background color. Instead of RGB values, the RGB color space may, for example, be converted into a different color space such as HSV or Lab, and the HSV or Lab values in the converted color space may be used.
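As a minimal sketch of the HSV route, the standard-library `colorsys` module can convert 8-bit RGB pixels before the similarity test; the resulting triples could then be fed to the same evaluation values. The function name is hypothetical, and Lab is not covered because `colorsys` does not provide it (an image library would be needed).

```python
import colorsys
from typing import List, Tuple

Triple = Tuple[float, float, float]
Image = List[List[Triple]]


def rgb8_image_to_hsv(img: Image) -> Image:
    """Convert 8-bit RGB pixels to (h, s, v) triples, each in [0, 1],
    using colorsys.rgb_to_hsv, which expects inputs scaled to [0, 1]."""
    return [
        [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0) for (r, g, b) in row]
        for row in img
    ]
```

One design caveat: hue is circular (0 and 1 denote the same hue), so a plain squared difference on the h channel overstates the distance between reddish hues on either side of 0; how the variant handles this is not specified in the text.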

As described above, according to this embodiment, even when the boundary between the foreground object and the background is ambiguous, the foreground region can be extracted with high accuracy along the contour of the foreground object.

[Embodiment 2]
Next, a mode in which similarity to the background color is determined based on the frequency of appearance of color components is described as Embodiment 2. Description of content common to Embodiment 1, such as the configuration of the image processing system, is omitted; the following focuses on the differences, namely the configuration and processing of the image processing unit 123.

(Outline of this embodiment)
In Embodiment 1, a pixel in the contour region of the object whose color is close to the background color is regarded as a background pixel; however, a pixel whose color is about equally close to the color of the object may instead be regarded as a foreground pixel. That is, when the determination relies on the overall similarity of RGB values measured by a predetermined evaluation value, some erroneous pixels may go undetected, and the accuracy of the final foreground image drops accordingly. Therefore, in this embodiment, the color component that appears most frequently among the pixels of the background image is obtained, and whether the color of the pixel of interest in the target image is close to the background color is determined by whether that color component has the largest value in the pixel of interest. Here, a color component means the component corresponding to each channel, R (red), G (green), or B (blue), in the case of the RGB color space. In this embodiment, the colors of the target image are assumed to be expressed in the RGB color space. For example, when a scene in which a person stands on a lawn is captured, pixels near the boundary between the person and the lawn in the captured image have pixel values whose color is a mixture of the lawn color and the person's color. Moreover, pixel values representing the lawn color generally take their maximum in the G component. Therefore, among the pixels in the contour region, those whose G component is the largest can be regarded as background pixels. Concretely, color analysis of each pixel in the background image is first performed together with the foreground region extraction. This color analysis identifies which of R, G, and B appears most frequently. Then, for each pixel of the target image that lies in both the foreground region and the contour region, if the color component with the largest value is the same as the most frequent color component in the background image, the pixel is determined not to be a foreground pixel and its color is changed to the background color. This makes it possible to obtain a highly accurate foreground image.

(Details of the foreground image generation process)
The foreground image generation process in the camera adapter 120 according to this embodiment is described in detail with reference to the functional block diagram of FIG. 8 and the flowchart of FIG. 9. As shown in FIG. 8, the image processing unit 123' according to this embodiment has six functional units involved in generating the foreground image: an image acquisition unit 401, a foreground area extraction unit 402, a contour area detection unit 403, a similarity determination unit 404', a pixel value correction unit 405, and a background color analysis unit 801.

S901 to S906 correspond to S501 to S506, respectively, in the flowchart of FIG. 5 of Embodiment 1 and are identical, so their description is omitted.

In S907, the background color analysis unit 801 analyzes the background image acquired in S901 and derives the color component that appears most frequently among the pixels constituting the background image. More specifically, in the background image it counts the number of pixels P_R whose R component is the largest of the three RGB values, the number P_G whose G component is the largest, and the number P_B whose B component is the largest. The color component with the largest of P_R, P_G, and P_B is taken as the most frequent color component. Although all pixels of the background image are used here, the most frequent color component may instead be determined within a partial region of the background image, such as the region surrounding the foreground object. Information on the derived most frequent color component is output to the similarity determination unit 404'.
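The counting in S907 can be sketched as follows. This is a minimal illustration, assuming 8-bit RGB tuples and that a pixel whose largest value is shared by two channels counts toward the earlier channel in R, G, B order (the text does not specify tie handling); the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]
CHANNELS = ("R", "G", "B")


def dominant_channel(pixel: RGB) -> int:
    """Index (0 = R, 1 = G, 2 = B) of the largest component. max() returns
    the first maximal item, so ties go to the earlier channel (assumed)."""
    return max(range(3), key=lambda k: pixel[k])


def most_frequent_component(pixels: Iterable[RGB]) -> Tuple[str, List[int]]:
    """Count P_R, P_G, P_B over the given background pixels and return the
    name of the most frequent component together with the counts."""
    counts = [0, 0, 0]  # P_R, P_G, P_B
    for p in pixels:
        counts[dominant_channel(p)] += 1
    winner = max(range(3), key=lambda k: counts[k])
    return CHANNELS[winner], counts
```

For a whole image stored as rows, the pixels can be passed as `(p for row in background for p in row)`; passing only a sub-region implements the partial-region option mentioned above.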

In S908, the similarity determination unit 404' initializes the similarity map, which is as described in Embodiment 1. After initialization, the process proceeds to S909.

In S909, the similarity determination unit 404' checks the RGB values of the pixel of interest in the target image and identifies the color component with the largest value. In the next step, S910, the similarity determination unit 404' determines the similarity between the color of the pixel of interest in the target image and the background color based on the color component ratio. Specifically, when the "most frequent color component in the background image" identified in S907 and the "color component with the largest value in the pixel of interest" identified in S909 are the same, the color of the pixel of interest is determined to be similar to the background color; when they differ, it is determined not to be similar. The meaning and selection method of the pixel of interest are the same as in S504 and S508 of the flowchart of FIG. 5 of Embodiment 1.
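S909 to S912 reduce to a per-pixel comparison of channel indices. A minimal sketch, assuming the most frequent background component from S907 is passed in as a channel index (0 = R, 1 = G, 2 = B) and that ties in a pixel's largest value resolve to the earlier channel (an assumption); the function name is hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Mask = List[List[int]]


def similarity_map_by_component(img: List[List[RGB]], bg_channel: int) -> Mask:
    """M[v][u] = 1 when the largest RGB component of the target pixel is the
    background's most frequent component bg_channel, else 0."""
    return [
        [int(max(range(3), key=lambda k: p[k]) == bg_channel) for p in row]
        for row in img
    ]
```

In the lawn example, a mixed boundary pixel such as (90, 140, 60) still has G as its largest component, so with a green background (`bg_channel=1`) it is marked as similar to the background even if its overall RGB distance to the person's color is comparable.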

In S911, the similarity determination unit 404' updates the value of the similarity map for the pixel of interest according to the determination result of S910. That is, when the color of the pixel of interest is determined to be similar to the background color, the value of the similarity-map pixel M(x, y), located at the same coordinates (x, y) as the pixel of interest, is changed to "1". When the color of the pixel of interest is determined not to be similar to the background color, the value of M(x, y) is kept at "0".

In S912, it is determined whether all pixels in the target image have been processed as the pixel of interest. If an unprocessed pixel remains, the process returns to S909, the next pixel of interest is selected, and processing continues. When all pixels have been processed, the similarity determination unit 404' outputs the updated similarity map to the pixel value correction unit 405, and the process proceeds to S913.

S913 and S914 correspond to S511 and S512, respectively, in the flowchart of FIG. 5 of Embodiment 1 and are identical, so their description is omitted.

This concludes the description of the foreground image generation process according to this embodiment. The effect of this embodiment is now described with reference to FIG. 10. As in FIG. 7 of Embodiment 1, the target image 10 captured by the camera 11 shows the person object 12 as the foreground and the field 13 as the background. Foreground images 1001 and 1002 representing the silhouette of the person object 12 are generated using a background image in which only the field 13 is captured from the same viewpoint as the target image 10. In FIG. 10, the 4 × 4 block 14 is an enlarged part of the target image 10, and each cell of block 14 represents one pixel. Within block 14, pixel group 18 consists of pixels more similar to the background color than to the color of the person object 12, and pixel group 19 consists of pixels about equally similar to the color of the person object 12 and to the background color. Foreground image 1001 is obtained by applying the method of Embodiment 1, and foreground image 1002 by applying the method of this embodiment. In foreground image 1001, as shown by pixel group 1011, pixel group 18, which is more similar to the background color than to the color of the person object 12, is correctly recognized as background. On the other hand, pixel group 19, which is about equally similar to the object color and the background color, is treated as part of the foreground region and is therefore in error. By contrast, in foreground image 1002, obtained by the method of this embodiment, the foreground region is extracted accurately along the contour of the object.

<Modification>
In S902 described above, as in S502 of Embodiment 1, the provisional foreground image is obtained by comparing the target image with a background image captured in advance. Alternatively, the corrected output foreground image obtained by applying Embodiment 1 may be treated as the provisional foreground image of this embodiment and processed in the same way. In this case, a more accurate foreground image can be obtained than when Embodiment 1 or Embodiment 2 is executed alone.

The method for determining the most frequent color component in the background image in S907 described above is not limited to the above example. For example, the RGB color space may be converted into a different color space such as HSV or Lab, the hues may be divided into a plurality of groups using the pixel values in the converted color space, and the most frequent hue group may be taken as the most frequently appearing color.
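A minimal sketch of the hue-group variant, using the standard-library `colorsys` module for the HSV conversion. The number of groups (six equal hue bins here) is an assumption, the function name is hypothetical, and the input must contain at least one pixel. Achromatic pixels (zero saturation) receive hue 0 and therefore fall into the first bin; a practical version might exclude them.

```python
import colorsys
from collections import Counter
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def most_frequent_hue_group(pixels: Iterable[RGB], n_groups: int = 6) -> int:
    """Split the hue circle [0, 1) into n_groups equal bins and return the
    index of the bin that occurs most often among the 8-bit RGB pixels."""
    counts: Counter = Counter()
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        counts[min(int(h * n_groups), n_groups - 1)] += 1  # guard h close to 1
    group, _ = counts.most_common(1)[0]
    return group
```

Two grass-like greens and one reddish pixel, for instance, yield bin 1 (hues between 1/6 and 2/6), which would then stand in for "green" as the background's most frequent color.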

As described above, according to this embodiment, even when pixels on the contour of the object are about equally close in color to the background and to the foreground, the foreground object can be extracted with high accuracy along its contour.

<Other Embodiments>
The present invention can also be realized by supplying a program that implements one or more functions of the above embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Claims

An image processing device that generates a foreground image showing the shape of an object from a captured image, the device comprising:
an acquisition means for acquiring the captured image;
an extraction means for generating a foreground image by extracting a foreground region corresponding to the shape of the object from the captured image;
a detection means for detecting a contour region of the object from the foreground image; and
a correction means for correcting the pixels constituting the contour region in the foreground image based on whether the colors of the pixels of the captured image corresponding to those pixels are similar to a background color.

The acquisition means further acquires a background image in which the object does not exist,
the device further comprises a determination means for determining whether the color of each pixel constituting the captured image is similar to the background color indicated by the background image, and
the correction means performs the correction based on the determination result of the determination means.
The image processing apparatus according to claim 1.

The determination means determines that the color of a pixel of interest in the captured image is similar to the background color when that color is closer to the background color indicated by the background image than to the color of the object, and
when the pixel in the foreground image corresponding to a pixel of interest that the determination means has determined to be close to the background color is a pixel constituting the contour region, the correction means corrects the pixel value of that pixel to a pixel value indicating the background.
The image processing apparatus according to claim 2.

The image processing apparatus according to claim 3, wherein the determination means determines, using a predetermined evaluation value relating to color, whether the color of the pixel of interest is closer to the background color than to the color of the object.

The image processing apparatus according to claim 4, wherein the predetermined evaluation value is any one of MSE, MAE, and RMSE.

The image processing apparatus according to claim 3, wherein the determination means compares the pixel values of corresponding pixels in the foreground image and the background image and, when the difference is equal to or less than a predetermined threshold, determines that the color of the pixel of interest is similar to the background color.
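Claim 6 replaces the relative comparison with an absolute threshold. The following is a hedged sketch. The per-pixel difference is assumed to be the sum of absolute channel differences, and `threshold` is a hypothetical tuning parameter. The claim fixes neither. `image` stands for whichever image is compared against the background image.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def similar_to_background(pixel: Pixel, background_pixel: Pixel, threshold: int) -> bool:
    """True when the summed absolute channel difference is <= threshold."""
    diff = sum(abs(p - b) for p, b in zip(pixel, background_pixel))
    return diff <= threshold


def similarity_mask(image: Image, background: Image, threshold: int) -> List[List[int]]:
    """1 where a pixel is judged similar to the corresponding background pixel."""
    return [
        [1 if similar_to_background(p, b, threshold) else 0 for p, b in zip(row, bg_row)]
        for row, bg_row in zip(image, background)
    ]


image = [[(0, 0, 0), (10, 10, 10)]]
background = [[(0, 0, 0), (0, 0, 0)]]
print(similarity_mask(image, background, 15))  # [[1, 0]]
print(similarity_mask(image, background, 30))  # [[1, 1]] (a difference equal to the threshold counts as similar)
```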

The image processing apparatus according to any one of claims 4 to 6, wherein the determination means performs the determination using pixel values in the color space of the captured image or pixel values in a converted color space obtained by converting that color space into a different color space.
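Claim 7 allows the comparison to run in a converted color space. The sketch below uses one illustrative choice that the patent does not state. It converts RGB to HSV with Python's standard `colorsys` module and compares hue only. Hue distance is computed circularly because hue wraps around at 1.0. Comparing hue alone ignores saturation and value, so near-gray pixels, whose hue is undefined and reported as 0.0, would need separate handling.

```python
import colorsys
from typing import Tuple

Pixel = Tuple[int, int, int]


def to_hsv(pixel: Pixel) -> Tuple[float, float, float]:
    """Convert an 8-bit RGB pixel to HSV with all components in [0, 1]."""
    r, g, b = (c / 255.0 for c in pixel)
    return colorsys.rgb_to_hsv(r, g, b)


def hue_distance(h1: float, h2: float) -> float:
    """Circular distance between two hues in [0, 1), since hue wraps at 1.0."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)


def closer_to_background_hsv(pixel: Pixel, background_color: Pixel, object_color: Pixel) -> bool:
    """Compare hue distances in the converted (HSV) space only."""
    hue = to_hsv(pixel)[0]
    return hue_distance(hue, to_hsv(background_color)[0]) < hue_distance(hue, to_hsv(object_color)[0])


# A light green pixel against a green background and a red object.
print(closer_to_background_hsv((40, 200, 40), (0, 128, 0), (200, 30, 30)))  # True
```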

The captured image and the background image share a common color space represented by a plurality of color components;
The determination means determines whether or not the color of the pixel of interest is closer to the background color indicated by the background image than to the color of the object, based on the ratio of the color components of the pixel values of the pixels constituting the captured image.
The image processing apparatus according to claim 3.

The apparatus further comprises a derivation means for deriving the color component with the highest frequency of appearance in the color space of the background image;
When the color component with the highest frequency of appearance in the background image, derived by the derivation means, is not the same as the color component having the maximum value at the pixel of interest in the captured image, the determination means determines that the color of the pixel of interest is close to the background color.
The image processing apparatus according to claim 8.
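Claims 8 and 9 rely on color-component information rather than a distance. The sketch below does three things. It computes the per-pixel component ratio. It derives the most frequent dominant component of the background image, reading "highest frequency of appearance" as: for each background pixel, take the channel with the largest value, then count. Finally, it checks whether a pixel of interest has the same maximum component. That reading is an assumption. The sketch stops at the comparison and leaves the claim's decision rule to the caller. All names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def component_ratios(pixel: Sequence[int]) -> Tuple[float, ...]:
    """Share of each color component in the pixel's total (equal shares for black)."""
    total = sum(pixel)
    if total == 0:
        return tuple(1.0 / len(pixel) for _ in pixel)
    return tuple(c / total for c in pixel)


def max_component(pixel: Sequence[int]) -> int:
    """Index of the largest component (the first one wins on ties)."""
    return max(range(len(pixel)), key=lambda i: pixel[i])


def most_frequent_component(background: List[List[Pixel]]) -> int:
    """Component index that is most often the largest across the background image."""
    counts = Counter(max_component(px) for row in background for px in row)
    return counts.most_common(1)[0][0]


def same_dominant_component(pixel: Pixel, background_component: int) -> bool:
    """Whether the pixel's maximum component matches the background's most frequent one."""
    return max_component(pixel) == background_component


background = [[(0, 128, 0), (10, 200, 5)], [(90, 80, 70), (0, 100, 0)]]
dominant = most_frequent_component(background)  # 1 (green)
print(component_ratios((50, 150, 50)))  # (0.2, 0.6, 0.2)
```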

The image processing apparatus according to claim 8, wherein the determination means performs the determination using pixel values in the common color space or pixel values in a converted color space obtained by converting the common color space into a different color space.

The image processing apparatus according to claim 1, wherein the determination means uses the corrected foreground image obtained by the image processing apparatus according to any one of claims 4 to 10 as the foreground image to be processed by the detection means and the correction means.
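Claim 11 feeds a corrected foreground image back into detection and correction. Below is a minimal sketch of that loop. It assumes the detection and correction steps are passed in as callables. Iteration stops after `max_iterations` passes or once the mask no longer changes. Both stopping choices are hypothetical.

```python
from typing import Callable, List

Mask = List[List[int]]


def refine(foreground: Mask,
           detect: Callable[[Mask], Mask],
           correct: Callable[[Mask, Mask], Mask],
           max_iterations: int = 3) -> Mask:
    """Repeatedly detect the contour of the current mask and correct it.

    Each pass uses the previous corrected mask as its input foreground image.
    Stops early once a pass leaves the mask unchanged (an assumed stopping rule).
    """
    current = foreground
    for _ in range(max_iterations):
        contour = detect(current)
        updated = correct(current, contour)
        if updated == current:
            break  # converged: another pass would change nothing
        current = updated
    return current


# Toy usage: "detect" marks every object pixel; "correct" clears the rightmost one per row.
def clear_rightmost(mask: Mask, contour: Mask) -> Mask:
    out = [row[:] for row in mask]
    for y, row in enumerate(contour):
        ones = [x for x, v in enumerate(row) if v]
        if ones:
            out[y][ones[-1]] = 0
    return out


print(refine([[1, 1, 1, 1]], lambda m: [row[:] for row in m], clear_rightmost, max_iterations=2))  # [[1, 1, 0, 0]]
```

In a real pipeline the `correct` callable would close over the captured and background images, so the loop itself stays independent of the color test being used.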

The image processing apparatus according to any one of claims 1 to 11, wherein the detection means detects, as the contour region, pixels of interest in the foreground image generated by the extraction means that satisfy a predetermined condition under which they can be regarded as pixels constituting the vicinity of the boundary between the object and the background.

The image processing apparatus according to claim 12, wherein the predetermined condition is that a pixel in the foreground image is a pixel constituting the object and one or more pixels constituting the background are present in a predetermined block containing that pixel.
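The block condition of claim 13 can be sketched as follows. The block is assumed to be a square of side 2 * radius + 1 centered on the pixel. Positions outside the image are ignored. The claim leaves both the block shape and the border handling open, and the function name is hypothetical.

```python
from typing import List

Mask = List[List[int]]


def detect_contour_block(mask: Mask, radius: int = 1) -> Mask:
    """Mark object pixels (1) that have at least one background pixel (0)
    inside the (2 * radius + 1)-square block centered on them.

    Block positions that fall outside the image are ignored (assumption).
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    contour = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1:
                continue  # only object pixels can belong to the contour region
            contour[y][x] = int(any(
                mask[ny][nx] == 0
                for ny in range(max(0, y - radius), min(h, y + radius + 1))
                for nx in range(max(0, x - radius), min(w, x + radius + 1))
            ))
    return contour


mask = [[0] * 5] + [[0, 1, 1, 1, 0] for _ in range(3)] + [[0] * 5]
for row in detect_contour_block(mask):
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 1, 1, 0]
# [0, 1, 0, 1, 0]
# [0, 1, 1, 1, 0]
# [0, 0, 0, 0, 0]
```

A larger `radius` widens the band of pixels treated as error-prone around the object.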

The image processing apparatus according to claim 12, wherein the predetermined condition is that the pixel value of a pixel in the foreground image differs from the pixel value of the corresponding pixel in a contracted (eroded) image obtained by applying morphological processing to the foreground image.
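Claim 14 reaches the contour region through morphology. The approach is to erode the mask, then mark the pixels whose values changed. This sketch assumes a square structuring element of side 2 * radius + 1. Positions outside the image count as background, so object pixels touching the image border are always reported as contour. The function names are hypothetical.

```python
from typing import List

Mask = List[List[int]]


def erode(mask: Mask, radius: int = 1) -> Mask:
    """Binary erosion with a (2 * radius + 1)-square structuring element.

    A pixel stays 1 only if every position in its block is 1; positions
    outside the image are treated as background (assumption).
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    return [
        [
            int(all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx] == 1
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def detect_contour_erosion(mask: Mask, radius: int = 1) -> Mask:
    """Mark pixels whose value differs between the mask and its eroded version."""
    eroded = erode(mask, radius)
    return [
        [int(m != e) for m, e in zip(mask_row, eroded_row)]
        for mask_row, eroded_row in zip(mask, eroded)
    ]


print(detect_contour_erosion([[1, 1], [1, 1]]))  # [[1, 1], [1, 1]]
```

Because erosion never turns a 0 into a 1, a differing pixel is always an object pixel that erosion removed. In other words, it is an object pixel lying within `radius` of the background.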

An image processing method that generates a foreground image showing the shape of an object from a captured image.
An acquisition step of acquiring the captured image,
An extraction step of extracting a foreground region corresponding to the shape of the object from the captured image to generate a foreground image,
A detection step of detecting the contour region of the object from the foreground image, and
A correction step of correcting the pixels constituting the contour region in the foreground image based on whether or not the color of the corresponding pixel in the captured image is similar to the background color.
An image processing method comprising the above steps.

A program that causes a computer to function as each means of the image processing apparatus according to any one of claims 1 to 14.