JP2018195190A

JP2018195190A - Image processing apparatus and image processing method

Info

Publication number: JP2018195190A
Application number: JP2017100074A
Authority: JP
Inventors: Akihiro Matsushita (松下 明弘); Kiwamu Kobayashi (小林 究)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-05-19
Filing date: 2017-05-19
Publication date: 2018-12-06
Anticipated expiration: 2037-05-19
Also published as: JP7012457B2

Abstract

To improve the accuracy of specifying a foreground region in an input image. SOLUTION: A first region in an input image is specified using a difference, based on a first feature amount, between the input image and a background image. A second region in the input image is specified using a difference, based on a second feature amount, between the input image and the background image. A foreground region corresponding to a predetermined object in the input image is determined on the basis of the first region and the second region. SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for specifying a foreground region in an input image.

In recent years, attention has been drawn to techniques in which a plurality of cameras are installed at different positions to perform synchronized shooting from multiple viewpoints, and arbitrary virtual viewpoint content is generated using the multi-viewpoint images obtained by that shooting. With such techniques, for example, highlight scenes of soccer or basketball can be viewed from various angles, which can give the user a greater sense of presence than ordinary images.

Generation and viewing of arbitrary virtual viewpoint content based on multi-viewpoint images can be realized by collecting the images captured by the plurality of cameras in an image processing unit such as a server, performing processing such as rendering in that image processing unit, and finally transmitting the result to a user terminal. Various methods have been developed for generating virtual viewpoint video. For example, there is a technique that separates foregrounds such as players from a plurality of camera videos, calculates three-dimensional coordinates, and reprojects them onto a plane to generate a virtual viewpoint video.

As a method for separating the foreground, the so-called background subtraction method is generally known. In this method, background information is first generated based on image information over a certain past period, and the difference from the current image is detected and separated as the foreground. Other known foreground separation methods use feature amounts or machine learning. Patent Document 1 discloses a method for improving foreground separation performance by combining estimation of a point of interest based on spatial frequency with foreground shape estimation based on color information.

Patent Document 1: Japanese Patent Laid-Open No. 2014-232477 (JP 2014-232477 A)

However, foreground shape estimation using a single feature amount is limited in performance. For example, in a method based on color information, performance may deteriorate when the foreground and background colors are similar.

The present invention has been made in view of this problem, and provides a technique for improving the accuracy of specifying a foreground region in an input image.

One aspect of the present invention is characterized by comprising: first specifying means for specifying a first region in an input image using a difference, based on a first feature amount, between the input image and a background image; second specifying means for specifying a second region in the input image using a difference, based on a second feature amount, between the input image and the background image; and determining means for determining a foreground region corresponding to a predetermined subject in the input image based on the first region specified by the first specifying means and the second region specified by the second specifying means.

According to the configuration of the present invention, the accuracy of specifying a foreground region in an input image can be improved.

A diagram showing a configuration example of the system.
A block diagram showing a functional configuration example of the image processing apparatus 103.
A flowchart of processing performed by the image processing apparatus 103.
A flowchart of processing performed by the image processing apparatus 103.
A diagram showing an example of a foreground region.
A block diagram showing a functional configuration example of the image processing apparatus 103.
A flowchart of processing performed by the foreground separation unit 606.
A block diagram showing a hardware configuration example of a computer apparatus.

Embodiments of the present invention will be described below with reference to the accompanying drawings. The embodiments described below show examples of concrete implementations of the present invention, and are specific examples of the configurations described in the claims.

[First Embodiment]
A configuration example of the system according to the present embodiment will be described with reference to FIG. 1. As shown in FIG. 1, the system according to the present embodiment includes a plurality of cameras 102 connected in a ring via a transmission path, and an image processing apparatus 103 that generates a virtual viewpoint image based on images captured by at least some of the plurality of cameras 102.

The plurality of cameras 102 are installed around the stadium 101, and each camera 102 captures a moving image of the stadium 101 from a different direction. One of the plurality of cameras 102 is connected to the image processing apparatus 103, and the images captured by each camera 102 (the images of the frames constituting the moving image) are transferred to the image processing apparatus 103 via the transmission path, other cameras 102, and the like. As a result, images captured by the plurality of cameras 102 are input to the image processing apparatus 103 frame by frame.

Of the images captured by the respective cameras 102, the image processing apparatus 103 uses the group of captured images needed to generate an image of the stadium 101 as seen from a virtual viewpoint designated by the user (a virtual viewpoint image), and generates the virtual viewpoint image based on that group. It is assumed that a competition such as soccer is taking place in the stadium 101 and that a person 104 is present in the stadium 101 as a subject. However, the shooting target is not limited to a competition in the stadium 101 and may be, for example, a performance on a stage. Further, the viewpoint is not limited to one designated by the user and may be determined automatically.

Note that the system configuration is not limited to that shown in FIG. 1, as long as the image processing apparatus 103 can acquire the images captured by the plurality of cameras 102 arranged around the stadium 101. For example, each camera 102 may be directly connected to the image processing apparatus 103. As another example, each camera 102 may be provided with a removable storage device for storing captured images, and the image processing apparatus 103 may acquire the captured images when that storage device is detached from the camera 102 and connected to the image processing apparatus 103.

Next, a functional configuration example of the image processing apparatus 103 will be described with reference to the block diagram of FIG. 2. In the present embodiment, image processing in the image processing apparatus 103 is performed by hardware such as an ASIC (application specific integrated circuit) or an FPGA (field programmable gate array) built into the image processing apparatus 103. Accordingly, in the present embodiment, each functional unit (module) shown in FIG. 2 is implemented as hardware inside the ASIC or FPGA.

The data receiving unit 201 acquires the images captured by the respective cameras 102 and stores them in the recording unit 202. Identification information unique to the camera 102 that captured each image is assumed to be attached to the captured image. The recording unit 202 is configured by, for example, a hard disk, an SSD (solid state drive), or a combination thereof. From the group of captured images stored in the recording unit 202, the data receiving unit 201 reads out the captured images needed to generate the virtual viewpoint image for the virtual viewpoint position input by user designation or the like. The data receiving unit 201 then outputs the captured images read from the recording unit 202 to the background generation unit 203, the foreground separation unit 205, the foreground separation unit 206, and the generation unit 209.

For each camera 102, the background buffer 204 stores identification information unique to the camera 102 in association with a background image generated from images that the camera 102 captured in the past (hereinafter referred to as a registered background image). In the present embodiment, a background image is an image obtained by removing the image of a predetermined subject such as a person (the foreground image) from a captured image. For example, when a soccer game in the stadium 101 is shot by the cameras 102, the images of players, the ball, and the like are foreground images, and the image of the ground of the stadium 101 without players or the ball is the background image.

The background generation unit 203 reads from the background buffer 204 the registered background image stored in association with the same identification information as that attached to the captured image input from the data receiving unit 201. Based on the captured image and the registered background image corresponding to the same identification information, the background generation unit 203 generates a background image representing the background in the captured image, and uses the generated background image to update the registered background image stored in the background buffer 204 in association with that identification information. Various methods can be applied to generate a background image representing the background of a captured image from the captured image and the registered background image. For example, the background generation unit 203 may generate the background image using a difference image between the captured image and the registered background image (an image whose pixel values are the differences between the pixel values at the same pixel positions in the captured image and the registered background image). A background image generation method based on a Gaussian mixture model may also be used. The Gaussian mixture model is a well-known technique, so a detailed description is omitted.
The background generation unit 203 then outputs the generated background image to the foreground separation unit 205 and the foreground separation unit 206.

The foreground separation unit 205 estimates the foreground region in the captured image input from the data receiving unit 201, based on the per-pixel difference in a first feature amount between that captured image and the background image that the background generation unit 203 generated to represent the background in the captured image. In the present embodiment, the foreground region is the region of the captured image in which a predetermined subject such as a person is located. Note that the foreground region is not limited to a region corresponding to a person and may be, for example, a region corresponding to a moving object or a region corresponding to a subject designated in advance. The processing performed by the foreground separation unit 205 will be described with reference to FIG. 3(A), which shows a flowchart of that processing. The foreground separation unit 205 uses color information as the first feature amount. In the present embodiment, the first feature amount for a given pixel is determined by the pixel value of that pixel.

In step S301, the foreground separation unit 205 generates a difference image based on color information between the captured image (input image) and the background image. In the difference image generated in step S301, the pixel value at pixel position (x, y) represents the difference between the color information at pixel position (x, y) in the captured image and the color information at pixel position (x, y) in the background image. For example, let PR, PG, and PB be the R, G, and B pixel values at pixel position (x, y) in the captured image, and let PR', PG', and PB' be the R, G, and B pixel values at pixel position (x, y) in the background image. The pixel value Δ at pixel position (x, y) in the difference image is then determined by calculating Δ = |PR − PR'| + |PG − PG'| + |PB − PB'|. Of course, the way of obtaining the color-information difference image between the captured image and the background image is not limited to this method. For example, the color information may be RGB, or may be the color components of the YUV color space or another color space, and is not limited to specific color components.
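The calculation of Δ in step S301 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment, which is described as ASIC or FPGA hardware. The function name `color_difference_image` and the representation of an image as nested lists of (R, G, B) tuples are assumptions made only for this illustration.

```python
def color_difference_image(captured, background):
    """Per-pixel sum of absolute R, G, B differences (the value Δ of step S301).

    Both images are lists of rows; each row is a list of (R, G, B) tuples.
    This representation is an assumption made for illustration.
    """
    if len(captured) != len(background):
        raise ValueError("images must have the same height")
    diff = []
    for row_c, row_b in zip(captured, background):
        if len(row_c) != len(row_b):
            raise ValueError("images must have the same width")
        diff.append([
            # |PR - PR'| + |PG - PG'| + |PB - PB'| for one pixel position
            sum(abs(c - b) for c, b in zip(pix_c, pix_b))
            for pix_c, pix_b in zip(row_c, row_b)
        ])
    return diff


# Hypothetical 1x2 images: the first pixel is unchanged, the second differs.
captured = [[(10, 20, 30), (200, 100, 50)]]
background = [[(10, 20, 30), (190, 110, 50)]]
delta = color_difference_image(captured, background)
```

In this example the second pixel differs by 10 in R and 10 in G, so its Δ is 20, while the unchanged first pixel yields 0.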

In step S302, the foreground separation unit 205 smooths the pixel values of the difference image. The smoothing removes pixel dropouts and noise. The simplest form of smoothing is simple averaging, but filters such as the Gaussian filter and the bilateral filter are known to give better performance, and any filter may be used.

In step S303, the foreground separation unit 205 determines, for each pixel in the difference image smoothed in step S302, whether the pixel value of that pixel is greater than a threshold. For a pixel whose value in the difference image is greater than the threshold, the foreground separation unit 205 assigns, in step S304, the bit value "1", which denotes "a pixel constituting the foreground region (foreground pixel)". For a pixel whose value in the difference image is equal to or less than the threshold, the foreground separation unit 205 assigns, in step S305, the bit value "0", which denotes "a pixel constituting the background region (background pixel)". This processing produces a binary image whose pixel values are the bit values corresponding to the pixels of the difference image. A pixel value of "1" at pixel position (x, y) in the binary image indicates that the pixel at pixel position (x, y) in the captured image has been estimated to be a foreground pixel. A pixel value of "0" at pixel position (x, y) in the binary image indicates that the pixel at pixel position (x, y) in the captured image has been estimated to be a background pixel.
In this way, the foreground separation unit 205 generates a binary image based on the difference in color information between the captured image and the background image, as information representing the estimated foreground region in the captured image.
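Steps S302 to S305 can be sketched in Python as follows. The sketch uses simple averaging, which the description names as the simplest smoothing option. The function names, the window radius, and the choice to clip the averaging window at the image border are assumptions for illustration, since the description does not specify them.

```python
def box_smooth(image, radius=1):
    """Simple averaging over a (2*radius+1) x (2*radius+1) window.

    The window is clipped at the image border (an assumption; the source
    does not say how borders are handled).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            values = [image[j][i] for j in ys for i in xs]
            out[y][x] = sum(values) / len(values)
    return out


def binarize(image, threshold):
    """Assign 1 (foreground) where value > threshold, else 0 (background)."""
    return [[1 if v > threshold else 0 for v in row] for row in image]


def estimate_foreground(diff_image, threshold, radius=1):
    """Smooth the difference image (S302), then threshold it (S303 to S305)."""
    return binarize(box_smooth(diff_image, radius), threshold)
```

Note that a pixel whose smoothed value equals the threshold is classified as background, matching the "equal to or less than the threshold" branch of step S305.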

Note that in the binary image generated in steps S303 to S305, small pixel dropouts may occur inside a region that actually corresponds to the foreground, and fine noise may occur in a region that actually corresponds to the background. Therefore, in the next step S306, the foreground separation unit 205 applies smoothing of the kind described above to this binary image to remove the pixel dropouts and noise. The smoothing may be applied to the entire binary image, or only to the regions of pixels whose value is "1" and their vicinity.
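One way to picture the effect of step S306 is a majority filter, which is equivalent to averaging a binary image over a window and thresholding at one half. This is a swapped-in illustration, not necessarily the filter used in the embodiment. The function name `majority_filter` and the clipped-border handling are assumptions.

```python
def majority_filter(binary, radius=1):
    """Set each pixel to the majority value of its clipped neighbourhood.

    Isolated 1s (noise in the background) become 0, and isolated 0s
    (dropouts inside the foreground) become 1.
    """
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                binary[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            # Strict majority of foreground pixels in the window
            out[y][x] = 1 if 2 * sum(window) > len(window) else 0
    return out


hole = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]    # dropout inside a foreground blob
noise = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # isolated noise pixel
filled = majority_filter(hole)
cleaned = majority_filter(noise)
```

With these hypothetical inputs the single dropout is filled and the single noise pixel is removed.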

The foreground separation unit 205 then sends the binary image generated by the processing according to the flowchart of FIG. 3(A) (the first binary image) to the subsequent foreground region adjustment unit 207. The foreground separation unit 205 performs the processing according to the flowchart of FIG. 3(A) each time a captured image is input from the data receiving unit 201.

The foreground separation unit 206 estimates the foreground region in the captured image input from the data receiving unit 201, based on the per-pixel difference in a second feature amount between that captured image and the background image that the background generation unit 203 generated to represent the background region in the captured image. The processing performed by the foreground separation unit 206 will be described with reference to FIG. 3(B), which shows a flowchart of that processing. The foreground separation unit 206 uses texture information as the second feature amount. That is, in the present embodiment, the second feature amount for a given pixel is determined by the pixel value of that pixel and the pixel values of the pixels surrounding it.

In step S307, the foreground separation unit 206 generates a difference image based on texture information between the captured image (input image) and the background image. The pixel value at pixel position (x, y) in the difference image generated in step S307 is obtained, for example, as follows. Between an image region centered on pixel position (x, y) in the captured image (for example, a 3 × 3 pixel region) and an image region centered on pixel position (x, y) in the background image (for example, a 3 × 3 pixel region), the absolute difference in luminance value is obtained for each pair of pixels at the same position, and the sum S of these differences is used as the pixel value at pixel position (x, y) in the difference image. The luminance value of a pixel is the L component value in the Lab color space. If the pixel values are in the RGB color space, the L component value can be obtained from the R, G, and B component pixel values by a well-known method. The luminance value of a pixel is not limited to the L component and may be the Y component value of YUV, YCbCr, or the like.
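The sum S of step S307 can be sketched in Python as follows. The sketch assumes that the luminance values have already been extracted into nested lists, and it clips the window at the image border. Both choices, along with the function name `texture_difference_image`, are assumptions made for illustration.

```python
def texture_difference_image(lum_captured, lum_background, radius=1):
    """Sum of absolute luminance differences over a window around each pixel.

    radius=1 corresponds to the 3 x 3 region given as an example in step S307.
    Inputs are lists of rows of luminance values (an assumed representation).
    """
    h, w = len(lum_captured), len(lum_captured[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += abs(lum_captured[j][i] - lum_background[j][i])
            out[y][x] = total  # the value S for pixel position (x, y)
    return out


# Hypothetical 3x3 luminance images that differ only at the center pixel.
lum_in = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
lum_bg = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
s_image = texture_difference_image(lum_in, lum_bg)
```

Because every 3 × 3 window in this small example contains the center pixel, each position receives S = 5. This shows how a local change spreads to neighbouring positions in the texture difference, which is consistent with the coarser region boundaries described for FIG. 5(B).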

Other feature amounts based on texture information may also be used, such as HOG (Histograms of Oriented Gradients) features or LBP (Local Binary Pattern) features.

In step S308, the foreground separation unit 206 smooths the pixel values of the difference image in the same manner as in step S302. In step S309, the foreground separation unit 206 determines, for each pixel in the difference image smoothed in step S308, whether the pixel value of that pixel is greater than a threshold. For a pixel whose value in the difference image is greater than the threshold, the foreground separation unit 206 assigns, in step S310, the bit value "1", which denotes "a pixel constituting the foreground region (foreground pixel)". For a pixel whose value in the difference image is equal to or less than the threshold, the foreground separation unit 206 assigns, in step S311, the bit value "0", which denotes "a pixel constituting the background region (background pixel)". As in steps S304 and S305, this processing produces a binary image whose pixel values are the bit values corresponding to the pixels of the difference image. In this way, the foreground separation unit 206 generates a binary image based on the difference in texture information between the captured image and the background image, as information representing the estimated foreground region in the captured image.

In step S312, the foreground separation unit 206 applies smoothing of the kind described above to this binary image in the same manner as in step S306, thereby removing pixel dropouts and noise.

The foreground separation unit 206 then sends the binary image generated by the processing according to the flowchart of FIG. 3(B) (the second binary image) to the subsequent foreground region adjustment unit 207. The foreground separation unit 206 performs the processing according to the flowchart of FIG. 3(B) each time a captured image is input from the data receiving unit 201.

FIG. 5(A) shows an example of a binary image generated according to the flowchart of FIG. 3(A). As shown in FIG. 5(A), the foreground (person 104) is detected in the shape of a person in the binary image, but in this example a large part of the detected foreground region is assumed to be missing. When the foreground region is estimated from the difference in color information, such dropouts can occur, particularly when the foreground and background colors are similar.

FIG. 5(B) shows an example of a binary image generated according to the flowchart of FIG. 3(B). As shown in FIG. 5(B), the boundary of the detected foreground (person 104) region in the binary image is not smooth, and small dropouts are seen inside the foreground region. When the foreground region is estimated from the difference in texture information, the boundary between foreground and background tends to be coarse in this way.

The foreground region adjustment unit 207 adjusts the foreground region in the second binary image generated by the foreground separation unit 206. An example of the adjustment of the foreground region in the second binary image will be described with reference to the flowchart of FIG. 4(A).

In step S401, the foreground region adjustment unit 207 performs labeling on each of the first binary image and the second binary image, thereby assigning labels to the foreground regions (regions of pixels whose value is "1") in each of the two binary images. In the labeling, each cluster of connected pixels having the same pixel value "1" is detected as one foreground region (a region corresponding to the same subject), and a label (number) is attached to each foreground region.
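The labeling of step S401 corresponds to connected-component labeling. The following Python sketch uses a breadth-first flood fill with 4-connectivity. The connectivity, the scan order that determines label numbers, and the function name `label_regions` are assumptions, since the description only states that connected pixels of value "1" form one region.

```python
from collections import deque


def label_regions(binary):
    """Label connected regions of 1-pixels; returns (label image, label count).

    Label 0 means background; regions are numbered 1, 2, ... in raster order.
    4-connectivity is an assumption made for this illustration.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or labels[y][x] != 0:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


binary = [[1, 1, 0, 0],
          [0, 0, 0, 1],
          [0, 0, 1, 1]]
labels, count = label_regions(binary)
```

Here the two pixels at the top left form region 1 and the three pixels at the bottom right form region 2.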

In the following, to simplify the description, it is assumed that a foreground region of interest in the first binary image and the foreground region closest to it among the foreground regions in the second binary image are given the same label number. However, as long as the correspondence between the foreground region of interest in the first binary image and the closest foreground region in the second binary image can be derived, it is not essential to attach the same label number to neighbouring foreground regions in the two binary images.

The processing of step S403 is then performed for each label number in the first binary image (second binary image), that is, repeated as many times as there are labels, so that step S403 is applied to the foreground region corresponding to each label number.

In step S403, the foreground region adjustment unit 207 selects one unselected label number as the selected label number, and adjusts the foreground region to which the selected label number is attached in the second binary image by applying contraction (erosion) to it. The contraction uses processing known as morphological operations. Morphological operations are generally well known, so their description is omitted. The contraction amount is fixed in advance so that, when the first binary image and the second binary image are superimposed, the foreground region in the second binary image lies almost entirely inside the foreground region in the first binary image. FIG. 5(C) shows an example of the result of applying the contraction to the foreground region of FIG. 5(B).
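The contraction of step S403 can be pictured as morphological erosion. The following Python sketch erodes a binary mask of a single foreground region with a 3 × 3 structuring element, repeated a fixed number of times. The structuring element, the treatment of pixels outside the image as background, and the use of `iterations` as the contraction amount are assumptions for illustration.

```python
def erode(mask, iterations=1):
    """Erode a binary mask with a 3 x 3 structuring element.

    A pixel stays 1 only if all nine pixels of its neighbourhood are 1;
    positions outside the image count as background (an assumption).
    """
    h, w = len(mask), len(mask[0])
    current = [row[:] for row in mask]
    for _ in range(iterations):
        result = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                keep = all(
                    0 <= y + dy < h and 0 <= x + dx < w
                    and current[y + dy][x + dx] == 1
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                )
                result[y][x] = 1 if keep else 0
        current = result
    return current


# Hypothetical 5x5 region: each iteration peels one pixel off the boundary.
region = [[1] * 5 for _ in range(5)]
once = erode(region, 1)
twice = erode(region, 2)
```

Each iteration removes one layer of boundary pixels, so a larger `iterations` value pulls the coarse texture-based region further inside the color-based region.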

第２の二値画像における前景領域の調整処理の他の例について、図４（Ｂ）のフローチャートを用いて説明する。ステップＳ４０５では前景領域調整部２０７は、上記のステップＳ４０１と同様にして、第１の二値画像及び第２の二値画像のそれぞれに対するラベリング処理を行う。 Another example of the adjustment of the foreground region in the second binary image will be described with reference to the flowchart of FIG. 4B. In step S405, the foreground region adjustment unit 207 performs labeling processing on each of the first binary image and the second binary image in the same manner as in step S401 described above.
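The labeling processing referred to in steps S401 and S405 can be sketched, for illustration only, as connected-component labeling by breadth-first flood fill. The function name `label_regions`, the use of 4-connectivity, and the raster-scan numbering of labels are assumptions of this example and are not specified by the embodiment.

```python
from collections import deque


def label_regions(mask):
    """Assign labels 1, 2, ... to the 4-connected foreground regions of a 0/1 mask.

    Returns (labels, count), where labels has the same shape as mask and
    background pixels keep the value 0.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or labels[sy][sx]:
                continue  # background pixel, or already labeled
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Labeling the two binary images independently in this way does not by itself give corresponding regions the same number; a separate matching step, such as the nearest-region correspondence assumed in the description, would still be required.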

ステップＳ４０７〜Ｓ４１１における処理を、第１の二値画像（第２の二値画像）におけるそれぞれのラベル番号について行う（ラベル数に応じた回数繰り返す）ことで、それぞれのラベル番号に対応する前景領域についてステップＳ４０７〜Ｓ４１１の処理を行う。 The processing of steps S407 to S411 is performed for each label number in the first binary image (second binary image), that is, repeated as many times as there are labels, so that steps S407 to S411 are applied to the foreground region corresponding to each label number.

ステップＳ４０７では、前景領域調整部２０７は、未選択のラベル番号を１つ選択ラベル番号として選択し、第１の二値画像における前景領域群のうち、選択ラベル番号に対応する前景領域の面積Ｓ１（例えば、該前景領域に含まれている画素の数）を特定する。 In step S407, the foreground region adjustment unit 207 selects one unselected label number as the selected label number and determines the area S1 (for example, the number of pixels included in the region) of the foreground region corresponding to the selected label number among the foreground regions in the first binary image.

ステップＳ４０８では、前景領域調整部２０７は、第２の二値画像における前景領域群のうち、選択ラベル番号に対応する前景領域の面積Ｓ２（例えば、該前景領域に含まれている画素の数）を特定する。 In step S408, the foreground region adjustment unit 207 determines the area S2 (for example, the number of pixels included in the region) of the foreground region corresponding to the selected label number among the foreground regions in the second binary image.

ステップＳ４０９では、前景領域調整部２０７は、面積Ｓ１に対する面積Ｓ２の比率（Ｓ２／Ｓ１）を求める。そしてステップＳ４１０では、前景領域調整部２０７は、Ｓ２／Ｓ１＜θという条件を満たすか否かを判断する。ここでθは、予め決められた閾値である。この判断の結果、Ｓ２／Ｓ１＜θであれば、処理はステップＳ４０７に戻る。一方、Ｓ２／Ｓ１≧θであれば、処理はステップＳ４１１に進む。 In step S409, the foreground area adjustment unit 207 obtains the ratio of the area S2 to the area S1 (S2 / S1). In step S410, the foreground area adjustment unit 207 determines whether or not a condition of S2 / S1 <θ is satisfied. Here, θ is a predetermined threshold value. As a result of this determination, if S2 / S1 <θ, the process returns to step S407. On the other hand, if S2 / S1 ≧ θ, the process proceeds to step S411.

ステップＳ４１１では、前景領域調整部２０７は、上記のステップＳ４０３と同様にして、第２の二値画像において選択ラベル番号が付加された前景領域に対して収縮処理を行うことで、該前景領域を調整する。なお、調整量（収縮量）は予め定められていてもよいし、例えばＳ１の値とＳ２の値とに基づいて決定されてもよい。 In step S411, the foreground region adjustment unit 207 adjusts the foreground region to which the selected label number is assigned in the second binary image by applying contraction processing to it in the same manner as in step S403 described above. The adjustment amount (contraction amount) may be determined in advance, or may be determined based on, for example, the value of S1 and the value of S2.

以上の処理によって、第２の二値画像における前景領域の面積が、第１の二値画像における前景領域の面積に対して一定比率以下となっている。そのため結果的に、第１の二値画像と第２の二値画像とを重ねたときに、第２の二値画像における前景領域の境界が第１の二値画像における前景領域のほぼ内側となることが期待できる。 As a result of the above processing, the area of the foreground region in the second binary image is at or below a certain ratio of the area of the foreground region in the first binary image. Consequently, when the first binary image and the second binary image are overlaid, the boundary of the foreground region in the second binary image can be expected to lie substantially inside the foreground region in the first binary image.
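The flow of steps S407 to S411 can be sketched as follows, purely as an illustration. The sketch assumes that both label images already use the same label number for corresponding regions, measures area as a pixel count, and contracts with a 3x3 erosion; the names `area`, `erode_once`, `adjust_second_regions`, and the parameter `amount`, as well as the decision to skip a label whose first region is empty, are hypothetical choices not taken from the embodiment.

```python
def area(labels, label):
    """Area of a labeled region, measured as its pixel count (S1 or S2)."""
    return sum(v == label for row in labels for v in row)


def erode_once(region):
    """One pass of 3x3 binary erosion; pixels outside the image count as background."""
    h, w = len(region), len(region[0])
    return [
        [int(all(0 <= y + dy < h and 0 <= x + dx < w and region[y + dy][x + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
         for x in range(w)]
        for y in range(h)
    ]


def adjust_second_regions(labels1, labels2, num_labels, theta, amount=1):
    """Contract each region of labels2 whose area ratio S2 / S1 is at least theta."""
    adjusted = [row[:] for row in labels2]  # leave the input untouched
    for label in range(1, num_labels + 1):
        s1 = area(labels1, label)  # step S407
        s2 = area(labels2, label)  # step S408
        # Steps S409 and S410: no contraction when S2 / S1 < theta.
        # Skipping an empty first region (s1 == 0) is an assumption of this sketch.
        if s1 == 0 or s2 / s1 < theta:
            continue
        # Step S411: contract the selected region by `amount` erosion passes.
        region = [[int(v == label) for v in row] for row in labels2]
        for _ in range(amount):
            region = erode_once(region)
        for y, row in enumerate(region):
            for x, keep in enumerate(row):
                if labels2[y][x] == label and not keep:
                    adjusted[y][x] = 0
    return adjusted
```

A fixed `amount` corresponds to the predetermined contraction amount; deriving `amount` from S1 and S2, which the description also permits, would replace the constant with a value computed per label.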

さらに、前景領域調整部２０７における前景領域の調整方法の他の例としては、前景領域の境界部の長さを比較する方法がある。ステップＳ４０７では、第１の二値画像における前景領域の境界部の長さをＳ１１として求め、ステップＳ４０８では、第２の二値画像における前景領域の境界部の長さをＳ１２として求める。そしてステップＳ４０９以降では、Ｓ１，Ｓ２の代わりにそれぞれＳ１１，Ｓ１２を用いて処理を行う。 Furthermore, as another example of the foreground area adjustment method in the foreground area adjustment unit 207, there is a method of comparing the lengths of the boundary portions of the foreground area. In step S407, the length of the boundary portion of the foreground region in the first binary image is obtained as S11, and in step S408, the length of the boundary portion of the foreground region in the second binary image is obtained as S12. In step S409 and subsequent steps, processing is performed using S11 and S12, respectively, instead of S1 and S2.
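For illustration, one possible measure of the boundary lengths S11 and S12 is the number of region pixels that touch a pixel outside the region. This definition, the use of the 4-neighbourhood, the counting of image-edge pixels as boundary, and the names `boundary_length` and `boundary_ratio` are assumptions of this sketch; the embodiment does not specify how the boundary length is measured.

```python
def boundary_length(labels, label):
    """Count pixels of the labeled region that have a 4-neighbour outside the region.

    Pixels on the image edge are counted as boundary pixels.
    """
    h, w = len(labels), len(labels[0])
    count = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != label:
                continue
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if any(not (0 <= ny < h and 0 <= nx < w) or labels[ny][nx] != label
                   for ny, nx in neighbours):
                count += 1
    return count


def boundary_ratio(labels1, labels2, label):
    """Ratio S12 / S11 used in place of S2 / S1; None if the first region is empty."""
    s11 = boundary_length(labels1, label)
    s12 = boundary_length(labels2, label)
    return s12 / s11 if s11 else None
```

The resulting ratio would then be compared with the threshold θ in step S410 in the same way as the area ratio.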

前景領域決定部２０８は、第１の二値画像における前景領域（第１領域）と、前景領域調整部２０７による調整済み（収縮済み）の第２の二値画像における前景領域(第２領域)と、に基づいて、撮像画像における前景領域を決定する。前景領域決定部２０８による前景領域の決定処理について、図３（Ｃ）のフローチャートに従って説明する。 The foreground region determination unit 208 determines the foreground region in the captured image based on the foreground region in the first binary image (first region) and the foreground region in the second binary image as adjusted (contracted) by the foreground region adjustment unit 207 (second region). The foreground region determination processing performed by the foreground region determination unit 208 will be described with reference to the flowchart of FIG. 3C.

ステップＳ３１３では、前景領域決定部２０８は、第１の二値画像と第２の二値画像とで同位置の画素ごとに画素値のＯＲ演算を行う。このようなＯＲ演算を行うことで得られる二値画像（第３の二値画像）中の画素位置（ｘ、ｙ）における画素値が「１」である場合、撮像画像中の画素位置（ｘ、ｙ）における画素が前景画素と推定された画素であることを示す。一方、第３の二値画像中の画素位置（ｘ、ｙ）における画素値が「０」である場合、撮像画像中の画素位置（ｘ、ｙ）における画素が背景画素と推定された画素であることを示す。すなわち、第３の二値画像において画素値が「１」の領域とは、第１の二値画像における前景領域と第２の二値画像における前景領域との和である。言い換えると、前景領域決定部２０８により決定される前景領域は、前景分離部２０５により特定される前景領域と前景分離部２０６により特定される前景領域との少なくとも何れかに含まれる画素により構成される。なお、第１の二値画像及び第２の二値画像の全体領域についてＯＲ演算を行うことに限らず、例えば、同じラベル番号の前景領域ごとにＯＲ演算を行うようにしても良い。 In step S313, the foreground region determination unit 208 performs an OR operation on the pixel values of the first binary image and the second binary image for each pair of pixels at the same position. When the pixel value at pixel position (x, y) in the binary image obtained by this OR operation (the third binary image) is "1", the pixel at pixel position (x, y) in the captured image is a pixel estimated to be a foreground pixel. When the pixel value at pixel position (x, y) in the third binary image is "0", the pixel at pixel position (x, y) in the captured image is a pixel estimated to be a background pixel. That is, the region whose pixel value is "1" in the third binary image is the union of the foreground region in the first binary image and the foreground region in the second binary image. In other words, the foreground region determined by the foreground region determination unit 208 consists of the pixels included in at least one of the foreground region specified by the foreground separation unit 205 and the foreground region specified by the foreground separation unit 206. The OR operation need not be performed over the entire first and second binary images; for example, it may be performed for each pair of foreground regions having the same label number.
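The per-pixel OR of step S313 can be sketched as follows. The function name `combine_masks` and the representation of each binary image as a list of rows of 0/1 values are assumptions made for this illustration.

```python
def combine_masks(first, second):
    """Per-pixel OR of two binary images of identical size (third binary image)."""
    if len(first) != len(second) or any(
            len(r1) != len(r2) for r1, r2 in zip(first, second)):
        raise ValueError("binary images must have the same dimensions")
    # A pixel is foreground if it is foreground in at least one input image.
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(first, second)]
```

For example, a hole inside a region of the first binary image is filled whenever the contracted second binary image marks those pixels as foreground, while pixels that are background in both images remain background.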

図５（Ａ）の前景領域と図５（Ｃ）の前景領域とを用いて前景領域決定部２０８により得られる前景領域を図５（Ｄ）に示す。図５（Ｄ）に示す如く、図５（Ａ）の前景領域と比べると、前景領域の欠落が埋まり、前景形状の品質が改善されている。また図５（Ｂ）の前景領域と比べると、図５（Ｂ）の前景領域の境界部が図５（Ａ）の前景領域の内側となるよう調整したために、前景領域の境界部の粗い部分が少なくなっており、前景形状の品質が改善されている。 FIG. 5D shows the foreground region obtained by the foreground region determination unit 208 from the foreground region of FIG. 5A and the foreground region of FIG. 5C. As shown in FIG. 5D, compared with the foreground region of FIG. 5A, the missing parts of the foreground region are filled in and the quality of the foreground shape is improved. Compared with the foreground region of FIG. 5B, the rough portions along the boundary of the foreground region are reduced, because the boundary of the foreground region of FIG. 5B was adjusted to lie inside the foreground region of FIG. 5A, and the quality of the foreground shape is improved.

生成部２０９は、データ受信部２０１から入力された撮像画像と、背景バッファ２０４に格納されている背景画像と、前景領域決定部２０８によって生成された第３の二値画像と、を用いて仮想視点画像を生成する。仮想視点画像の生成方法としては、例えばＶｉｓｕａｌＨｕｌｌなどの方法が知られている。例えば生成部２０９は、ユーザにより指定された視点に対応する仮想視点画像の生成に要する背景画像を背景バッファ２０４から読み出し、該読み出した背景画像を２次元平面上に投影することで、該２次元平面上に仮想視点における背景を生成する。また生成部２０９は、データ受信部２０１から入力された撮像画像について前景領域決定部２０８が生成した第３の二値画像中の画素値が「１」の画素群に対応する該撮像画像上の画素群の領域を前景領域として特定する。そして生成部２０９は、データ受信部２０１から入力されたそれぞれの撮像画像から、該撮像画像について特定した前景領域を抽出し、該抽出した前景領域に基づいて前景（例えば人物１０４）の３次元モデルを生成する。そして生成部２０９は、該生成した３次元モデルを、ユーザにより指定された視点に応じて上記の２次元平面上に投影することで、該２次元平面上に仮想視点画像を生成する。ＶｉｓｕａｌＨｕｌｌについてはよく知られた手法なので詳細な説明は省く。 The generation unit 209 generates a virtual viewpoint image using the captured image input from the data reception unit 201, the background image stored in the background buffer 204, and the third binary image generated by the foreground region determination unit 208. Methods such as Visual Hull are known for generating a virtual viewpoint image. For example, the generation unit 209 reads from the background buffer 204 the background image required to generate the virtual viewpoint image corresponding to the viewpoint designated by the user, and projects the read background image onto a two-dimensional plane, thereby generating the background at the virtual viewpoint on that plane. For each captured image input from the data reception unit 201, the generation unit 209 also specifies, as the foreground region, the region of pixels on the captured image corresponding to the pixels whose value is "1" in the third binary image generated for that captured image by the foreground region determination unit 208. The generation unit 209 then extracts, from each captured image input from the data reception unit 201, the foreground region specified for that captured image, and generates a three-dimensional model of the foreground (for example, the person 104) based on the extracted foreground regions. The generation unit 209 then projects the generated three-dimensional model onto the above two-dimensional plane in accordance with the viewpoint designated by the user, thereby generating the virtual viewpoint image on that plane. Visual Hull is a well-known technique, so a detailed description is omitted.

そして生成部２０９は、生成した仮想視点画像を出力するのであるが、出力先については特定の出力先に限らない。例えば、画像処理装置１０３に接続されている表示装置に表示しても良いし、記憶装置に記録するようにしても良い。また、ネットワークを介して外部の端末に対して送信しても良い。 The generation unit 209 outputs the generated virtual viewpoint image, but the output destination is not limited to a specific output destination. For example, the image may be displayed on a display device connected to the image processing apparatus 103 or may be recorded in a storage device. Moreover, you may transmit to an external terminal via a network.

このように、本実施形態によれば、特性の異なる複数の前景分離手法による前景分離を行って、その結果に対して調整を行った後合成するという手順で、前景形状の品質を改善することができる。すなわち、入力画像内における前景領域の特定の精度を改善できる。その結果、生成される仮想視点画像の画質を向上させることができる。 As described above, according to the present embodiment, the quality of the foreground shape can be improved by performing foreground separation with a plurality of foreground separation methods having different characteristics, adjusting the results, and then combining them. That is, the accuracy with which the foreground region in the input image is specified can be improved. As a result, the image quality of the generated virtual viewpoint image can be improved.

［第２の実施形態］ [Second Embodiment]
第１の実施形態では、第２の特徴量として画像のテクスチャ情報を用いたが、本実施形態では画像の輝度情報を用いる。すなわち、本実施形態において或る画素についての第２の特徴量は、当該画素の画素値に応じて決まる。以下では第１の実施形態との差分について重点的に説明し、以下で特に触れない限りは、第１の実施形態と同様であるものとする。 In the first embodiment, the texture information of the image is used as the second feature amount, whereas in the present embodiment the luminance information of the image is used. That is, in the present embodiment, the second feature amount for a given pixel is determined according to the pixel value of that pixel. The following description focuses on the differences from the first embodiment; unless otherwise noted below, the present embodiment is the same as the first embodiment.

画像処理装置１０３の機能構成例について、図６のブロック図を用いて説明する。図６において図２に示した機能部と同じ機能部には同じ参照番号を付しており、該機能部に係る説明は省略する。本実施形態でも第１の実施形態と同様、画像処理装置１０３における画像処理は、画像処理装置１０３に内蔵されたＡＳＩＣやＦＰＧＡなどのハードウェアにより処理される。然るに本実施形態では、図６に示した各機能部（各モジュール）は、ハードウェアとしてＡＳＩＣやＦＰＧＡの内部に実装されている。 A functional configuration example of the image processing apparatus 103 will be described with reference to the block diagram of FIG. 6. In FIG. 6, functional units that are the same as those shown in FIG. 2 are denoted by the same reference numerals, and their description is omitted. In the present embodiment, as in the first embodiment, the image processing in the image processing apparatus 103 is performed by hardware such as an ASIC or FPGA built into the image processing apparatus 103. Accordingly, in the present embodiment, each functional unit (each module) shown in FIG. 6 is implemented as hardware inside the ASIC or FPGA.

前景分離部６０６は、データ受信部２０１から入力した撮像画像と、該撮像画像における背景領域を表す背景画像として背景生成部２０３が生成した背景画像と、の輝度情報の差分に基づいて、該撮像画像における前景領域を推定する。前景分離部６０６が行う処理について、同処理のフローチャートを示す図７を用いて説明する。図７において図３（Ｂ）と同じ処理ステップには同じステップ番号を付しており、該処理ステップに係る説明は省略する。 The foreground separation unit 606 performs the imaging based on the difference in luminance information between the captured image input from the data reception unit 201 and the background image generated by the background generation unit 203 as a background image representing the background area in the captured image. Estimate the foreground region in the image. Processing performed by the foreground separation unit 606 will be described with reference to FIG. 7 showing a flowchart of the processing. In FIG. 7, the same processing steps as those in FIG. 3B are denoted by the same step numbers, and description of the processing steps is omitted.

ステップＳ７０７では、前景分離部６０６は、撮像画像（入力画像）と背景画像との輝度情報に基づく差分画像を生成する。ステップＳ７０７で生成される差分画像は、差分画像中の画素位置（ｘ、ｙ）における画素値が、撮像画像中の画素位置（ｘ、ｙ）における輝度値と、背景画像中の画素位置（ｘ、ｙ）における輝度値と、の差分を表す画像である。 In step S707, the foreground separation unit 606 generates a difference image based on the luminance information of the captured image (input image) and the background image. The difference image generated in step S707 is an image in which the pixel value at pixel position (x, y) represents the difference between the luminance value at pixel position (x, y) in the captured image and the luminance value at pixel position (x, y) in the background image.

ステップＳ７０９では、前景分離部６０６は、ステップＳ３０８で平滑化処理を行った差分画像における各画素について、該画素の画素値が閾値よりも大きいか否かを判断する。そしてこの判断の結果、前景分離部６０６は、差分画像において画素値が閾値よりも大きい画素については、ステップＳ７１０において該画素に対し、「前景領域を構成する画素（前景画素）」を表すビット値「１」を割り当てる。一方、前景分離部６０６は、差分画像において画素値が閾値以下の画素については、ステップＳ７１１において該画素に対し、「背景領域を構成する画素（背景画素）」を表すビット値「０」を割り当てる。このような処理により、差分画像の各画素に対応するビット値を画素値とする二値画像を生成することができる。 In step S709, the foreground separation unit 606 determines, for each pixel in the difference image smoothed in step S308, whether the pixel value of that pixel is greater than a threshold. For a pixel whose value in the difference image is greater than the threshold, the foreground separation unit 606 assigns, in step S710, the bit value "1", which represents a pixel constituting the foreground region (foreground pixel). For a pixel whose value in the difference image is equal to or less than the threshold, the foreground separation unit 606 assigns, in step S711, the bit value "0", which represents a pixel constituting the background region (background pixel). This processing yields a binary image whose pixel values are the bit values corresponding to the respective pixels of the difference image.
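Steps S707 and S709 to S711 can be sketched, as a non-limiting illustration, by a luminance difference followed by thresholding. The sketch assumes RGB input pixels and the common BT.601 luma weights, takes the absolute value of the difference, and omits the smoothing of step S308 for brevity; the names `luminance` and `luminance_foreground_mask` are hypothetical, and the embodiment does not state how luminance is derived.

```python
def luminance(rgb):
    """Luma of an (R, G, B) pixel using BT.601 weights (an assumed conversion)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def luminance_foreground_mask(captured, background, threshold):
    """Second binary image: 1 where the luminance difference exceeds the threshold.

    captured, background: lists of rows of (R, G, B) tuples with identical size.
    The smoothing applied to the difference image in step S308 is omitted here.
    """
    return [
        [int(abs(luminance(c) - luminance(b)) > threshold)  # S709 to S711
         for c, b in zip(row_c, row_b)]
        for row_c, row_b in zip(captured, background)
    ]
```

A pixel whose difference equals the threshold is assigned "0", matching the description that values equal to or less than the threshold are treated as background.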

本実施形態によれば、第２の二値画像における前景領域の面積が、第１の二値画像における前景領域の面積に対して一定比率以下となっている。そのため結果的に、第１の二値画像と第２の二値画像とを重ねたときに、第２の二値画像における前景領域の境界がほぼ内側となることが期待できる。 According to this embodiment, the area of the foreground region in the second binary image is equal to or less than a certain ratio with respect to the area of the foreground region in the first binary image. Therefore, as a result, when the first binary image and the second binary image are overlapped, it can be expected that the boundary of the foreground region in the second binary image is almost inside.

＜変形例＞ <Modification>
上記の各実施形態では、第１の二値画像における前景領域のサイズに鑑みて第２の二値画像における前景領域の収縮を行ったが、状況に応じて、第２の二値画像における前景領域のサイズに鑑みて第１の二値画像における前景領域の収縮又は拡張を行うようにしても良い。 In each of the above embodiments, the foreground region in the second binary image is contracted in view of the size of the foreground region in the first binary image. However, depending on the situation, the foreground region in the first binary image may instead be contracted or expanded in view of the size of the foreground region in the second binary image.

また、第１及び第２の実施形態では、仮想視点画像を生成するべく前景分離を行っているが、第１及び第２の実施形態で説明した前景分離は、他の用途における画像処理において実行しても良い。 In the first and second embodiments, foreground separation is performed to generate a virtual viewpoint image, but the foreground separation described in the first and second embodiments is performed in image processing in other applications. You may do it.

また、上記の各実施形態では、前景分離処理と仮想視点画像の生成処理とを同一の画像処理装置１０３が実行する場合について説明したが、これに限らない。例えば、システム内には複数のカメラ１０２に対応する複数の画像処理装置１０３が存在し、各画像処理装置は対応するカメラ１０２の撮像画像に対する前景分離処理を行ってもよい。そして、複数の画像処理装置１０３のそれぞれが、自装置による前景分離処理の結果を、仮想視点画像の生成処理を行う生成装置に出力してもよい。 In each of the above embodiments, the case where the same image processing apparatus 103 executes the foreground separation process and the virtual viewpoint image generation process has been described, but the present invention is not limited thereto. For example, there may be a plurality of image processing apparatuses 103 corresponding to the plurality of cameras 102 in the system, and each image processing apparatus may perform foreground separation processing on the captured image of the corresponding camera 102. Then, each of the plurality of image processing apparatuses 103 may output the result of the foreground separation process performed by the own apparatus to a generation apparatus that performs a virtual viewpoint image generation process.

また、本変形例を含む上記の実施形態の一部若しくは全部を適宜組み合わせても構わないし、本変形例を含む上記の実施形態の一部若しくは全部を選択的に使用しても構わない。 Some or all of the above embodiments, including this modification, may be combined as appropriate, and some or all of the above embodiments, including this modification, may be used selectively.

［その他の実施形態］ [Other Embodiments]
第１及び第２の実施形態では、図２及び図６に示した各機能部は何れもハードウェアで構成されているものとして説明した。しかし、記録部２０２、背景バッファ２０４、データ受信部２０１以外の機能部はソフトウェア（コンピュータプログラム）で実装しても良い。そのような場合、記録部２０２及び背景バッファ２０４として機能するメモリを有し、データ受信部２０１として機能するインターフェース（Ｉ／Ｆ）を有し、上記ソフトウェアを実行可能なコンピュータ装置を画像処理装置１０３に適用することができる。 In the first and second embodiments, all of the functional units shown in FIGS. 2 and 6 have been described as being configured by hardware. However, the functional units other than the recording unit 202, the background buffer 204, and the data reception unit 201 may be implemented in software (a computer program). In that case, a computer device that has a memory functioning as the recording unit 202 and the background buffer 204, has an interface (I/F) functioning as the data reception unit 201, and is capable of executing the software can be applied as the image processing apparatus 103.

画像処理装置１０３に適用可能なコンピュータ装置のハードウェア構成例について、図８のブロック図を用いて説明する。なお、図８に示したハードウェア構成は、画像処理装置１０３に適用可能なコンピュータ装置のハードウェア構成の一例に過ぎない。 A hardware configuration example of a computer device applicable to the image processing apparatus 103 will be described with reference to the block diagram of FIG. 8. Note that the hardware configuration shown in FIG. 8 is merely one example of the hardware configuration of a computer device applicable to the image processing apparatus 103.

ＣＰＵ８０１は、ＲＡＭ８０２やＲＯＭ８０３に格納されているコンピュータプログラムやデータを用いて処理を実行する。これによりＣＰＵ８０１は、コンピュータ装置全体の動作制御を行うと共に、画像処理装置１０３が行うものとして上述した各処理を実行若しくは制御する。 The CPU 801 executes processing using computer programs and data stored in the RAM 802 and the ROM 803. As a result, the CPU 801 controls the operation of the entire computer apparatus and executes or controls each process described above as being performed by the image processing apparatus 103.

ＲＡＭ８０２は、ＲＯＭ８０３や外部記憶装置８０６からロードされたコンピュータプログラムやデータ、Ｉ／Ｆ８０７を介して外部から受信したデータ（例えば撮像画像群）、を格納するためのエリアを有する。更にＲＡＭ８０２は、ＣＰＵ８０１が各種の処理を実行する際に用いるワークエリアを有する。このようにＲＡＭ８０２は、各種のエリアを適宜提供することができる。ＲＯＭ８０３には、書換不要のコンピュータプログラムやデータが格納されている。 The RAM 802 has an area for storing computer programs and data loaded from the ROM 803 and the external storage device 806 and data received from the outside via the I / F 807 (for example, a captured image group). Further, the RAM 802 has a work area used when the CPU 801 executes various processes. As described above, the RAM 802 can appropriately provide various areas. The ROM 803 stores computer programs and data that do not require rewriting.

操作部８０４は、キーボードやマウスなどのユーザインターフェースにより構成されており、ユーザが操作することで各種の指示をＣＰＵ８０１に対して入力することができる。例えばユーザは操作部８０４を操作することで、仮想視点の位置を指定することができる。 The operation unit 804 is configured by a user interface such as a keyboard and a mouse, and can input various instructions to the CPU 801 by a user operation. For example, the user can designate the position of the virtual viewpoint by operating the operation unit 804.

表示装置８０５は、ＣＲＴや液晶画面などにより構成されており、ＣＰＵ８０１による処理結果を画像や文字などでもって表示することができる。例えば、表示装置８０５には、仮想視点の位置を指定するためのＧＵＩ（グラフィカルユーザインターフェース）や、生成部２０９が生成した仮想視点画像を表示することができる。なお、操作部８０４と表示装置８０５とを一体化させてタッチパネル画面を構成しても良い。 The display device 805 is configured by a CRT, a liquid crystal screen, or the like, and can display a processing result by the CPU 801 using an image, text, or the like. For example, the display device 805 can display a GUI (graphical user interface) for designating the position of the virtual viewpoint or the virtual viewpoint image generated by the generation unit 209. Note that the operation unit 804 and the display device 805 may be integrated to form a touch panel screen.

外部記憶装置８０６は、ハードディスクドライブ装置に代表される大容量情報記憶装置である。外部記憶装置８０６には、ＯＳ（オペレーティングシステム）や、図２及び図６において記録部２０２、背景バッファ２０４、データ受信部２０１を除く各機能部の機能をＣＰＵ８０１に実現させるためのコンピュータプログラムやデータが保存されている。外部記憶装置８０６に保存されているデータには、上記の説明において既知の情報として取り扱ったもの、例えば、上記の閾値や、ＲＧＢをＬａｂやＹＵＶに変換するためのルックアップテーブル、等のデータが含まれている。外部記憶装置８０６に保存されているコンピュータプログラムやデータは、ＣＰＵ８０１による制御に従って適宜ＲＡＭ８０２にロードされ、ＣＰＵ８０１による処理対象となる。なお、上記の記録部２０２や背景バッファ２０４は、ＲＡＭ８０２や外部記憶装置８０６を用いて実装することができる。 The external storage device 806 is a large-capacity information storage device typified by a hard disk drive. The external storage device 806 stores an OS (operating system) as well as computer programs and data for causing the CPU 801 to implement the functions of the functional units in FIGS. 2 and 6 other than the recording unit 202, the background buffer 204, and the data reception unit 201. The data stored in the external storage device 806 includes the information treated as known in the above description, for example, the above thresholds and lookup tables for converting RGB into Lab or YUV. The computer programs and data stored in the external storage device 806 are loaded into the RAM 802 as appropriate under the control of the CPU 801 and are processed by the CPU 801. Note that the recording unit 202 and the background buffer 204 described above can be implemented using the RAM 802 or the external storage device 806.

Ｉ／Ｆ８０７は、上記のデータ受信部２０１として機能するものであり、それぞれのカメラ１０２による撮像画像を受信するためのインターフェースとして機能するものである。また、それぞれのカメラ１０２に対する制御信号は、Ｉ／Ｆ８０７を介してコンピュータ装置からそれぞれのカメラ１０２に対して送信される。上記のＣＰＵ８０１、ＲＡＭ８０２、ＲＯＭ８０３、操作部８０４、表示装置８０５、外部記憶装置８０６、Ｉ／Ｆ８０７は何れもバス８０８に接続されている。 The I / F 807 functions as the data receiving unit 201 described above, and functions as an interface for receiving images captured by the respective cameras 102. In addition, a control signal for each camera 102 is transmitted from the computer device to each camera 102 via the I / F 807. The CPU 801, RAM 802, ROM 803, operation unit 804, display device 805, external storage device 806, and I / F 807 are all connected to the bus 808.

なお本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 In the present invention, a program that realizes one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in the computer of the system or apparatus read the program. It can also be realized by processing to be executed. It can also be realized by a circuit (for example, ASIC) that realizes one or more functions.

２０１：データ受信部 201: Data reception unit
２０２：記録部 202: Recording unit
２０３：背景生成部 203: Background generation unit
２０４：背景バッファ 204: Background buffer
２０５：前景分離部 205: Foreground separation unit
２０６：前景分離部 206: Foreground separation unit
２０７：前景領域調整部 207: Foreground region adjustment unit
２０８：前景領域決定部 208: Foreground region determination unit
２０９：生成部 209: Generation unit

Claims

An image processing apparatus comprising:
first specifying means for specifying a first region in an input image using a difference based on a first feature amount between the input image and a background image;
second specifying means for specifying a second region in the input image using a difference based on a second feature amount between the input image and the background image; and
determination means for determining a foreground region corresponding to a predetermined subject in the input image based on the first region specified by the first specifying means and the second region specified by the second specifying means.

The image processing apparatus according to claim 1, wherein the first specifying unit specifies, as the first region, a region where a difference based on color information between the input image and the background image exceeds a threshold value.

The image processing apparatus according to claim 1, wherein the second specifying unit specifies, as the second region, a region where a difference based on texture information between the input image and the background image exceeds a threshold value.

The image processing apparatus according to claim 1, wherein the second specifying unit specifies, as the second region, a region in which a difference based on luminance information between the input image and the background image exceeds a threshold value.

The image processing apparatus according to claim 1, wherein the determination means determines, as the foreground region in the input image, a region included in at least one of the first region specified by the first specifying means and a contracted second region obtained by contracting the second region specified by the second specifying means.

The image processing apparatus according to claim 5, wherein the contracted second region is a region obtained by contracting the second region specified by the second specifying means based on a ratio between the area of the first region specified by the first specifying means and the area of the second region specified by the second specifying means.

The image processing apparatus according to claim 5, wherein the contracted second region is a region obtained by contracting the second region specified by the second specifying means based on a ratio between the length of the boundary portion of the first region specified by the first specifying means and the length of the boundary portion of the second region specified by the second specifying means.

The image processing apparatus according to claim 1, further comprising an acquisition unit configured to acquire a captured image obtained by each of a plurality of imaging apparatuses connected in a ring shape as the input image.

The image processing apparatus according to claim 1, further comprising a generation unit configured to generate a virtual viewpoint image based on the foreground region determined by the determination unit.

A system comprising:
a plurality of imaging devices connected in a ring shape; and
the image processing apparatus according to claim 8.

An image processing method performed by an image processing apparatus, the method comprising:
a first specifying step in which first specifying means of the image processing apparatus specifies a first region in an input image using a difference based on a first feature amount between the input image and a background image;
a second specifying step in which second specifying means of the image processing apparatus specifies a second region in the input image using a difference based on a second feature amount between the input image and the background image; and
a determination step in which determination means of the image processing apparatus determines a foreground region corresponding to a predetermined subject in the input image based on the first region specified in the first specifying step and the second region specified in the second specifying step.

A computer program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 9.