JP2014039126A - Image processing device, image processing method, and program

Info

Publication number: JP2014039126A
Application number: JP2012179854A
Authority: JP
Inventors: Keiichi Sawada; 圭一澤田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2012-08-14
Filing date: 2012-08-14
Publication date: 2014-02-27

Abstract

PROBLEM TO BE SOLVED: To estimate appropriate texture data even under a light source that does not sufficiently include a high-frequency component. SOLUTION: Based on light source information, it is determined whether to estimate texture data of a subject from image data captured using a flash; the flash is judged necessary when the light source does not sufficiently include a high-frequency component. Texture data indicating the reflectance distribution of the subject is then estimated in accordance with the determination. The texture data is estimated using image data acquired by image data acquisition means, shape data acquired by shape data acquisition means, and light source data acquired by light source data acquisition means.

Description

The present invention relates to an image processing apparatus, an image processing method, and a program for estimating texture data.

A captured image is sometimes processed with image processing software or the like to give it an effect that matches the photographer's intention. Examples include changing the direction or luminance of the light source, or compositing a CG subject into the scene. When the shape and texture of the subject and the light source information are unknown, producing a natural-looking processed image requires a skilled person to spend considerable time on the processing. When the shape, texture, and light source information are known, however, the processing can be automated with CG rendering techniques (Patent Document 1).

Many techniques have been proposed for estimating the shape, texture, and light source information needed to automate such processing. For example, Patent Document 2 discloses a technique that estimates the shape of a subject by generating patches on the subject surface so that they are consistent with images captured from multiple viewpoints. Non-Patent Document 1 discloses a technique that estimates shape, texture, and light source simultaneously by running an optimization so that the pixel values of the multi-viewpoint images are consistent with the reflected luminance calculated from the rendering equation.

Patent Document 1: Japanese Patent No. 4078926
Patent Document 2: US Patent Application Publication No. 2009/0052796

Non-Patent Document 1: A. Georghiades, "Incorporating the Torrance and Sparrow Model of Reflectance in Uncalibrated Photometric Stereo", ICCV, pp. 816-823 (2003)
Non-Patent Document 2: Ravi Ramamoorthi et al., "A Signal-Processing Framework for Inverse Rendering", Proc. ACM SIGGRAPH 2001, pp. 117-128 (2001)

However, it is known that the texture of a subject cannot be estimated under a light source that does not sufficiently contain high-frequency components (for example, an environment with no locally bright light source, such as under a cloudy sky or in the shade) (Non-Patent Document 2). If the automatic processing described above is performed based on an incorrect texture, an unnatural processed image that does not reflect the actual texture of the subject is generated.

Therefore, an object of the present invention is to estimate appropriate texture data even under a light source that does not sufficiently contain high-frequency components.

An image processing apparatus according to the present invention comprises: determination means for determining, based on light source information, whether to estimate texture data of a subject from image data captured using a flash; and texture data estimation means for estimating the texture data of the subject in accordance with the determination made by the determination means.

According to the present invention, appropriate texture data can be obtained even under a light source that does not sufficiently contain high-frequency components.

FIG. 1 is a block diagram illustrating an example of the processing units of the image processing apparatus according to the first embodiment.
FIG. 2 is a flowchart illustrating an example of the operation of the image processing apparatus according to the first embodiment.
FIG. 3 is a diagram illustrating an example of a "direction" that can be expressed by a polar angle and an azimuth angle in the first embodiment.
FIG. 4 is a diagram schematically illustrating an example of the imaging unit according to the first embodiment.
FIG. 5 is a block diagram illustrating an example of the processing units of the scene element data acquisition unit according to the first embodiment.
FIG. 6 is a diagram illustrating an example of the relationship between the incident polar angle and the incident luminance in the first embodiment.
FIG. 7 is a diagram illustrating an example of the relationship between the frequency order and the incident luminance frequency coefficient in the first embodiment.
FIG. 8 is a block diagram illustrating an example of the processing units of the flash necessity determination unit according to the first embodiment.
FIG. 9 is a flowchart illustrating an example of the operation of the flash necessity determination unit 112 according to the first embodiment.
FIG. 10 is a diagram illustrating an example of the relationship between the frequency order and the frequency order threshold in the first embodiment.
FIG. 11 is a diagram illustrating an example of the relationship between the imaging mode and the light source frequency in the first embodiment.
FIG. 12 is a flowchart illustrating an example of the operation of the image processing apparatus according to the second embodiment.
FIG. 13 is a block diagram illustrating an example of the processing units of the scene element data acquisition unit according to the second embodiment.
FIG. 14 is a flowchart illustrating an example of the operation of the texture / light source data acquisition unit according to the second embodiment.
FIG. 15 is a flowchart illustrating an example of the flow of texture / light source data estimation processing using a difference image in the second embodiment.

Embodiments will be described below with reference to the drawings.

<Device configuration>
FIG. 1 is a block diagram illustrating an example of the image processing apparatus according to the first embodiment. The imaging units 101 and 102 each include a zoom lens, a focus lens, a shake correction lens, an aperture, a shutter, an optical low-pass filter, an IR cut filter, a color filter, a sensor such as a CMOS or CCD sensor, a flash, and the like, and detect the amount of light from the subject. Although FIG. 1 shows an example with two imaging units, the invention is not limited to two and is applicable to an image processing apparatus having any number of imaging units, one or more. The A/D conversion unit 103 converts the detected amount of light into a digital value. The signal processing unit 104 performs white balance processing, gamma processing, noise reduction processing, and the like on the digital value to generate digital image data. The D/A conversion unit 105 performs analog conversion on the digital image data. The scene element data acquisition unit 106 acquires and estimates scene element data such as shape, light source, and texture. The encoder unit 107 converts the digital image data into a file format such as JPEG or MPEG. The media interface 108 is an interface for connecting to a PC or other media (for example, a hard disk, memory card, CF card, SD card, or USB memory). The media interface 108 is also connected to a communication network such as the Internet and transmits and receives data as necessary. The CPU 109 is involved in all processing of each component; it sequentially reads and interprets instructions stored in the ROM 110 and the RAM 111 and executes processing according to the results. The ROM 110 and the RAM 111 provide the CPU 109 with the programs, data, work areas, and the like necessary for that processing.
The flash necessity determination unit 112 determines whether the flash needs to be fired at the time of imaging so that the scene element data acquisition unit 106 can estimate the texture data. The imaging system control unit 113 controls the imaging system as instructed by the CPU 109, for example focusing, opening the shutter, adjusting the aperture, and firing the flash. The operation unit 114 corresponds to buttons, a mode dial, and the like, and receives user instructions input through them. The character generation unit 115 generates characters, graphics, and the like. The display unit 116, for which a liquid crystal display is generally used, displays captured images and characters received from the D/A conversion unit 105 and the character generation unit 115. The display unit 116 may also have a touch screen function, in which case user instructions given through it can be handled as input to the operation unit 114. The apparatus has components other than those described above, but they are not the focus of this embodiment and their description is omitted.

<Flowchart of Embodiment 1>
FIG. 2 is a flowchart showing the operation of the image processing apparatus shown in FIG. 1 in the first embodiment. The flowchart shown in FIG. 2 is executed, for example, by the CPU 109 reading and interpreting instructions stored in the ROM 110 or the RAM 111. The operation of the image processing apparatus in the first embodiment is described below with reference to FIG. 2.

Step S201 is light source information acquisition processing, in which the scene element data acquisition unit 106 acquires light source information of the scene to be imaged. The light source information is used to determine whether the flash needs to be fired, as described later, and it includes light source data. Examples of light source data include data on illumination such as the position, direction, and luminance of the light source. In the following description, the light source data is assumed to be represented by a combination of a "direction", expressed by a polar angle and an azimuth angle as shown in FIG. 3, and the "incident luminance" that enters the subject from each direction. There are of course various ways to represent light source data, and the representation is not limited to the example shown in FIG. 3. In the first embodiment, any method may be used to acquire the light source data. Because the acquisition method is not the focus of this embodiment, a detailed description is omitted; known methods acquire light source data from image data of a mirrored sphere or from image data of the scene's surroundings captured with a fisheye lens.
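As an illustration of the representation described above, the following minimal sketch stores light source data as a list of directions (polar angle, azimuth angle) paired with incident luminance. The class name `LightSample`, the function `make_light_source`, the grid resolution, and the overcast example are all hypothetical choices made for this sketch; the embodiment does not prescribe any particular data structure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LightSample:
    polar: float      # polar angle in radians, measured from the zenith
    azimuth: float    # azimuth angle in radians
    luminance: float  # incident luminance arriving from this direction


def make_light_source(luminance_fn, n_polar=8, n_azimuth=16):
    """Sample a luminance function on a regular (polar, azimuth) grid."""
    samples = []
    for i in range(n_polar):
        # Cell-centered polar angles over the upper hemisphere.
        theta = (i + 0.5) * (math.pi / 2) / n_polar
        for j in range(n_azimuth):
            phi = j * 2 * math.pi / n_azimuth
            samples.append(LightSample(theta, phi, luminance_fn(theta, phi)))
    return samples


# Hypothetical overcast sky: the same luminance from every direction.
overcast = make_light_source(lambda theta, phi: 1.0)
```

A real acquisition path (mirrored sphere or fisheye image) would fill the same structure from measured pixel values instead of an analytic function.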

In step S202, the flash necessity determination unit 112 determines, based on the light source information acquired in step S201, whether the flash needs to be fired at the time of imaging. The operation of the flash necessity determination unit 112 is described in detail later.

In step S203, the imaging units 101 and 102 perform imaging with or without the flash according to the determination result of the flash necessity determination unit 112. To improve the accuracy of the estimated texture data, it is desirable that the imaging units 101 and 102 capture the subject from multiple viewpoints. Examples of such imaging units include a multi-lens camera as shown in FIG. 4(a), a set of multiple cameras as shown in FIG. 4(b), and a video camera that captures images while its position is changed as shown in FIG. 4(c). In the case of FIG. 4(c), images extracted from multiple frames of the video captured by the video camera serve as the multi-viewpoint image data. Because a rough texture can be obtained even from an image captured from a single viewpoint, this embodiment is not limited to imaging units capable of multi-viewpoint imaging.

In step S204, the scene element data acquisition unit 106 acquires shape data of the subject. The shape data may be supplied as external data, or it may be calculated, for example, by generating patches on the subject surface so that they are consistent with images captured from multiple viewpoints.

Step S205 is texture data estimation processing, in which the scene element data acquisition unit 106 estimates the texture data of the subject. Here, texture data refers to the reflectance distribution of the subject. One model representing the reflectance distribution is, for example, the BRDF (Bidirectional Reflectance Distribution Function) expressed by the following equation.

Here, θ_i is the incident polar angle, φ_i is the incident azimuth angle, θ_r is the reflection polar angle, φ_r is the reflection azimuth angle, L is the incident luminance, and R is the reflected luminance; each angle is measured relative to the "normal direction" included in the shape data. In the following description, the texture data is assumed to be represented by the BRDF model, but this embodiment is not limited to it. The embodiment can be applied to any reflection model, such as the BSSRDF (Bidirectional Scattering Surface Reflectance Distribution Function).

<Details of Scene Element Data Acquisition Unit 106>
FIG. 5 is a block diagram illustrating the processing units of the scene element data acquisition unit 106. The texture data estimation processing in step S205 of FIG. 2 is described with reference to FIG. 5. In the first embodiment, the texture data acquisition unit 504 estimates the texture data of the subject using the captured image data acquired by the captured image data acquisition unit 501, the shape data acquired by the shape data acquisition unit 502, and the light source data acquired through the light source data acquisition unit 503. The texture data estimation processing is described in detail below.

First, the relationship between the shape, the texture, the light source, and the pixel values of the captured image is described. The pixel value I of a pixel that receives light reflected at the reflection angle (θ_r, φ_r) is calculated from the following equation.

Here, α is a conversion coefficient from luminance to pixel value. The conversion coefficient α can be obtained by measurement or the like. Specifically, a light source whose luminance has been measured with a luminance meter is imaged, and α is calculated from the relationship between the pixel values of the captured image and the luminance measured with the luminance meter.
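The calibration of α described above could be sketched as follows. This is only an illustration: it assumes that pixel value is proportional to luminance (I = α × luminance, with no offset), and it fits α by least squares through the origin. The function name `estimate_alpha` and the sample numbers are hypothetical.

```python
def estimate_alpha(luminances, pixel_values):
    """Least-squares fit of pixel = alpha * luminance (line through the origin)."""
    numerator = sum(l * p for l, p in zip(luminances, pixel_values))
    denominator = sum(l * l for l in luminances)
    if denominator == 0:
        raise ValueError("at least one non-zero luminance measurement is required")
    return numerator / denominator


# Hypothetical calibration data: luminance-meter readings and the pixel values
# observed when the same light source is imaged.
measured_luminance = [10.0, 20.0, 40.0]
observed_pixels = [25.0, 50.0, 100.0]
alpha = estimate_alpha(measured_luminance, observed_pixels)
```

If the camera response is not linear (for example after gamma processing), a proportional model like this one would not hold and the response would have to be linearized first.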

Next, the texture estimation method is described. In this embodiment, the light source data acquisition unit 503 acquires the light source data through the media interface 108, so the incident luminance L (the light source data) in Equations 2 and 3 is known for every incident polar angle θ_i and incident azimuth angle φ_i. The pixel value I is also known for as many pixels as the imaging units 101 and 102 have. The basic method of estimating texture data is to calculate, by optimization, a BRDF that is consistent with the incident luminance L and the pixel values I. However, calculating the BRDF for every combination of incident polar angle θ_i, incident azimuth angle φ_i, reflection polar angle θ_r, and reflection azimuth angle φ_r is difficult in practice. It is therefore common, as in the technique described in Non-Patent Document 1, to assume that the BRDF can be expressed by a parametric model and to estimate the texture by finding the parameters of that model. Because this embodiment does not depend on the texture estimation method, any texture estimation method other than the above may be used. The initial value of the optimization is arbitrary, but it is desirable to set an initial value close to the actual texture data.
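The idea of fitting a parametric model so that rendered values match observed pixel values can be illustrated with a toy example. The sketch below is not the method of Non-Patent Document 1: it swaps in a simplified in-plane Phong-like lobe for the Torrance-Sparrow model, assumes a forward model of the form pixel = α × Σ BRDF × L × cos θ_i × Δθ, and uses a grid search over a single "shininess" parameter instead of a full optimization. All function names and numbers are hypothetical.

```python
import math


def phong_brdf(theta_i, theta_r, shininess, kd=0.2, ks=0.8):
    """Simplified in-plane Phong-like reflectance (stand-in parametric model)."""
    # Angles are signed and measured from the normal; the mirror direction
    # of an incident angle theta_i is -theta_i.
    lobe = max(0.0, math.cos(theta_r + theta_i)) ** shininess
    return kd + ks * lobe


def render_pixel(light, theta_r, shininess, alpha=1.0):
    """Assumed forward model: alpha * sum(BRDF * L * cos(theta_i) * d_theta)."""
    d_theta = light[1][0] - light[0][0]  # uniform angular spacing
    total = 0.0
    for theta_i, luminance in light:
        brdf = phong_brdf(theta_i, theta_r, shininess)
        total += brdf * luminance * math.cos(theta_i) * d_theta
    return alpha * total


def fit_shininess(light, observations, candidates):
    """Pick the candidate whose rendered values best match the observations."""
    def squared_error(shininess):
        return sum((render_pixel(light, theta_r, shininess) - value) ** 2
                   for theta_r, value in observations)
    return min(candidates, key=squared_error)


# Known light source: dim background with one bright direction at +30 degrees.
light = [(math.radians(d), 100.0 if d == 30 else 1.0) for d in range(-80, 81, 10)]
# Synthetic "captured" pixel values from several viewpoints (true shininess 20).
view_angles = [math.radians(d) for d in (-60, -45, -30, -15, 0, 15, 30)]
observations = [(t, render_pixel(light, t, 20)) for t in view_angles]
estimated = fit_shininess(light, observations, candidates=[1, 5, 20, 80])
```

With a locally bright light source the candidates produce clearly different pixel values, so the search recovers the shininess used to generate the observations.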

<Light source frequency>
The main point of this embodiment is that, when the light source does not sufficiently contain high-frequency components, it is determined that "texture estimation is difficult and the flash needs to be fired". The frequency of the light source is described below.

FIG. 6 is a diagram showing the relationship between the incident polar angle θ_i and the incident luminance L for sunlight (FIG. 6(a)) and a cloudy sky (FIG. 6(b)); to simplify the explanation, the incident azimuth angle φ_i is fixed. FIG. 7 is a diagram showing the relationship between the frequency order f and the incident luminance frequency coefficient L' for sunlight (FIG. 7(a)) and a cloudy sky (FIG. 7(b)). Here, the frequency order f indicates how high the frequency is, and the incident luminance frequency coefficient L' is the result of applying a frequency transform to the incident luminance L. Any method may be used to frequency-transform the incident luminance L; for example, the spherical harmonic expansion or wavelet expansion described in Non-Patent Document 2 may be used. The frequency of the light source is described below with reference to FIGS. 6 and 7.

In FIG. 6(a), the incident luminance L is locally high. This shows that the incident luminance L is extremely high only in the direction of the sun. When there is a direction in which the incident luminance L is locally high like this, the light source contains many high-frequency components. Conversely, FIG. 6(b) is a cloudy sky, and there is no direction in which the incident luminance L is locally high. In such a case, the light source does not sufficiently contain high-frequency components. This difference appears in FIG. 7, where FIG. 7(a) contains more high-frequency components than FIG. 7(b).
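The contrast between a locally bright light source and a diffuse one can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a one-dimensional luminance profile over the polar angle (azimuth fixed, as in FIG. 6) and a hand-written DCT-II as a simple stand-in for the spherical harmonic or wavelet expansion mentioned in the text. The profiles, the cutoff order, and the function names are hypothetical.

```python
import math


def frequency_coefficients(luminance):
    """Magnitudes of the DCT-II of a 1D luminance profile (stand-in transform)."""
    n = len(luminance)
    coeffs = []
    for f in range(n):
        c = sum(luminance[k] * math.cos(math.pi * f * (k + 0.5) / n)
                for k in range(n))
        coeffs.append(abs(c) / n)
    return coeffs


def high_frequency_share(coeffs, cutoff):
    """Fraction of non-DC energy located at orders >= cutoff."""
    total = sum(c * c for c in coeffs[1:])
    high = sum(c * c for c in coeffs[cutoff:])
    return high / total if total > 0 else 0.0


n = 32
# Sunlight-like profile: one direction is far brighter than the rest.
sunlight = [1.0] * n
sunlight[8] = 100.0
# Cloudy-like profile: luminance varies only slowly with direction.
cloudy = [1.0 + 0.5 * math.cos(math.pi * (k + 0.5) / n) for k in range(n)]

sun_share = high_frequency_share(frequency_coefficients(sunlight), cutoff=8)
cloud_share = high_frequency_share(frequency_coefficients(cloudy), cutoff=8)
```

In this toy setup most of the non-DC energy of the peaked profile lies at or above the cutoff order, while the smooth profile has essentially none there, which mirrors the difference between FIG. 7(a) and FIG. 7(b).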

<Relationship between light source frequency, success of texture estimation, and flash necessity>
The relationship between the light source frequency, the success or failure of texture estimation, and the necessity of the flash is described below. First, the reason why texture estimation is difficult when the light source does not sufficiently contain high-frequency components is explained, taking as examples a high-gloss texture such as metal and a low-gloss texture such as plastic. When images of a high-gloss subject and a low-gloss subject are compared, the way the light source is reflected in them differs greatly. Specifically, the light source appears sharply in an image of a high-gloss subject and appears blurred in an image of a low-gloss subject. From this difference, it can be determined whether the subject is high-gloss or low-gloss. However, when the light source does not sufficiently contain high-frequency components, that is, when the light source itself is blurred, the reflection appears blurred whether a high-gloss or a low-gloss subject is imaged. Because the difference in how the light source appears in the captured image disappears, estimating the texture data becomes difficult. This is why texture data is difficult to estimate when the light source does not sufficiently contain high-frequency components.

In the following, the above reason is explained theoretically by applying a frequency transform to Equation 2. Applying a frequency transform to Equation 2 yields the following equation (the details of the derivation are described in Non-Patent Document 2).

Here, B' is the result of frequency-transforming the BRDF (hereinafter called the BRDF frequency coefficient), and β is a conversion coefficient that depends on which frequency transform is used.

Next, the reason why texture estimation is difficult when the light source does not sufficiently contain high-frequency components is explained theoretically with reference to Equation 4. When the light source does not sufficiently contain high-frequency components, the incident luminance frequency coefficient L' is zero for large f. Equation 4 then shows that even if the BRDF frequency coefficient B' has non-zero values for large f, their influence is not reflected in the pixel value I. Because texture estimation is performed with reference to the pixel value I, texture estimation becomes difficult when the influence of the BRDF frequency coefficient B' is not reflected in the pixel value I in this way. This is why texture estimation is difficult when the light source does not sufficiently contain high-frequency components.
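The argument above can be made concrete with a small numerical sketch. It assumes that, per frequency order f, the frequency-domain image term is the product α × β × B'(f) × L'(f); this product form is inferred from the surrounding description and from the convolution result of Non-Patent Document 2, and β is treated as a constant here for simplicity. The coefficient lists are hypothetical.

```python
def image_coefficients(brdf_coeffs, light_coeffs, alpha=1.0, beta=1.0):
    """Assumed per-order relation: alpha * beta * B'(f) * L'(f)."""
    return [alpha * beta * b * l for b, l in zip(brdf_coeffs, light_coeffs)]


# Hypothetical BRDF frequency coefficients B' for orders f = 0..5.
glossy = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]  # non-zero at high orders
matte = [1.0, 0.9, 0.8, 0.0, 0.0, 0.0]   # zero at high orders

# Hypothetical incident luminance frequency coefficients L'.
cloudy_light = [1.0, 0.5, 0.2, 0.0, 0.0, 0.0]  # no high-frequency content
sunny_light = [1.0, 0.9, 0.9, 0.8, 0.8, 0.7]   # rich high-frequency content

same_under_cloud = (image_coefficients(glossy, cloudy_light)
                    == image_coefficients(matte, cloudy_light))
same_under_sun = (image_coefficients(glossy, sunny_light)
                  == image_coefficients(matte, sunny_light))
```

Under the cloudy light the two subjects produce identical image coefficients, so they cannot be told apart from pixel values; under the sunny light the coefficients differ at the high orders.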

Note that when the BRDF frequency coefficient B' has no non-zero values at large frequency orders f, that is, when the subject is low-gloss, texture estimation is possible regardless of the value of the incident luminance frequency coefficient L' at those orders. In other words, if the BRDF frequency coefficient B' to be estimated has no non-zero values at large frequency orders f, all of the information in B' is reflected in the pixel value I even when L' is zero at those orders, so texture estimation is possible. How high a frequency order f the incident luminance frequency coefficient L' must remain non-zero up to therefore depends on the BRDF frequency coefficient B'.

<Flowchart of Flash Necessity Determination Unit 112>
FIG. 8 is a block diagram showing the processing units of the flash necessity determination unit 112. FIG. 9 is a flowchart showing the operation of the flash necessity determination unit 112. The flow of operation of the flash necessity determination unit 112, which is the main point of this embodiment, is described below with reference to FIG. 9.

In step S901, the light source frequency conversion unit 801 applies a frequency transform to the light source data. The frequency transform method is as described above.

In step S902, the flash necessity determination unit 802 determines whether the light source sufficiently contains high-frequency components. The determination method is described later. If the light source sufficiently contains them, it is determined in step S903 that "the flash is unnecessary"; if not, it is determined in step S904 that "the flash is necessary".

<Details of flash necessity determination processing>
FIG. 10 is a diagram showing the relationship between the frequency order f and the frequency order threshold t. A method of determining whether the light source sufficiently contains high-frequency components is described below with reference to FIG. 10. First, a method of setting a fixed threshold on the frequency order f is described. FIGS. 10(a) and 10(b) show the incident luminance frequency coefficient L' obtained by frequency-transforming two different light sources. Specifically, FIG. 10(a) is the transform of a light source that sufficiently contains high frequencies, such as sunlight, and FIG. 10(b) is the transform of a light source that does not, such as a cloudy sky. To determine whether the light source sufficiently contains high-frequency components, a threshold t on the frequency order f, as shown in FIGS. 10(a) and 10(b), is decided in advance, and it is checked whether the incident luminance frequency coefficient L' falls to a small value at a frequency order f lower than the threshold t. That is, if L' becomes small at a frequency order f lower than the threshold t, the light source is determined not to sufficiently contain high-frequency components; otherwise, it is determined to sufficiently contain them. How small L' must become for this determination is decided by setting a second threshold in advance and checking whether L' is smaller than that threshold.
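The fixed-threshold determination could be sketched as follows. This is a minimal illustration, not the apparatus's implementation: the function names, the example coefficients, and both threshold values are hypothetical, and coefficients missing below the order threshold are treated as zero, which is an assumption of this sketch.

```python
def has_sufficient_high_frequency(light_coeffs, order_threshold, magnitude_threshold):
    """Return False if L' falls below magnitude_threshold at any order f < order_threshold."""
    for f in range(order_threshold):
        # Assumption: orders beyond the available coefficients count as zero.
        magnitude = abs(light_coeffs[f]) if f < len(light_coeffs) else 0.0
        if magnitude < magnitude_threshold:
            return False
    return True


def flash_required(light_coeffs, order_threshold, magnitude_threshold):
    """Steps S902 to S904: the flash is necessary when high frequencies are insufficient."""
    return not has_sufficient_high_frequency(
        light_coeffs, order_threshold, magnitude_threshold)


# Hypothetical incident luminance frequency coefficients L' for orders f = 0..5.
sunny_light = [1.0, 0.9, 0.9, 0.8, 0.8, 0.7]
cloudy_light = [1.0, 0.5, 0.2, 0.01, 0.0, 0.0]

sunny_needs_flash = flash_required(sunny_light, order_threshold=5, magnitude_threshold=0.05)
cloudy_needs_flash = flash_required(cloudy_light, order_threshold=5, magnitude_threshold=0.05)
```

Here the cloudy coefficients drop below the magnitude threshold at order 3, which is lower than the order threshold, so the flash is judged necessary; the sunny coefficients stay above it for every order below the threshold.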

次に、周波数次数fに2つの閾値t1、t2を設けた図１０（ｃ）を参照して、周波数次数fに可変閾値を設ける方法について説明する。閾値t1、t2のうち、どちらの閾値を用いてフラッシュの要否を判定するのが適切かは、上記したようにBRDF周波数係数B'に依存する。そのため、BRDF周波数係数B'に応じて、どちらの閾値を用いるかを決定することができれば、適切な閾値でフラッシュ要否を判定できる。しかし、質感データが未知であることから、BRDF周波数係数B'も未知であり、BRDF周波数係数B'を直接閾値の決定のために用いることはできない。そこで、BRDF周波数係数B'を大まかに予測し、その予測を用いて、閾値を決定することを考える。例えば、物体認識技術を用いて、被写体の種類を認識し、その種類の被写体が持ちうる最も高周波なBRDF周波数係数B'を用いて閾値を決定すればよい。また、被写体の種類を撮像者に設定させ、その被写体の種類に適した閾値を設定してもよい。 Next, a method that sets a variable threshold on the frequency order f is described with reference to FIG. 10C, in which two thresholds t1 and t2 are provided for the frequency order f. Which of the thresholds t1 and t2 is appropriate for determining whether flash is necessary depends, as described above, on the BRDF frequency coefficient B'. Therefore, if the threshold to use can be chosen according to B', flash necessity can be determined with an appropriate threshold. However, because the texture data is unknown, B' is also unknown and cannot be used directly to choose the threshold. The approach is therefore to predict B' roughly and to choose the threshold from that prediction. For example, an object recognition technique may be used to recognize the type of subject, and the threshold may be determined using the highest-frequency BRDF frequency coefficient B' that a subject of that type can have. Alternatively, the photographer may set the type of subject, and a threshold suited to that type may be set.
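As a rough illustration of the variable-threshold idea, the following Python sketch derives a threshold from a per-category worst-case BRDF profile. Everything in it is hypothetical: the category names, the coefficient values, the cutoff `eps`, and the rule that the threshold is one past the highest order at which B' is still significant are assumptions, since the text does not specify how the threshold is computed from B'.

```python
from typing import Dict, Sequence

# Hypothetical highest-frequency BRDF coefficient profiles B' per subject type.
# In practice the type would come from object recognition or a user setting.
BRDF_PROFILES: Dict[str, Sequence[float]] = {
    "metal": [1.0, 0.9, 0.8, 0.6, 0.4, 0.2, 0.1, 0.0],
    "plastic": [1.0, 0.6, 0.2, 0.05, 0.0, 0.0, 0.0, 0.0],
}


def threshold_for_subject(subject_type: str, eps: float = 0.1) -> int:
    """Pick the frequency-order threshold t for a subject type.

    Assumed rule: t is one past the highest order f at which |B'| >= eps,
    so that the light source is checked over every order the BRDF can use.
    """
    coeffs = BRDF_PROFILES[subject_type]
    significant = [f for f, b in enumerate(coeffs) if abs(b) >= eps]
    return max(significant) + 1 if significant else 0


t1 = threshold_for_subject("plastic")  # lower threshold for a low-frequency BRDF
t2 = threshold_for_subject("metal")    # higher threshold for a high-frequency BRDF
print(t1, t2)
```

Under these assumptions a glossy subject yields the larger threshold, so the light source has to carry high-frequency content over more orders before flash is judged unnecessary.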

以上が入射輝度周波数係数L'からフラッシュ要否を判定する方法の説明である。なお、本実施例は入射輝度周波数係数L'から直接フラッシュ要否を判定することに限定されない。以下では、入射輝度周波数係数L'以外からフラッシュ要否を判定する方法について説明する。 The above is the description of the method for determining whether or not the flash is necessary from the incident luminance frequency coefficient L ′. Note that the present embodiment is not limited to directly determining whether flash is necessary or not from the incident luminance frequency coefficient L ′. In the following, a method for determining whether or not the flash is necessary from other than the incident luminance frequency coefficient L ′ will be described.

まず、撮像モードから光源周波数を予測し、その予測結果を用いてフラッシュ要否を判定する方法について説明する。ここで、撮像モードとは、「くもりモード」、「晴れモード」など、カメラの撮像設定を自動で決めるために用いるシーン情報を指す。本実施例では、撮像モードもフラッシュ要否を判定するために用いられるものであるので、光源情報と称することができる。本方法では、図１１に示すような撮像モードと光源周波数の関係をテーブルとして保持しておき、このテーブルを参照して、光源周波数からフラッシュ要否を判定する。なお、上記テーブルは各撮像モードに適合するような撮像環境において実際に光源周波数を算出することで作成することができる。また、撮像モードとフラッシュ要否の関係を直接テーブルとして保持しておいてもよい。 First, a method of predicting the light source frequency from the imaging mode and determining the necessity of flash using the prediction result will be described. Here, the imaging mode refers to scene information used for automatically determining imaging settings of the camera, such as “cloudy mode” and “sunny mode”. In the present embodiment, the imaging mode is also used for determining whether or not the flash is necessary, and thus can be referred to as light source information. In this method, the relationship between the imaging mode and the light source frequency as shown in FIG. 11 is held as a table, and the necessity of flash is determined from the light source frequency with reference to this table. The table can be created by actually calculating the light source frequency in an imaging environment suitable for each imaging mode. Further, the relationship between the imaging mode and the necessity of flash may be held directly as a table.
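The table lookup described here can be sketched in Python as follows. The contents of FIG. 11 are not reproduced in this text, so the table entries, the "high"/"low" labels, and the conservative default for unknown modes are assumptions for illustration only.

```python
from typing import Dict

# Hypothetical stand-in for the imaging-mode table of FIG. 11.
MODE_TO_LIGHT_FREQUENCY: Dict[str, str] = {
    "sunny": "high",   # light source assumed rich in high-frequency components
    "cloudy": "low",   # light source assumed poor in high-frequency components
}


def flash_needed_for_mode(mode: str, default_needed: bool = True) -> bool:
    """Predict the light source frequency from the imaging mode, then decide.

    Modes missing from the table fall back to default_needed, which is an
    assumed conservative choice (use flash when unsure).
    """
    light_frequency = MODE_TO_LIGHT_FREQUENCY.get(mode)
    if light_frequency is None:
        return default_needed
    return light_frequency == "low"


print(flash_needed_for_mode("cloudy"))
print(flash_needed_for_mode("sunny"))
```

Holding the mode-to-flash relationship directly, as the text also allows, would amount to replacing the string labels with booleans.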

次に、フラッシュ要否を判定する他の方法として、本撮像の前にプレ撮像を行い、プレ撮像画像データが高周波成分を十分に含んでいるか否かに応じてフラッシュ要否を判定する方法を説明する。プレ撮像画像データもフラッシュ要否を判定するために用いられるものであるので、光源情報と称することができる。具体的には、プレ撮像画像に対して周波数変換を施し、その周波数係数I'（以後、撮像画像周波数係数と呼ぶ）が高周波成分を十分に含んでいる場合には「フラッシュは不要」であると判定し、十分に含んでいない場合には「フラッシュは必要」であると判定する。このような判定を行う理由は、式４から画素値Iが高周波成分を含まない場合には、入射輝度周波数係数L'かBRDF周波数係数B'のどちらかが大きなfについてゼロに近い値となっていると判断できるからである。つまり、入射輝度周波数係数L'が大きなfについてゼロに近い値になっている可能性があるため、「フラッシュは必要」であると判定する。なお、プレ撮像画像のダイナミックレンジが狭い場合に、プレ撮像画像が高周波成分を十分に含んでいないと判定してもよい。 Next, as another method for determining whether flash is necessary, a method is described in which pre-imaging is performed before the main imaging and flash necessity is determined according to whether the pre-captured image data sufficiently includes high-frequency components. Since the pre-captured image data is also used for determining whether flash is necessary, it can be referred to as light source information. Specifically, the pre-captured image is frequency-transformed, and if its frequency coefficient I' (hereinafter, the captured image frequency coefficient) sufficiently includes high-frequency components, it is determined that “flash is unnecessary”; if it does not, it is determined that “flash is necessary”. The reason for this determination is that, from Equation 4, when the pixel value I does not include high-frequency components, either the incident luminance frequency coefficient L' or the BRDF frequency coefficient B' can be judged to be close to zero for large f. In other words, because L' may be close to zero for large f, it is determined that “flash is necessary”. Note that the pre-captured image may also be determined not to sufficiently include high-frequency components when its dynamic range is narrow.
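The pre-capture check can be sketched in Python as below. This is a simplified illustration, not the patent's transform: it applies a naive 1-D discrete Fourier transform to a single scanline instead of transforming the whole image, and the names `t` and `eps` as well as the sample scanlines are assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(signal: Sequence[float]) -> List[float]:
    """Normalized magnitudes of a naive 1-D DFT for orders 0 .. n // 2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(signal))) / n
        for k in range(n // 2 + 1)
    ]


def flash_needed_from_precapture(scanline: Sequence[float], t: int, eps: float) -> bool:
    """Flash is judged necessary if any coefficient I' of order 1 .. t-1 is below eps."""
    magnitudes = dft_magnitudes(scanline)
    return any(m < eps for m in magnitudes[1:t])


# Hypothetical scanlines: a sharp highlight versus a smooth gradient.
sharp = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
smooth = [0.5 + 0.5 * math.cos(2 * math.pi * i / 8) for i in range(8)]

print(flash_needed_from_precapture(sharp, t=4, eps=0.05))
print(flash_needed_from_precapture(smooth, t=4, eps=0.05))
```

The sharp highlight spreads energy across all orders, so no flash is requested, while the smooth gradient has almost no energy above order 1.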

なお、実施例１において撮像モードやプレ撮像画像データは、フラッシュ要否の判定に用いられる光源情報である。質感推定に用いる光源データについては、先に説明したように外部から取得した光源データを用いることができる。 In the first embodiment, the imaging mode and pre-captured image data are light source information used for determining whether or not a flash is necessary. As the light source data used for texture estimation, the light source data acquired from the outside can be used as described above.

以上説明したように、実施例１によれば、光源情報から撮像時にフラッシュが必要かどうかを判定し、必要に応じてフラッシュ撮像を行うことで、光源が高周波成分を十分に含まない場合でも質感データを得ることができる。 As described above, according to the first embodiment, whether a flash is necessary at the time of imaging is determined from the light source information, and flash imaging is performed as necessary, so that texture data can be obtained even when the light source does not sufficiently include high-frequency components.

実施例１では、外部から取得した光源データを用いて、質感データを推定する手法を説明した。実施例２では、本画像処理装置の撮像部を用いて光源データを取得し、質感データのみならず光源データをも推定する手法について説明する。 In the first embodiment, the method for estimating the texture data using the light source data acquired from the outside has been described. In the second embodiment, a method of acquiring light source data using an imaging unit of the image processing apparatus and estimating light source data as well as texture data will be described.

＜実施例２のフローチャート＞
図１２は図１に示す画像処理装置の実施例２における動作を示すフローチャートである。以降では、図１２を参照して画像処理装置の実施例２における動作を説明する。 <Flowchart of Example 2>
FIG. 12 is a flowchart showing the operation, in the second embodiment, of the image processing apparatus shown in FIG. 1. The operation of the image processing apparatus according to the second embodiment is described below with reference to FIG. 12.

ステップＳ１２０１で、フラッシュ要否判定部１１２が撮像時にフラッシュを焚く必要があるかどうかを判定する。フラッシュを焚く必要があるかどうかは、実施例１で説明したように、撮像モードから予測した光源周波数やプレ撮像画像の撮像画像周波数係数I'などの光源情報から判定すればよい。なお、実施例１で説明した光源情報の一例である、光源の位置、方向、輝度などの照明情報に関する光源データは実施例２では未知であるので、この光源データは、光源情報としてフラッシュ要否の判定には用いられない。 In step S1201, the flash necessity determination unit 112 determines whether a flash needs to be fired during imaging. As described in the first embodiment, this may be determined from light source information such as the light source frequency predicted from the imaging mode or the captured image frequency coefficient I' of the pre-captured image. Note that the light source data on illumination, such as the position, direction, and luminance of the light source, which was one example of light source information in the first embodiment, is unknown in the second embodiment, so this light source data is not used as light source information for determining whether flash is necessary.

ステップＳ１２０２で、撮像部１０１、１０２がフラッシュ要否判定部１１２の判定結果に応じて撮像を行う。具体的には、フラッシュ要否判定部１１２が「フラッシュは不要」と判定した場合はフラッシュ無しで撮像を行い、「フラッシュは必要」と判定した場合はフラッシュ無しとフラッシュ有りでそれぞれ撮像を行う。すなわち、フラッシュを用いて被写体を撮像した第１の画像データと、フラッシュを用いずに被写体を撮像した第２の画像データとを取得する。なお、ステップＳ１２０１でプレ撮像を行った場合は、そのプレ撮像画像をフラッシュ無し撮像画像として利用してもよい。 In step S1202, the imaging units 101 and 102 perform imaging according to the determination result of the flash necessity determination unit 112. Specifically, when the flash necessity determination unit 112 determines that “flash is unnecessary”, imaging is performed without flash; when it determines that “flash is necessary”, imaging is performed both without flash and with flash. That is, first image data obtained by imaging the subject using a flash and second image data obtained by imaging the subject without using a flash are acquired. When pre-imaging was performed in step S1201, the pre-captured image may be used as the non-flash captured image.
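The capture logic of step S1202 can be summarized in a small Python sketch. The function name, the string labels for the shots, and the `have_precapture` flag are illustrative assumptions; the sketch only lists which exposures remain to be taken.

```python
from typing import List


def plan_captures(flash_needed: bool, have_precapture: bool = False) -> List[str]:
    """List the shots still to be taken in step S1202.

    A non-flash shot is always required. A flash shot is added only when the
    flash necessity determination said flash is necessary. If a pre-captured
    image already exists, it is reused as the non-flash shot.
    """
    shots: List[str] = []
    if not have_precapture:
        shots.append("no_flash")
    if flash_needed:
        shots.append("flash")
    return shots


print(plan_captures(flash_needed=True))
print(plan_captures(flash_needed=True, have_precapture=True))
```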

ステップＳ１２０３で、シーン要素データ取得部１０６が被写体の形状データを取得する。形状データの取得方法は実施例１と同様であるため、説明を省略する。 In step S1203, the scene element data acquisition unit 106 acquires subject shape data. Since the shape data acquisition method is the same as that of the first embodiment, the description thereof is omitted.

ステップＳ１２０４で、シーン要素データ取得部１０６が被写体の質感データと光源データとを推定する。基本的には、式２のBRDFと入射輝度Lとを実施例１と同様の最適化計算により求めればよい。しかし、一般的に、式２を満たすBRDFと入射輝度Lの組み合わせは多数存在する。そのため、最適化計算の初期値が実際の質感データや光源データから離れていると、局所解に陥り、推定に失敗してしまう。そこで、最適化計算の初期値を実際の質感データや光源データに近いものにすることが重要となる。 In step S1204, the scene element data acquisition unit 106 estimates subject texture data and light source data. Basically, the BRDF of formula 2 and the incident luminance L may be obtained by the same optimization calculation as in the first embodiment. However, in general, there are many combinations of BRDF and incident luminance L that satisfy Equation 2. Therefore, if the initial value of the optimization calculation is far from the actual texture data or light source data, it falls into a local solution and the estimation fails. Therefore, it is important that the initial value of the optimization calculation is close to actual texture data and light source data.

＜実施例２におけるシーン要素データ取得部１０６のフローチャート＞
実施例２では、実施例１のように外部から光源データを取得するのではなく、フラッシュ有り撮像画像とフラッシュ無し撮像画像の比較結果から適切な推定方法と最適化計算の初期値を求めることで、光源データと質感データを両方推定する。図１３は、実施例２のシーン要素データ取得部の各処理部の一例を示す図である。実施例１と異なり、入力データは撮像画像データと形状データであり、これらのデータに基づいて質感データと光源データとが出力される。 <Flowchart of Scene Element Data Acquisition Unit 106 in Embodiment 2>
In the second embodiment, the light source data is not acquired from the outside as in the first embodiment. Instead, an appropriate estimation method and an initial value for the optimization calculation are obtained from a comparison between the captured image with flash and the captured image without flash, and both the light source data and the texture data are estimated. FIG. 13 is a diagram illustrating an example of the processing units of the scene element data acquisition unit according to the second embodiment. Unlike the first embodiment, the input data is captured image data and shape data, and texture data and light source data are output based on these data.

図１４は実施例２における質感・光源データ取得部１３０３の動作を示すフローチャートである。以降では、図１４を参照して実施例２における質感・光源データ取得部１３０３の動作を説明する。 FIG. 14 is a flowchart illustrating the operation of the texture / light source data acquisition unit 1303 according to the second embodiment. The operation of the texture / light source data acquisition unit 1303 according to the second embodiment is described below with reference to FIG. 14.

ステップＳ１４０１で、ステップＳ１２０２において、フラッシュ有りで撮像を行ったどうか判定する。 In step S1401, it is determined whether imaging with flash was performed in step S1202.

フラッシュ有りで撮像を行ったと判定した場合、ステップＳ１４０２で、フラッシュ有り撮像画像データとフラッシュ無し撮像画像データとを周波数変換する。 If it is determined that the image has been taken with the flash, in step S1402, the captured image data with the flash and the captured image data without the flash are frequency-converted.

続けて、ステップＳ１４０３で、フラッシュ有り撮像画像データとフラッシュ無し撮像画像データの周波数を比較する。具体的には、フラッシュ無し撮像画像データの周波数係数I'₁とフラッシュ有り撮像画像データの周波数係数I'₂がそれぞれゼロに近い値となる最小の周波数次数f1_minとf2_minとを比較する。なお、どの程度ゼロに近い値となった時の周波数次数を比較するかについては、あらかじめ閾値を設け、その閾値よりも周波数係数I'が小さくなった時の周波数次数を比較すればよい。 Subsequently, in step S1403, the frequencies of the captured image data with flash and the captured image data without flash are compared. Specifically, the minimum frequency orders f1_min and f2_min at which the frequency coefficient I'₁ of the captured image data without flash and the frequency coefficient I'₂ of the captured image data with flash, respectively, become close to zero are compared. How close to zero a coefficient must be is decided by setting a threshold in advance; the frequency orders at which the frequency coefficient I' falls below that threshold are the ones compared.
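The comparison in step S1403 can be sketched in Python as follows. The helper name, the threshold `eps`, the convention of returning the list length when no coefficient falls below `eps`, and the sample coefficients are assumptions made for illustration.

```python
from typing import Sequence


def min_small_order(coeffs: Sequence[float], eps: float) -> int:
    """Smallest frequency order f at which |I'| falls below eps.

    If no coefficient is below eps, len(coeffs) is returned, meaning that the
    data stays significant over the whole measured range (assumed convention).
    """
    for f, c in enumerate(coeffs):
        if abs(c) < eps:
            return f
    return len(coeffs)


# Hypothetical captured image frequency coefficients.
i1 = [1.0, 0.4, 0.02, 0.0, 0.0, 0.0]   # without flash (I'_1)
i2 = [1.0, 0.8, 0.6, 0.4, 0.2, 0.03]   # with flash (I'_2)

f1_min = min_small_order(i1, eps=0.05)
f2_min = min_small_order(i2, eps=0.05)
print(f1_min, f2_min, f2_min - f1_min)
```

In this example the flash shot keeps significant coefficients up to a much higher order than the non-flash shot, which is the situation in which the two frequency orders differ widely.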

f1_minとf2_minの差があらかじめ設定された閾値より大きい場合は、ステップＳ１４０４で、フラッシュ有り撮像画像とフラッシュ無し撮像画像の差分画像を用いて推定を行う。差分画像を用いた推定処理の詳細については後述する。 If the difference between f1_min and f2_min is larger than a preset threshold value, estimation is performed using a difference image between the captured image with flash and the captured image without flash in step S1404. Details of the estimation process using the difference image will be described later.

f1_minとf2_minの差が閾値以下の場合、もしくは、ステップＳ１４０１においてフラッシュ有りで撮像を行っていないと判定した場合は、ステップＳ１４０５で、最適化計算の初期値を算出する。具体的には、フラッシュ無し撮像画像が高周波成分を十分に含んでいる場合には、BRDFが高周波成分を十分に含む質感データを初期値とする。逆に、フラッシュ無し撮像画像が高周波成分を十分に含んでいない場合には、BRDFが高周波成分を十分に含まない質感データを初期値とする。このような初期値とする理由は、式2から、画素値Iが高周波成分を含む場合はBRDFも高周波成分を含み、画素値Iが高周波成分を含まない場合はBRDFも高周波成分を含まないと分かるからである。なお、上記初期値として具体的にどのような質感データを設定するかは任意である。例えば、金属などの高周波なBRDFとプラスチックなどの低周波なBRDFを実際に計測しておき、撮像画像の周波数に応じて、それらBRDFを質感データの初期値として設定すればよい。このように、高周波成分を多く含む場合、高周波に対応する初期値を与えた最適化計算処理を行い、高周波成分を多く含まない場合、低周波に対応する初期値を与えた最適化計算処理を行う。 If the difference between f1_min and f2_min is equal to or smaller than the threshold, or if it is determined in step S1401 that imaging with flash was not performed, an initial value for the optimization calculation is calculated in step S1405. Specifically, when the non-flash captured image sufficiently includes high-frequency components, texture data whose BRDF sufficiently includes high-frequency components is used as the initial value. Conversely, when the non-flash captured image does not sufficiently include high-frequency components, texture data whose BRDF does not sufficiently include high-frequency components is used as the initial value. The reason for choosing the initial value this way is that, from Equation 2, when the pixel value I includes high-frequency components the BRDF also includes them, and when the pixel value I does not include high-frequency components the BRDF does not include them either. The specific texture data to set as the initial value is arbitrary. For example, a high-frequency BRDF such as that of metal and a low-frequency BRDF such as that of plastic may be measured in advance, and one of them may be set as the initial value of the texture data according to the frequency of the captured image. In this way, when many high-frequency components are included, the optimization calculation is performed with an initial value corresponding to high frequencies, and when many high-frequency components are not included, the optimization calculation is performed with an initial value corresponding to low frequencies.
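The initial-value selection of step S1405 can be sketched in Python as below. The rule used to decide whether the non-flash image "sufficiently includes" high-frequency components (every coefficient below order `t` is at least `eps`) and the placeholder BRDF labels are assumptions; real initial values would be measured BRDF data.

```python
from typing import Dict, Sequence

# Hypothetical pre-measured BRDFs, represented here only by labels.
MEASURED_BRDFS: Dict[str, str] = {
    "high": "measured_metal_brdf",     # high-frequency BRDF (e.g. metal)
    "low": "measured_plastic_brdf",    # low-frequency BRDF (e.g. plastic)
}


def initial_brdf(no_flash_coeffs: Sequence[float], t: int, eps: float) -> str:
    """Choose the BRDF used to start the optimization calculation.

    If the non-flash image keeps significant coefficients up to order t-1,
    start from the high-frequency BRDF; otherwise start from the
    low-frequency one.
    """
    has_high_freq = all(abs(c) >= eps for c in no_flash_coeffs[:t])
    return MEASURED_BRDFS["high" if has_high_freq else "low"]


print(initial_brdf([1.0, 0.9, 0.7, 0.5, 0.3], t=4, eps=0.05))
print(initial_brdf([1.0, 0.3, 0.01, 0.0, 0.0], t=4, eps=0.05))
```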

続けて、ステップＳ１４０６で、フラッシュ無し撮像画像のみを用いて質感推定および光源推定処理を行う。質感推定処理は実施例1と同様である。また、画素値、形状データ、質感データが既知となっているので、式２を満たす入射輝度Ｌを最適化計算により求めることで光源データを推定できる。 Subsequently, in step S1406, texture estimation and light source estimation processing are performed using only the non-flash captured image. The texture estimation process is the same as in the first embodiment. Further, since the pixel value, shape data, and texture data are known, the light source data can be estimated by obtaining the incident luminance L satisfying Equation 2 by optimization calculation.

なお、ここで、f1_minとf2_minの差が閾値以下の場合にフラッシュ無し撮像画像データのみを用いて質感を推定する理由を説明する。f1_minとf2_minが同じ場合、フラッシュ無し撮像画像データとフラッシュ有り撮像画像データとで、実施例1において説明した質感推定成否に違いが無いということになる。ということは、シーン中の光源はフラッシュが無くとも高周波成分を十分に含んでいたということになり、フラッシュ無し撮像画像のみから推定が可能だと分かる。以上が、f1_minとf2_minの差が小さい場合に、フラッシュ無し撮像画像のみを用いて推定を行う理由である。なお上記は、f1_minとf2_minの差が小さい場合に、フラッシュ有り撮像画像を用いて質感推定を行うことを排除する趣旨ではない。すなわち、f1_minとf2_minの差が小さい場合に、後述するように差分画像データを用いて質感推定を行っても良い。 The reason why the texture is estimated using only the non-flash captured image data when the difference between f1_min and f2_min is equal to or smaller than the threshold is as follows. When f1_min and f2_min are the same, there is no difference between the captured image data without flash and the captured image data with flash in the success or failure of texture estimation described in the first embodiment. This means that the light source in the scene sufficiently included high-frequency components even without the flash, so estimation is possible from the non-flash captured image alone. This is why estimation is performed using only the non-flash captured image when the difference between f1_min and f2_min is small. Note that this is not intended to exclude texture estimation using the captured image with flash when the difference between f1_min and f2_min is small. That is, even when the difference is small, texture estimation may be performed using difference image data as described later.
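The branching among steps S1401 to S1406 can be condensed into a short Python sketch. The function name, the returned labels, and the use of an absolute difference are assumptions; the sketch only selects which estimation path to follow and performs no estimation itself.

```python
def choose_estimation_path(flash_shot_taken: bool, f1_min: int, f2_min: int,
                           diff_threshold: int) -> str:
    """Select the estimation path of FIG. 14.

    Returns "difference_image" (step S1404) when a flash shot exists and the
    frequency orders differ by more than diff_threshold. Otherwise returns
    "no_flash_only" (steps S1405 and S1406).
    """
    if flash_shot_taken and abs(f2_min - f1_min) > diff_threshold:
        return "difference_image"
    return "no_flash_only"


print(choose_estimation_path(True, f1_min=2, f2_min=5, diff_threshold=1))
print(choose_estimation_path(True, f1_min=4, f2_min=5, diff_threshold=1))
print(choose_estimation_path(False, f1_min=2, f2_min=2, diff_threshold=1))
```

As the text notes, the small-difference case does not rule out the difference-image path, so a real implementation could expose that choice as an option.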

＜差分画像を用いた質感・光源データ推定処理の詳細＞
図１５は、質感・光源データ取得部１３０３による差分画像データを用いた質感・光源データ推定処理の流れを示すフローチャートである。以降では、図１５を参照して差分画像データを用いた質感・光源データ推定処理について説明する。 <Details of texture / light source data estimation processing using difference images>
FIG. 15 is a flowchart showing the flow of texture / light source data estimation processing using difference image data by the texture / light source data acquisition unit 1303. The texture / light source data estimation process using the difference image data is described below with reference to FIG. 15.

ステップＳ１５０１で、差分画像データを生成する。差分画像データは、フラッシュ有り撮像画像データとフラッシュ無し撮像画像データの画素値の差を取ることで生成すればよい。ここで、差分画像データはフラッシュのみを光源としてシーンを撮像した画像データと同じとなるはずである。なぜなら、フラッシュ有り撮像画像データは元々シーン中にあった光源とフラッシュの両方を光源とした撮像画像データであり、フラッシュ無し撮像画像データは元々シーン中にあった光源のみを光源とした撮像画像データだからである。なお、フラッシュ有り撮像画像データとフラッシュ無し撮像画像データとの間でカメラ位置にずれがある場合は、差分画像データを生成する前に画像位置合わせ技術により、画像間の位置を合わせればよい。しかし、光源環境の異なる画像間で位置合わせを行うことは一般的に困難である。従って、三脚などに固定して撮像を行うか、フラッシュ有り撮像画像データとフラッシュ無し撮像画像データとをほぼ同時に撮像するなど、位置ずれが起こらないよう工夫することが望ましい。 In step S1501, difference image data is generated. The difference image data may be generated by taking the difference between the pixel values of the captured image data with flash and the captured image data without flash. The difference image data should be the same as image data obtained by capturing the scene with the flash as the only light source. This is because the captured image data with flash is lit by both the light source originally in the scene and the flash, whereas the captured image data without flash is lit only by the light source originally in the scene. If the camera position differs between the captured image data with flash and the captured image data without flash, the images may be aligned by an image alignment technique before the difference image data is generated. However, aligning images captured under different lighting is generally difficult. It is therefore desirable to prevent misalignment in the first place, for example by fixing the camera on a tripod or by capturing the image data with flash and the image data without flash almost simultaneously.
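Difference image generation in step S1501 can be sketched in Python as follows. Images are represented as nested lists of single-channel pixel values, which is a simplification; clamping negative differences to zero is an added assumption to absorb noise and is not stated in the text. The sketch also assumes the two images are already aligned.

```python
from typing import List

Image = List[List[float]]


def difference_image(flash_img: Image, no_flash_img: Image) -> Image:
    """Per-pixel difference (flash minus non-flash), clamped at zero.

    The result approximates the scene as lit by the flash alone.
    """
    if len(flash_img) != len(no_flash_img) or any(
        len(a) != len(b) for a, b in zip(flash_img, no_flash_img)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [max(a - b, 0) for a, b in zip(row_flash, row_no_flash)]
        for row_flash, row_no_flash in zip(flash_img, no_flash_img)
    ]


with_flash = [[10, 20], [30, 40]]
without_flash = [[4, 25], [30, 10]]
print(difference_image(with_flash, without_flash))
```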

ステップＳ１５０２で、差分画像データから質感データを推定する。ここで、フラッシュの情報が既知であれば、差分画像データの画素値、形状データ、フラッシュ（光源）情報から実施例１と同様に質感データを推定することができる。 In step S1502, the texture data is estimated from the difference image data. Here, if the flash information is known, the texture data can be estimated from the pixel value of the difference image data, the shape data, and the flash (light source) information as in the first embodiment.

ステップＳ１５０３で、フラッシュ無し撮像画像データから、フラッシュ無し撮影の際の光源データを推定する。ここで、ステップＳ１５０２において質感データを推定したことから、画素値、形状データ、質感データが既知となっている。そのため、式２を満たす入射輝度Ｌを最適化計算により求めることができる。 In step S1503, light source data at the time of shooting without flash is estimated from captured image data without flash. Here, since the texture data is estimated in step S1502, the pixel value, the shape data, and the texture data are already known. Therefore, the incident luminance L satisfying Equation 2 can be obtained by optimization calculation.

上記したように、質感データを推定する場合には画素値、形状データ、光源データが既知であり、光源データを推定する場合には画素値、形状データ、質感データが既知となっていることから、最適化計算の初期値により局所解に陥る心配は比較的少ない。そのため、初期値として任意の値を用いることができるが、実際の光源データや質感データに近い初期値を設定する方がより望ましい。 As described above, the pixel values, shape data, and light source data are known when the texture data is estimated, and the pixel values, shape data, and texture data are known when the light source data is estimated, so there is relatively little risk of falling into a local solution because of the initial value of the optimization calculation. Therefore, an arbitrary value can be used as the initial value, although it is preferable to set an initial value close to the actual light source data or texture data.

以上説明したように、実施例２によれば、未知の光源が高周波成分を十分に含まない場合に、フラッシュ有りとフラッシュ無しでそれぞれ撮像を行い、それらの撮像画像データから推定を行うことで、質感データと光源データとを両方推定することを可能とする。 As described above, according to the second embodiment, when an unknown light source does not sufficiently contain a high-frequency component, by performing imaging with and without flash, and estimating from the captured image data, It is possible to estimate both the texture data and the light source data.

＜その他の実施例＞
また、本発明は、以下の処理を実行することによっても実現される。即ち、上述した実施形態の機能を実現するソフトウェア（プログラム）を、ネットワーク又は各種記憶媒体を介してシステム或いは装置に供給し、そのシステム或いは装置のコンピュータ（またはＣＰＵやＭＰＵ等）がプログラムを読み出して実行する処理である。 <Other examples>
The present invention can also be realized by executing the following processing: software (a program) that realizes the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or CPU, MPU, or the like) of the system or apparatus reads and executes the program.

Claims

An image processing apparatus comprising:
determination means for determining, based on light source information, whether to estimate texture data of a subject from image data captured using a flash; and
texture data estimation means for estimating the texture data of the subject according to the determination by the determination means.

The image processing apparatus according to claim 1, wherein when the light source information includes a high frequency component, the determination unit determines to estimate the texture data of the subject from image data captured using the flash.

The image processing apparatus according to claim 1, further comprising a light source information acquisition unit that acquires light source data including at least one of a position, a direction, and luminance of the light source as the light source information.

The image processing apparatus according to claim 3, wherein the light source information acquisition unit acquires the light source data obtained by imaging the periphery of the subject.

The image processing apparatus according to claim 1, further comprising a light source information acquisition unit that acquires data indicating an imaging mode when the subject is imaged as the light source information.

The image processing apparatus according to claim 1, further comprising a light source information acquisition unit that acquires pre-captured image data obtained by capturing the subject without using a flash as the light source information.

The image processing apparatus according to claim 1, further comprising:
image data acquisition means for acquiring image data obtained by imaging the subject;
shape data acquisition means for acquiring shape data of the subject; and
light source data acquisition means for acquiring light source data including at least one of the position, direction, and luminance of the light source,
wherein the texture data estimation means estimates the texture data using the image data acquired by the image data acquisition means, the shape data acquired by the shape data acquisition means, and the light source data acquired by the light source data acquisition means.

The image processing apparatus according to claim 1, further comprising a light source data estimation unit that estimates light source data of the subject according to the determination by the determination unit.

The image processing apparatus according to claim 8, further comprising:
image data acquisition means for acquiring first image data obtained by imaging the subject using a flash and second image data obtained by imaging the subject without using a flash; and
control means for causing the texture data estimation means and the light source data estimation means to perform estimation processing according to a frequency order difference between the first image data and the second image data.

The image processing apparatus according to claim 9, wherein, when the frequency order difference is greater than a threshold value, the control means causes the texture data estimation means to estimate the texture data using difference image data, which is a difference between the first image data and the second image data, and causes the light source data estimation means to estimate the light source data using the estimated texture data and the second image data.

The image processing apparatus according to claim 9, wherein, when the frequency order difference is equal to or less than the threshold value, the control means causes the texture data estimation means to estimate the texture data using the second image data, and causes the light source data estimation means to estimate the light source data using the second image data.

The image processing apparatus according to claim 11, wherein the texture data estimation means performs optimization calculation processing with an initial value corresponding to high frequencies when many high-frequency components are included, and performs optimization calculation processing with an initial value corresponding to low frequencies when many high-frequency components are not included.

An image processing method comprising:
a determination step of determining, based on light source information, whether to estimate texture data of a subject from image data captured using a flash; and
a texture data estimation step of estimating the texture data of the subject according to the determination in the determination step.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 12.