JP7084616B2

JP7084616B2 - Image processing device, image processing method, and image processing program

Info

Publication number: JP7084616B2
Application number: JP2018117026A
Authority: JP
Inventors: 由博金森; 結城遠藤
Original assignee: University of Tsukuba NUC
Current assignee: University of Tsukuba NUC
Priority date: 2018-06-20
Filing date: 2018-06-20
Publication date: 2022-06-15
Anticipated expiration: 2038-06-20
Also published as: JP2019219928A

Description

Embodiments of the present invention relate to an image processing device, an image processing method, and an image processing program.

Relighting is a technique for reproducing the appearance of a subject under a lighting environment different from the one in which the subject was photographed. If relighting of images of people can be realized, various applications become possible, such as correcting the shading of portrait photographs and cut-and-paste compositing in which the background image is replaced.
On the other hand, relighting an image of a person based on physical laws requires estimating the reflectance, shape, and illumination from the image. This is an ill-posed problem called inverse rendering. Such inverse rendering has been realized, mainly for images of human faces, by using statistical 3D shape models or convolutional neural networks (CNNs). In formulating the illumination calculation, spherical harmonics (SH) have often been used to represent the distribution of light arriving from the surroundings.

S. Sengupta, A. Kanazawa, C. D. Castillo, and D. W. Jacobs, "SfSNet: Learning Shape, Reflectance and Illuminance of Faces 'in the wild'", in Conference on Computer Vision and Pattern Recognition, 2018.

In SH-based relighting, if light occlusion is ignored, the illumination calculation can be performed with an analytical formula from the illumination information and the normal vectors of the object surface. However, when occlusion is ignored, parts that should receive little light because of occlusion, such as a person's armpits and crotch, become unnaturally bright.
As a method for handling light occlusion approximately, there is a technique called ambient occlusion, which computes the geometric occlusion rate as a scalar value and multiplies the shading by it. However, because this always darkens recessed parts regardless of the light source, the result looks unnatural, particularly when the light source is moved during relighting.

As a better approach, it is known to formulate the SH illumination calculation so that it includes light occlusion, as is done in precomputed radiance transfer (PRT), which was developed for real-time rendering.
However, incorporating the PRT framework into inverse rendering is computationally expensive: computing light occlusion requires the very shape that is being estimated, and visibility must be determined by casting a large number of rays from each point on the object surface.
The present invention has been made to solve the above problems, and an object thereof is to provide an image processing device, an image processing method, and a program capable of improving photorealism when generating an output image corresponding to an input image.

One embodiment of the present invention is an image processing device including: an acquisition unit that acquires an input image; a generation unit that generates an output image corresponding to the input image based on light source information, which is information indicating the illumination state produced by a light source, on reflectance information estimated based on the input image as the reflectance of a subject included in the input image, and on light transfer information estimated, as the state of light transfer from the subject included in the input image, based on the input image and on visibility information indicating whether or not light from the light source reaches the subject; and an output unit that outputs the output image generated by the generation unit, wherein the light source and the light transfer information are expressed as coefficients of basis functions, and the coefficients are estimated by a machine learning model.
In one embodiment of the present invention, in the image processing device described above, the light source information is estimated based on the input image.
In one embodiment of the present invention, in the image processing device described above, the output image is a relit image in which the subject included in the input image is relit based on the illumination state indicated by the light source information.
In one embodiment of the present invention, in the image processing device described above, the reflectance information and the light transfer information are estimated based on the input image by a machine learning model.
In one embodiment of the present invention, in the image processing device described above, the machine learning model is trained by comparing a predetermined ground-truth image with an image obtained from any one of, or a combination of, the light source information, the reflectance information, and the light transfer information estimated by the machine learning model based on the input image.
In one embodiment of the present invention, in the image processing device described above, the visibility information is created, for each of a plurality of points on an object surface and for each light direction, based on whether or not the light reaches each of the plurality of points.

One embodiment of the present invention is an image processing method executed by a computer, the method including: a step of acquiring an input image; a step of generating an output image corresponding to the input image based on light source information, which is information indicating the illumination state produced by a light source, on reflectance information estimated based on the input image as the reflectance of a subject included in the input image, and on light transfer information estimated, as the state of light transfer from the subject included in the input image, based on the input image and on visibility information indicating whether or not light from the light source reaches the subject; and a step of outputting the output image generated in the generating step, wherein the light source and the light transfer information are expressed as coefficients of basis functions, and the coefficients are estimated by a machine learning model.

One embodiment of the present invention is an image processing program that causes a computer to execute: a step of acquiring an input image; a step of generating an output image corresponding to the input image based on light source information, which is information indicating the illumination state produced by a light source, on reflectance information estimated based on the input image as the reflectance of a subject included in the input image, and on light transfer information estimated, as the state of light transfer from the subject included in the input image, based on the input image and on visibility information indicating whether or not light from the light source reaches the subject; and a step of outputting the output image generated in the generating step, wherein the light source and the light transfer information are expressed as coefficients of basis functions, and the coefficients are estimated by a machine learning model.

According to the embodiments of the present invention, it is possible to provide an image processing device, an image processing method, and an image processing program capable of improving photorealism when generating an output image corresponding to an input image.

A diagram showing an example of the image processing system of the embodiment.
A block diagram showing an example of the image processing device of the embodiment.
A diagram showing an example of the process of deriving a visibility function.
A diagram showing an example of the machine learning model used in the image processing device of the embodiment.
A flowchart showing an example of the operation of the image processing device of the embodiment.
A flowchart showing an example of the operation of the image processing device of the embodiment.
A diagram showing an example of a CG human dataset.
A diagram showing a comparison between estimation results of the image processing device of the embodiment and a conventional method.
A diagram showing an example of estimation results of the image processing device of the embodiment.
A diagram showing an example of estimation results of the image processing device of the embodiment.
A diagram showing an example of estimation results of the image processing device of the embodiment.

Next, the image processing device, the image processing method, and the image processing program of the present embodiment will be described with reference to the drawings. The embodiments described below are merely examples, and the embodiments to which the present invention is applied are not limited to them.
In all the drawings used to describe the embodiments, elements having the same function are given the same reference numerals, and repeated description is omitted.
In the present application, "based on XX" means "based on at least XX" and includes cases where something is based on another element in addition to XX. "Based on XX" is also not limited to cases where XX is used directly, and includes cases where something is based on the result of computation or processing performed on XX. "XX" is an arbitrary element (for example, arbitrary information).

(Embodiment)
(Image processing system)
Hereinafter, an image processing system according to an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a diagram showing an example of the image processing system of the embodiment. In the image processing system of the present embodiment, a terminal device accesses a server device and transmits a full-body image of a person accompanied by a mask image (hereinafter referred to as a "teacher image") to the server device. The image processing device acquires, from the server device, the teacher image with the mask image transmitted by the terminal device, and performs supervised learning based on the acquired teacher image.
The terminal device also accesses a website such as a shopping site and transmits an image (a masked full-body image capturing the subject to be relit; hereinafter referred to as an "input image") to the server device. The image processing device 100 acquires, from the server device, the input image transmitted by the terminal device. The image processing device 100 extracts features from the acquired input image and generates an output image corresponding to the input image based on the extracted features and the result of the supervised learning.
The image processing system 1 of the embodiment includes terminal devices 10-1 to 10-n (n is an integer greater than 0), a server device 70, and the image processing device 100. These devices are connected to each other via a network 50. The network 50 includes, for example, radio base stations, Wi-Fi access points, communication lines, providers, and the Internet. Not every combination of the devices shown in FIG. 1 needs to be able to communicate with each other, and the network 50 may partly include a local network.
Hereinafter, an arbitrary one of the terminal devices 10-1 to 10-n is referred to as a terminal device 10.

The terminal device 10 is a device used by a user. The terminal device 10 is, for example, a computer device such as a mobile phone (for example, a smartphone), a tablet terminal, or a personal computer. For example, the terminal device 10 is used to register a user ID with a website such as a shopping site, a mail service, an SNS service, an information providing service, or the like, and to transmit images in association with the registered user ID.
The server device 70 provides various services. For example, the server device 70 may be a web server device that provides a website for offering various services through a web browser launched on the terminal device 10. The server device 70 may also be an application server device that exchanges various kinds of information by communicating with a terminal device 10 on which a predetermined application program has been launched (executed). The terminal device 10 on which the predetermined application program has been launched displays, through communication with the server device 70, a screen through which the various services can be provided. Hereinafter, to simplify the description, the case where the server device 70 is a web server device is described.
For example, the server device 70 authenticates the user ID and verifies the user before providing a service. If authentication shows that the user has already registered a user ID, the server device 70 provides the various services; if the user has not registered a user ID, it notifies the user that no user ID is registered or prompts the user to register one. When the user registers a new user ID in response, the server device 70 issues the newly registered user ID. The user can thereby obtain a new user ID. By operating the terminal device 10, the user transmits an image including a subject to the server device 70 in association with the registered user ID.

The image processing device 100 performs supervised learning. For example, the image processing device 100 acquires, as a teacher image, a full-body image of a person accompanied by a mask image. For example, the image processing device 100 acquires, from the server device 70, the teacher image transmitted by the terminal device 10. From the acquired teacher image, the image processing device 100 automatically estimates, for each pixel, the reflectance, the light transfer vector, and the light source information. Here, the light transfer vector includes light occlusion information. The following description continues for the case where the light transfer vector includes light occlusion information.
The image processing device 100 creates a reflectance map Λ from the reflectance estimated for each pixel, and creates a light transfer map Ψ from the light transfer vector estimated for each pixel. The image processing device 100 treats the created reflectance map Λ, the created light transfer map Ψ, and the estimated light source information Π as features, and learns that these features correspond to the teacher image.

The image processing device 100 communicates with the server device 70 to acquire the user ID of a user of the service provided by the server device 70, and acquires the input image (a masked full-body image capturing the subject to be relit) transmitted in association with this user ID. From the acquired input image, the image processing device 100 automatically estimates, for each pixel, the reflectance, the light transfer vector, and the light source information.
The image processing device 100 creates a reflectance map Λ from the reflectance estimated for each pixel, and creates a light transfer map Ψ from the light transfer vector estimated for each pixel. The image processing device 100 treats the created reflectance map Λ, the created light transfer map Ψ, and the estimated light source information Π as features, and generates an output image corresponding to the input image based on these features. The image processing device 100 outputs the generated output image.

(Image processing device 100)
FIG. 2 is a block diagram showing an example of the image processing device of the embodiment.
The image processing device 100 includes a communication unit 105, a storage unit 110, an operation unit 120, an information processing unit 130, a display unit 140, and a bus line 150, such as an address bus and a data bus, for electrically connecting these components as shown in FIG. 2.
The communication unit 105 is realized by a communication module. The communication unit 105 communicates with external communication devices such as the server device 70 via the network 50. Specifically, the communication unit 105 receives the teacher image transmitted by the server device 70 and outputs the received teacher image to the information processing unit 130. Here, an example of the teacher image is an image created by rendering a commercially available mesh model of a 3D-scanned person. Most commercially available 3D human models have textures that contain no gloss (specular) component and only a diffuse reflection component. Therefore, the present embodiment describes the case where the reflectance covers the diffuse reflection component. The number of commercially available 3D human models is limited, and the models used in the embodiment number only a few hundred even when training data and test data are combined. Given the countless variations in clothing, the training data may seem insufficient; in practice, however, training with the data prepared in the present embodiment makes it possible to calculate shading that accounts for light occlusion even in regions where occlusion is likely to occur, such as wrinkles in clothing, the armpits, and the crotch.
The communication unit 105 also receives the input image transmitted by the server device 70 and outputs the received input image to the information processing unit 130. In addition, the communication unit 105 acquires the output image output by the information processing unit 130 and transmits the acquired output image to the server device 70.

The storage unit 110 is realized by, for example, a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disk Drive), a flash memory, or a hybrid storage device combining two or more of these. The storage unit 110 stores a program 111 executed by the information processing unit 130, an application 112, a reflectance map Λ113, light source information Π114, and a light transfer map Ψ115.

The application 112 causes the image processing device 100 to acquire the teacher image transmitted by the server device 70. The application 112 causes the image processing device 100 to estimate, for each pixel, the reflectance, the light transfer vector, and the light source information from the acquired teacher image. The application 112 causes the image processing device 100 to create a reflectance map Λ from the reflectance estimated for each pixel and a light transfer map Ψ from the light transfer vector estimated for each pixel. The application 112 causes the image processing device 100 to treat the created reflectance map Λ, the created light transfer map Ψ, and the estimated light source information Π as features, and to learn that these features correspond to the teacher image.

The application 112 causes the image processing device 100 to acquire the input image transmitted by the server device 70. The application 112 causes the image processing device 100 to estimate, for each pixel, the reflectance, the light transfer vector, and the light source information from the acquired input image. The application 112 causes the image processing device 100 to create a reflectance map Λ from the reflectance estimated for each pixel and a light transfer map Ψ from the light transfer vector estimated for each pixel. The application 112 causes the image processing device 100 to treat the created reflectance map Λ, the created light transfer map Ψ, and the estimated light source information Π as features, and to generate an output image corresponding to the input image based on these features.
The reflectance map Λ113 is a map representing the reflectance of the teacher image for each pixel.
The light source information Π114 is information indicating the illumination state of the teacher image.
The light transfer map Ψ115 is a map representing, for each pixel, a light transfer vector in which geometric information of the object is recorded. The light transfer map Ψ115 and the light transfer vectors are separated from the light source information and do not depend on the light source. The geometric information of the object includes light occlusion information, which indicates whether or not light is occluded. Because the light transfer map Ψ115 and the light transfer vectors are separated from the light source information, the illumination calculation can be performed correctly.
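The separation of the light transfer map from the light source can be illustrated with a minimal sketch. It assumes, following the usual diffuse PRT formulation rather than anything stated in this passage, that a pixel's color is its reflectance multiplied by the inner product of its light transfer vector and the light source coefficients. The function names `relight_pixel` and `relight`, the nested-list data layout, and the clamping of negative shading to zero are illustrative assumptions.

```python
def relight_pixel(albedo_rgb, transfer, light_rgb):
    """Shade one pixel (hypothetical helper, not from the patent text).

    albedo_rgb: three reflectance values (one entry of the map Lambda).
    transfer:   K basis coefficients of the light transfer vector
                (one entry of the map Psi), shared by all color channels.
    light_rgb:  three lists of K basis coefficients (light source Pi),
                one list per color channel.
    """
    pixel = []
    for albedo, light in zip(albedo_rgb, light_rgb):
        # Inner product of transfer vector and light coefficients.
        shading = sum(t * l for t, l in zip(transfer, light))
        # Assumption: negative shading is clamped to zero.
        pixel.append(albedo * max(shading, 0.0))
    return pixel


def relight(albedo_map, transfer_map, light_rgb):
    """Apply relight_pixel to every pixel of equally sized 2D maps."""
    return [
        [relight_pixel(a, t, light_rgb) for a, t in zip(a_row, t_row)]
        for a_row, t_row in zip(albedo_map, transfer_map)
    ]
```

Under these assumptions, relighting amounts to calling `relight` again with the same reflectance and light transfer maps and a different `light_rgb`; nothing geometric has to be recomputed.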

The operation unit 120 is composed of, for example, a touch panel, detects touch operations on the screen displayed on the display unit 140, and outputs the detection results to the information processing unit 130.

All or part of the information processing unit 130 is, for example, a functional unit (hereinafter referred to as a software functional unit) realized by a processor such as a CPU (Central Processing Unit) executing the program 111 stored in the storage unit 110. All or part of the information processing unit 130 may instead be realized by hardware such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or by a combination of software functional units and hardware.
The information processing unit 130 includes, for example, an acquisition unit 131, an analysis unit 132, a machine learning unit 133, a generation unit 134, and an output unit 135.

取得部１３１は、通信部１０５が出力した教師画像を取得し、取得した教師画像を、分析部１３２へ出力する。また、取得部１３１は、通信部１０５が出力した入力画像を取得し、取得した入力画像を、分析部１３２へ出力する。
分析部１３２は、取得部１３１が出力した教師画像を取得し、取得した教師画像を分析することによって、画素ごとに反射率と、光伝達ベクトルと、光源情報とを推定する。分析部１３２は、画素ごとに推定した反射率から反射率マップΛを作成し、画素ごとに推定した光伝達ベクトルから光伝達マップΨを作成する。分析部１３２は、作成した反射率マップΛを反射率マップΛ１１３に記憶させ、作成した光伝達マップΨを光伝達マップΨ１１５に記憶させ、推定した光源情報Πを光源情報Π１１４に記憶させる。分析部１３２は、反射率マップΛと光伝達マップΨと光源情報Πとを、機械学習部１３３へ出力する。
ここで、分析部１３２が、光伝達マップΨを作成する処理について説明する。分析部１３２は、球面調和関数（ＳＨ）に基づいて、光の遮蔽を考慮して、照明計算を行う。
まず、光の遮蔽を考慮しないで、照明計算を行う場合について説明する。
（１）光の遮蔽を考慮しない場合
光の遮蔽や相互反射を考慮しない場合、物体表面上の点ｐにおいて、単位法線ベクトルをｎとすると、式(１)に示されるように、放射照度Ｅ（ｎ）は法線ｎが定義する半球Ω（ｎ）のあらゆる方向ω_ｉから差し込む放射輝度Ｌ（ω_ｉ）の積分によって計算される。 The acquisition unit 131 acquires the teacher image output by the communication unit 105, and outputs the acquired teacher image to the analysis unit 132. Further, the acquisition unit 131 acquires the input image output by the communication unit 105, and outputs the acquired input image to the analysis unit 132.
The analysis unit 132 acquires the teacher image output by the acquisition unit 131 and analyzes it to estimate the reflectance, the light transfer vector, and the light source information for each pixel. The analysis unit 132 creates a reflectance map Λ from the reflectance estimated for each pixel, and creates a light transfer map Ψ from the light transfer vector estimated for each pixel. The analysis unit 132 stores the created reflectance map Λ in the reflectance map Λ113, the created light transfer map Ψ in the light transfer map Ψ115, and the estimated light source information Π in the light source information Π114. The analysis unit 132 outputs the reflectance map Λ, the light transfer map Ψ, and the light source information Π to the machine learning unit 133.
Here, the process by which the analysis unit 132 creates the light transfer map Ψ will be described. The analysis unit 132 performs the illumination calculation based on spherical harmonics (SH), taking light occlusion into account.
First, the case where the illumination calculation is performed without considering light occlusion will be described.
(1) When light occlusion is not considered
When light occlusion and interreflection are not considered, let n be the unit normal vector at a point p on the object surface. As shown in equation (1), the irradiance E(n) is calculated by integrating the radiance L(ω_i) arriving from every direction ω_i of the hemisphere Ω(n) defined by the normal n.
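The image of equation (1) is not reproduced in this text. Under the definitions given above, it presumably takes the standard form of the irradiance integral; the following is a reconstruction from those definitions, not the original figure:

```latex
E(\mathbf{n}) = \int_{\Omega(\mathbf{n})} L(\omega_i)\,\max(\mathbf{n}\cdot\omega_i,\,0)\,d\omega_i \tag{1}
```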

なお、位置ｐに関する依存性は単純化のために省略した。入射する放射輝度の分布Ｌ（ω_ｉ）にコサイン減衰項ｍａｘ（ｎ，ω_ｉ，０）をかけたものを球面調和関数に射影する。方向ベクトルωと仰角θと方位角φとを用いてω＝（θ，φ）とパラメータ表示すると、式（２）、式（３）が得られる。 The dependency on the position p is omitted for simplicity. The incident radiance distribution L(ω_i) multiplied by the cosine attenuation term max(n·ω_i, 0) is projected onto the spherical harmonics. Parameterizing the direction vector ω by the elevation angle θ and the azimuth angle φ as ω = (θ, φ) yields equations (2) and (3).
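Equations (2) and (3) are likewise missing from this text. Assuming they follow the usual SH expansions of the lighting and of the clamped cosine, which is what the coefficient names L_{l,m} and A_l suggest, they would read as follows (a reconstruction, not the original figures):

```latex
L(\theta,\phi) = \sum_{l,m} L_{l,m}\,Y_{l,m}(\theta,\phi) \tag{2}

A(\theta) = \max(\cos\theta,\,0) = \sum_{l} A_l\,Y_{l,0}(\theta) \tag{3}
```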

式（２）において、Ｙ_ｌ，ｍは球面調和関数である（ただし、ｌ≧０，－ｌ≦ｍ≦ｌ，かつｍ≦２）。式（２）と式（３）とにおいて、Ｌ_ｌ，ｍとＡ_ｌはそれぞれ、照明とコサイン減衰項の係数である。式（１）の積分を書き換えると、式（４）となる。 In equation (2), Y_{l,m} are the spherical harmonics (where l ≥ 0, −l ≤ m ≤ l, and m ≤ 2). In equations (2) and (3), L_{l,m} and A_l are the coefficients of the illumination and of the cosine attenuation term, respectively. Rewriting the integral of equation (1) gives equation (4).

ここで、＾Ａ_ｌは、以下のように表される。 Here, Â_l is expressed as follows.
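Neither equation (4) nor the expression for Â_l survives in this text. In the standard SH irradiance formulation, which the surrounding definitions appear to follow, they would be written as below; this is an assumption about the missing figures:

```latex
E(\theta,\phi) = \sum_{l,m} \hat{A}_l\,L_{l,m}\,Y_{l,m}(\theta,\phi) \tag{4}

\hat{A}_l = \sqrt{\frac{4\pi}{2l+1}}\,A_l
```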

Ｙ_ｌ，ｍは、法線ベクトルｎ＝（ｘ，ｙ，ｚ）^Ｔの各成分ｘ，ｙ，ｚの多項式によって表現できる。係数列｛Ｌ_ｌ，ｍ｝をベクトルＬで、基底関数列｛＾Ａ_ｌＹ_ｌｍ｝をベクトル＾Ｙで表現すると、放射照度Ｅはベクトルの内積で計算できる。 Y_{l,m} can be expressed as a polynomial in the components x, y, z of the normal vector n = (x, y, z)^T. If the coefficient sequence {L_{l,m}} is represented by the vector L and the basis function sequence {Â_l Y_{l,m}} by the vector Ŷ, the irradiance E can be calculated as an inner product of vectors.

Ｅ＝＾Ｙ^ＴＬ（５） E = Ŷ^T L (5)
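Equation (5) can be illustrated with a minimal sketch for a single color channel. The real SH basis constants for l ≤ 2 and the clamped-cosine factors π, 2π/3, π/4 are the commonly used values, not figures taken from this document, and the function names `sh_basis` and `irradiance` are hypothetical.

```python
import math

def sh_basis(n):
    """Real SH basis for l <= 2 as polynomials of a unit vector (x, y, z)."""
    x, y, z = n
    return [
        0.282095,
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]

# Clamped-cosine factors for l = 0, 1, 2, repeated for every (l, m) entry.
A_HAT = [math.pi] + [2.0 * math.pi / 3.0] * 3 + [math.pi / 4.0] * 5

def irradiance(n, light):
    """Inner product of the scaled basis vector and 9 SH lighting coefficients."""
    y_hat = [a * y for a, y in zip(A_HAT, sh_basis(n))]
    return sum(yh * c for yh, c in zip(y_hat, light))

# Constant lighting: only the l = 0 coefficient is non-zero.
uniform = [1.0] + [0.0] * 8
print(irradiance((0.0, 0.0, 1.0), uniform))
```

Under constant lighting the result is the same for every normal (π × 0.282095, roughly 0.886), which is a quick sanity check of the basis scaling.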

次に、光の遮蔽を考慮して照明計算を行う場合について説明する。
（２）光の遮蔽を考慮する場合
（１）光の遮蔽を考慮しない場合で述べた通り、本来は光が遮られて暗くなるべきところが不自然に明るくなる。式（１）で光の遮蔽を考慮するために、本実施形態では、可視関数Ｖ（ω_ｉ）を導入する。 Next, the case where the illumination calculation takes light occlusion into account will be described.
(2) When light occlusion is considered
As described in "(1) When light occlusion is not considered", places that should be dark because the light is blocked become unnaturally bright. To account for light occlusion in equation (1), this embodiment introduces a visibility function V(ω_i).
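Equation (6) is not reproduced in this text. Given that it extends equation (1) with the visibility function, it presumably has the following form (a reconstruction under that assumption):

```latex
E(\mathbf{n}) = \int_{\Omega(\mathbf{n})} L(\omega_i)\,V(\omega_i)\,\max(\mathbf{n}\cdot\omega_i,\,0)\,d\omega_i \tag{6}
```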

式（６）において、Ｖ（ω_ｉ）は、入射方向ω_ｉの光が遮られれば０であり、遮られなければ１である。
図３は、可視関数Ｖ（ωｉ）を導出する処理の一例を示す図である。図３に示されるように、十分に離れたところ（無限遠）から、物体に光を放射することを考える。この場合、十分に離れたところ（無限遠）から届く光は平行光とみなすことができる。周囲から届く光（環境光源）は無数の平行光からなる。本実施形態では、平行光の入射方向ωｉについて、物体表面のすべての点ｐ（光伝達マップのすべての画素）における、方向ωｉに関する可視関数Ｖ（ωｉ，ｐ）を並列で求める。つまり、光源方向を一つ選ぶたびに、物体表面のすべての点で、その方向に関する可視関数を一度にすべて求める。これを、光源方向を変えながら繰り返す。光源方向は、乱数に基づいて、ランダムに選択されてもよい。図３に示される例では、光源方向ＬＤ－１が選択され、物体表面の複数の点ｐ－１、ｐ－２、ｐ－３、ｐ－４、ｐ－５の各々について、光源方向ＬＤ－１からの光が、複数の点ｐ－１、ｐ－２、ｐ－３、ｐ－４、ｐ－５の各々に届くか否かを示す可視関数が求められる。
グラフィクス用ハードウェア（ＧＰＵ）を用いる従来の方法では、物体表面の各点で、可視関数を計算するために物体形状を描画する必要があった。この方法では物体表面の点の数が膨大になると、物体形状の描画回数が増え、計算に時間がかかる。具体的には、仮に１０２４×１０２４画素（のマスク内の）各画素で計算すると、マスクが画像の半分を覆っているとしても５２４２８８画素あり、その画素数の分だけ物体形状を描画する必要がある。また、物体表面上の各点でランダムな方向に向かってレイトレーシングして可視判定を行う場合には、効率が悪かった。
一方、本実施形態では、可視関数を評価するために物体形状を描画するのは光源方向一つ毎でよい。実際に考慮する光源方向の数を６４２方向と仮定した場合、上記と比べて格段に少なく、計算時間を短縮できる。図２に戻り説明を続ける。
ＰＲＴの枠組みでは、この可視関数Ｖ（ωｉ）とコサイン減衰項とともに前計算し、ＳＨの基底関数に投影して、係数ベクトルの内積による高速な照明条件を実現する。ベクトルＴを、可視関数とコサイン減衰項を基底関数に投影した際の係数からなるベクトルとすると、放射照度Ｅは次のように計算できる。 In the formula (6), V (ω _i ) is 0 if the light in the incident direction ω _i is blocked, and 1 if the light in the incident direction ω i is not blocked.
FIG. 3 is a diagram showing an example of the process of deriving the visibility function V(ω_i). As shown in FIG. 3, consider light radiated toward the object from a sufficiently distant place (infinity). Light arriving from such a distance can be regarded as parallel light, and the light arriving from the surroundings (the environmental light source) consists of innumerable parallel lights. In this embodiment, for an incident direction ω_i of parallel light, the visibility function V(ω_i, p) for the direction ω_i is obtained in parallel at all points p on the object surface (all pixels of the light transfer map). That is, each time one light source direction is selected, the visibility function for that direction is obtained at all points on the object surface at once. This is repeated while changing the light source direction. The light source direction may be selected randomly based on random numbers. In the example shown in FIG. 3, the light source direction LD-1 is selected, and for each of the points p-1, p-2, p-3, p-4, and p-5 on the object surface, a visibility function indicating whether light from the light source direction LD-1 reaches that point is obtained.
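The idea of evaluating one light direction for all surface points in a single pass can be sketched on a toy scene. The example below is not the rendering procedure of the embodiment: it replaces the drawn object shape with a 1D height field and uses a simple horizon sweep, and the names `visibility_for_direction`, `heights`, and `slope` are hypothetical.

```python
def visibility_for_direction(heights, slope):
    """Visibility of every column of a 1D height field for one light direction.

    The light arrives from the +x side, rising `slope` units of height per
    column. Column i is lit (1) unless some column j > i rises above the ray
    leaving i toward the light, i.e. heights[j] - j*slope exceeds
    heights[i] - i*slope.
    """
    visible = [0] * len(heights)
    highest = float("-inf")  # largest key among columns closer to the light
    for i in range(len(heights) - 1, -1, -1):
        key = heights[i] - i * slope
        visible[i] = 1 if key >= highest else 0
        highest = max(highest, key)
    return visible

profile = [0.0, 0.0, 3.0, 0.0, 0.0]      # a bump at column 2
for slope in (0.5, 1.0, 4.0):            # one pass per light direction
    print(slope, visibility_for_direction(profile, slope))
```

Each call handles every column for one direction, so the outer loop runs once per light direction rather than once per surface point.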
In the conventional method using graphics hardware (GPU), the object shape had to be drawn at each point on the object surface in order to calculate the visibility function. With this method, as the number of points on the object surface becomes enormous, the number of times the object shape is drawn increases and the calculation takes time. Specifically, if the calculation is performed at each pixel (within the mask) of a 1024 × 1024 pixel image, there are 524288 pixels even if the mask covers only half of the image, and the object shape must be drawn that many times. Determining visibility by ray tracing in random directions from each point on the object surface was also inefficient.
In contrast, in this embodiment, the object shape only needs to be drawn once per light source direction to evaluate the visibility function. Assuming that 642 light source directions are actually considered, this is far fewer draws than the above, and the calculation time can be shortened. Returning to FIG. 2, the description continues.
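The difference in the number of draws follows directly from the figures quoted above. The short calculation below only counts draws and assumes nothing about the cost of an individual draw; the variable names are illustrative.

```python
width = height = 1024
mask_coverage = 0.5        # the mask covers half of the image
light_directions = 642

per_point_draws = int(width * height * mask_coverage)   # one draw per masked pixel
per_direction_draws = light_directions                   # one draw per light direction
ratio = per_point_draws / per_direction_draws

print(per_point_draws)     # 524288
print(round(ratio))        # 817
```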
In the PRT framework, the visibility function V(ω_i) is precomputed together with the cosine attenuation term and projected onto the SH basis functions, so that the illumination can be calculated quickly as an inner product of coefficient vectors. Letting T be the vector of coefficients obtained by projecting the visibility function and the cosine attenuation term onto the basis functions, the irradiance E can be calculated as follows.

Ｅ＝Ｔ^ＴＬ（７） E = T^T L (7)
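As a minimal sketch of equation (7), the transfer vector T can be approximated by numerically projecting V(ω)·max(n·ω, 0) onto the SH basis and then taking the inner product with the lighting coefficients. The midpoint quadrature over the sphere, the common real SH constants for l ≤ 2, and the names `transfer_vector` and `visible` are assumptions made for illustration; the document does not state how the projection is evaluated.

```python
import math

def sh_basis(d):
    """Real SH basis for l <= 2 evaluated at a unit direction (x, y, z)."""
    x, y, z = d
    return [0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3.0 * z * z - 1.0),
            1.092548 * x * z, 0.546274 * (x * x - y * y)]

def transfer_vector(normal, visible, n_theta=64, n_phi=128):
    """Project visible(w) * max(n.w, 0) onto the SH basis by midpoint quadrature."""
    t = [0.0] * 9
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            w = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            cosine = max(sum(n * c for n, c in zip(normal, w)), 0.0)
            weight = visible(w) * cosine * math.sin(theta) * d_theta * d_phi
            if weight:
                for k, y in enumerate(sh_basis(w)):
                    t[k] += weight * y
    return t

def irradiance(t, light):
    """E = T^T L for one color channel."""
    return sum(a * b for a, b in zip(t, light))

n = (0.0, 0.0, 1.0)
unoccluded = transfer_vector(n, lambda w: 1.0)
half_blocked = transfer_vector(n, lambda w: 0.0 if w[0] > 0.0 else 1.0)
uniform = [1.0] + [0.0] * 8
print(irradiance(unoccluded, uniform), irradiance(half_blocked, uniform))
```

With no occlusion the result approaches the unoccluded value from equation (5); blocking half of the hemisphere halves the irradiance under constant lighting.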

ここで、光伝達ベクトルはベクトルＴであり、光伝達ベクトルを各画素に持つ画像を光伝達マップΨと呼ぶ。
また、分析部１３２は、取得部１３１が出力した入力画像を取得し、取得した入力画像を分析することによって、画素ごとに反射率と、光伝達ベクトルと、光源情報とを推定する。分析部１３２は、推定した反射率から反射率マップΛを作成し、推定した光伝達ベクトルから光伝達マップΨを作成する。分析部１３２は、反射率マップΛと光伝達マップΨと光源情報Πとを、生成部１３４へ出力する。 Here, the light transfer vector is a vector T, and an image having a light transfer vector in each pixel is called a light transfer map Ψ.
Further, the analysis unit 132 acquires the input image output by the acquisition unit 131 and analyzes it to estimate the reflectance, the light transfer vector, and the light source information for each pixel. The analysis unit 132 creates a reflectance map Λ from the estimated reflectance, and creates a light transfer map Ψ from the estimated light transfer vector. The analysis unit 132 outputs the reflectance map Λ, the light transfer map Ψ, and the light source information Π to the generation unit 134.

機械学習部１３３は、分析部１３２が出力した反射率マップΛと光伝達マップΨと光源情報Πとを取得し、取得した反射率マップΛと光伝達マップΨと光源情報Πとを特徴量とし、その特徴量が教師画像であることを学習する。本実施形態では、機械学習モデルの一例として、畳み込みニューラルネットワーク(Convolutional neural network: CNN)を使用した場合について説明を続ける。 The machine learning unit 133 acquires the reflectance map Λ, the light transfer map Ψ, and the light source information Π output by the analysis unit 132, uses them as feature quantities, and learns that the feature quantities correspond to the teacher image. In this embodiment, the description continues with a convolutional neural network (CNN) as an example of the machine learning model.

ここで、機械学習部１３３がＣＮＮによる機械学習を行う際に使用する損失関数について説明する。
人物画像データセットＤ_Ｈは、各人物モデルに対し、画像中で人物が写っている部分を表す二値マスク、反射率マップΛ、光伝達マップΨからなる。二値マスクと、反射率マップΛと、光伝達マップΨとは、以下のように表される。 Here, the loss function used by the machine learning unit 133 when performing machine learning by CNN will be described.
The person image dataset D_H consists, for each person model, of a binary mask representing the region of the image in which the person appears, a reflectance map Λ, and a light transfer map Ψ. The binary mask, the reflectance map Λ, and the light transfer map Ψ are expressed as follows.

一方、光源データセットＤ_Ｌは、光源ごとに照明情報を持ち、これはＲＧＢのチャンネルごとのＳＨの９係数からなる。照明情報は、以下のように表される。 On the other hand, the light source dataset D_L holds illumination information for each light source, consisting of nine SH coefficients for each RGB channel. The illumination information is expressed as follows.

なお、二値マスクは正解データおよびＣＮＮの出力に乗算してから損失関数を計算するが（例えばＭ^３ _ｊ*Λ_ｊやＭ^９ _ｊ*Ψ_ｊで、*は要素ごとの乗算を表す）、以下の説明では単純化のために二値マスクの乗算は省略する。
本実施形態で使用するＣＮＮは、反射率マップΛと、光伝達マップΨと、光源情報Πとを推定する。ＣＮＮへの入力は、マスク乗算済の人物全身画像Ｉ_ｊ，ｋ＝Λ_ｊ*（Ψ_ｊΠ_ｋ）である。本実施形態では、機械学習部１３３は、損失関数として、反射率マップΛと、法線マップと、光源情報Πとのそれぞれについて正解とＬ２損失、そしてそれらの積である人物全身画像の推定値の正解とＬ２損失という４種類の損失関数について、Ｌ１損失を導出し、さらに以下の損失関数を含め、合計で１５種類の損失関数を計算する。 The loss functions are calculated after multiplying the ground-truth data and the CNN output by the binary mask (for example, M^3_j * Λ_j or M^9_j * Ψ_j, where * denotes element-wise multiplication), but the mask multiplication is omitted in the following description for simplicity.
The CNN used in this embodiment estimates the reflectance map Λ, the light transfer map Ψ, and the light source information Π. The input to the CNN is the mask-multiplied whole-body person image I_{j,k} = Λ_j * (Ψ_j Π_k). In this embodiment, the machine learning unit 133 starts from four kinds of loss functions, namely the L2 loss against the ground truth for each of the reflectance map Λ, the normal map, and the light source information Π, and the L2 loss against the ground truth for the estimated whole-body person image that is their product, and derives the L1 loss for these four; together with the following loss functions, it calculates 15 kinds of loss functions in total.

・反射率マップΛおよび光伝達マップΨに関するＬ１ｔｏｔａｌｖａｒｉａｔｉｏｎ（ＴＶ）損失（２通り）
・陰影マップ（つまりΨ_ｊΠ_ｋ）に関して、光伝達マップΨと光源情報Πを、一方を推定値とし他方を正解とした場合および両方を推定値とした場合のＬ１損失（３通り）
・反射率マップΛ、光伝達マップΨおよび光源情報Πの３つの積（つまりΛ_ｊ*（Ψ_ｊΠ_ｋ））に対し、そのうちの１つまたは２つを推定値にした場合のＬ１損失（６通り）
なお、それぞれの損失関数の重みは全て１とした。
このように、推定値だけでなく正解も含めた積に対してＬ１損失を計算することは、推定値に対する重み付けを行っていることに相当する。損失関数の種類を増やすことによって、より鮮明な推定結果を得ることができる。 - L1 total variation (TV) loss for the reflectance map Λ and the light transfer map Ψ (2 variants)
- For the shading map (that is, Ψ_j Π_k), the L1 loss when one of the light transfer map Ψ and the light source information Π is the estimate and the other is the ground truth, and when both are estimates (3 variants)
- For the product of the reflectance map Λ, the light transfer map Ψ, and the light source information Π (that is, Λ_j * (Ψ_j Π_k)), the L1 loss when one or two of the three are estimates (6 variants)
The weights of all the loss functions were set to 1.
In this way, calculating the L1 loss for a product that includes ground-truth values as well as estimates corresponds to weighting the estimates. Increasing the number of kinds of loss functions yields sharper estimation results.
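The L1 total variation loss in the list above can be sketched for a single-channel map as follows. The document does not give its exact form, so the anisotropic sum of absolute neighbour differences and the absence of any normalization are assumptions; `l1_tv_loss` is a hypothetical name.

```python
def l1_tv_loss(image):
    """Sum of absolute differences between neighbouring pixels of a 2D map."""
    rows, cols = len(image), len(image[0])
    vertical = sum(abs(image[r + 1][c] - image[r][c])
                   for r in range(rows - 1) for c in range(cols))
    horizontal = sum(abs(image[r][c + 1] - image[r][c])
                     for r in range(rows) for c in range(cols - 1))
    return vertical + horizontal

flat = [[0.5, 0.5], [0.5, 0.5]]   # no variation
edge = [[0.0, 1.0], [0.0, 1.0]]   # one vertical edge
print(l1_tv_loss(flat), l1_tv_loss(edge))   # 0.0 2.0
```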

本実施形態で使用するＣＮＮは、推定した反射率(アルベド)マップΛと、光伝達マップΨと、光源Πとの３つを掛け合わせなくても、掛け合わせる途中のデータについても、正解と一致しているかどうかを測ることができる。例えば、ＣＮＮでは、光伝達マップΨと光源Πとを掛け合わせることで、服や肌の色や模様のついていない、純粋に形状だけに依存した陰影マップを取得する。
この取得した陰影マップを、正解の光伝達マップΨと正解の光源Πとで作ることによって、正解の陰影マップが得られる。これと推定したデータとを比較するときに、光伝達マップΨと光源Πとのうち、どちらか一方が正解で他方が推定したもの、あるいは両方とも推定したデータに対して、正解との誤差を測ることによって、３通りの損失関数が得られる。このようにして組み合わせを考えると、陰影マップについて３通り、入力画像の復元に関して６通り(３つとも推定した場合を除く)の組み合わせが得られる。
この他に、光伝達マップΨと、反射率マップΛとについてのＬ１ＴＶｌｏｓｓで、計２つ、反射率マップΛと、光伝達マップΨと、光源および入力画像をすべて推定データで復元する場合を考えると計４つ、合計で１５種類の損失関数を導出する。 With the CNN used in this embodiment, agreement with the ground truth can be measured not only for the product of all three of the estimated reflectance (albedo) map Λ, light transfer map Ψ, and light source Π, but also for intermediate products. For example, multiplying the light transfer map Ψ by the light source Π yields a shading map that depends purely on shape, free of the colors and patterns of clothing and skin.
Building the shading map from the ground-truth light transfer map Ψ and the ground-truth light source Π gives the ground-truth shading map. Comparing it with shading maps in which one of Ψ and Π is the ground truth and the other is estimated, or in which both are estimated, gives three loss functions. Enumerating combinations in this way gives three for the shading map and six for the reconstruction of the input image (excluding the case where all three are estimated).
In addition, there are two L1 TV losses, one for the light transfer map Ψ and one for the reflectance map Λ, and four losses for the reflectance map Λ, the light transfer map Ψ, the light source, and the input image reconstructed entirely from estimated data, for a total of 15 kinds of loss functions.
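The count of 15 can be checked by enumerating the combinations described above. The labels below ("albedo", "transfer", "light", "est", "gt") are illustrative names, not identifiers from the embodiment.

```python
from itertools import product

FACTORS = ("albedo", "transfer", "light")

def combos(names):
    """All ways to mark each factor as estimated ('est') or ground truth ('gt')."""
    return [dict(zip(names, flags))
            for flags in product(("est", "gt"), repeat=len(names))]

def has_estimate(combo):
    return "est" in combo.values()

def all_estimates(combo):
    return set(combo.values()) == {"est"}

# Each map on its own, plus the image rebuilt from estimates only.
base = list(FACTORS) + ["image (all estimated)"]
# Total variation terms.
tv = ["TV(albedo)", "TV(transfer)"]
# Shading map: at least one of the two factors is an estimate.
shading = [c for c in combos(("transfer", "light")) if has_estimate(c)]
# Image: one or two of the three factors are estimates.
image = [c for c in combos(FACTORS) if has_estimate(c) and not all_estimates(c)]

total = len(base) + len(tv) + len(shading) + len(image)
print(len(base), len(tv), len(shading), len(image), total)   # 4 2 3 6 15
```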

生成部１３４は、分析部１３２が出力した反射率マップΛと光伝達マップΨと光源情報Πとを取得し、取得した反射率マップΛと光伝達マップΨと光源情報Πとを特徴量とし、その特徴量に基づいて、出力画像を生成する。生成部１３４は、生成した出力画像を、出力部１３５へ出力する。
出力部１３５は、生成部１３４が出力した出力画像を取得し、取得した出力画像を、通信部１０５へ出力する。 The generation unit 134 acquires the reflectance map Λ, the light transfer map Ψ, and the light source information Π output by the analysis unit 132, uses them as feature quantities, and generates an output image based on those feature quantities. The generation unit 134 outputs the generated output image to the output unit 135.
The output unit 135 acquires the output image output by the generation unit 134, and outputs the acquired output image to the communication unit 105.

次に、本実施形態の画像処理装置１００で用いる機械学習モデルについて説明する。
図４は、本実施形態の画像処理装置で使用する機械学習モデルの一例を示す図である。
本実施形態の画像処理装置１００で使用する機械学習モデル２００の一例は、前述したように、ＣＮＮである。
機械学習モデル２００は、乗算部２０１と、エンコーダ２０２と、反射率マップデコーダー２０３と、光伝達マップデコーダー２０６と、畳み込み部（Ｃｏｎｖ．）２１０とによって表される。反射率マップデコーダー２０３は、ＲｅｓＮｅｔブロック２０４と、逆畳み込み部（Ｄｅｃｏｎｖ．）２０５とを含む。光伝達マップデコーダー２０６は、ＲｅｓＮｅｔブロック２０７と、逆畳み込み部（Ｄｅｃｏｎｖ．）２０８とを含む。
なお、畳み込み部２１０と、ＲｅｓＮｅｔブロック２０４と、逆畳み込み部２０５と、ＲｅｓＮｅｔブロック２０７と、逆畳み込み部２０８との各々の直後には、データが入出力される最初と最後の層を除きバッチノーマライゼーション（ｂａｔｃｈｎｏｒｍａｌｉｚａｔｉｏｎ）およびＲｅＬＵを適用する。また、逆畳み込み部２０５と、逆畳み込み部２０８との各々の最初の逆畳み込み層各３層の直後には確率０．５のドロップアウトを適用する。 Next, the machine learning model used in the image processing apparatus 100 of the present embodiment will be described.
FIG. 4 is a diagram showing an example of a machine learning model used in the image processing apparatus of the present embodiment.
As described above, an example of the machine learning model 200 used in the image processing apparatus 100 of the present embodiment is a CNN.
The machine learning model 200 is represented by a multiplication unit 201, an encoder 202, a reflectance map decoder 203, a light transfer map decoder 206, and a convolution unit (Conv.) 210. The reflectance map decoder 203 includes a ResNet block 204 and a deconvolution unit (Deconv.) 205. The light transfer map decoder 206 includes a ResNet block 207 and a deconvolution unit (Deconv.) 208.
Batch normalization and ReLU are applied immediately after each of the convolution unit 210, the ResNet block 204, the deconvolution unit 205, the ResNet block 207, and the deconvolution unit 208, except for the first and last layers through which data is input and output. In addition, dropout with a probability of 0.5 is applied immediately after each of the first three deconvolution layers of the deconvolution unit 205 and of the deconvolution unit 208.

入力画像２１１として、人物全身画像Ｉ２１２と、二値マスクＭ２１３とが乗算部２０１へ出力される。乗算部２０１は、人物全身画像Ｉ２１２と二値マスクＭ２１３とを乗算し、乗算した結果を、エンコーダ２０２へ出力する。エンコーダ２０２と逆畳み込み部２０５との間と、エンコーダ２０２と逆畳み込み部２０８との間とは、スキップコネクション（ｓｋｉｐ－ｃｏｎｎｅｃｔｉｏｎ）によって連結されている。
エンコーダ２０２の一例は、畳み込み層７層で構成される。エンコーダ２０２は、乗算部２０１が出力した人物全身画像Ｉ２１２と二値マスクＭ２１３とを乗算した結果に対して、フィルタを適用する。エンコーダ２０２は、フィルタを適用した結果を、反射率マップデコーダー２０３と、光伝達マップデコーダー２０６と、連結部２０９とへ出力する。 As the input image 211, the person whole body image I212 and the binary mask M213 are output to the multiplication unit 201. The multiplication unit 201 multiplies the person whole body image I212 and the binary mask M213, and outputs the result of the multiplication to the encoder 202. The encoder 202 and the deconvolution unit 205 and the encoder 202 and the deconvolution unit 208 are connected by a skip connection (skip-connection).
An example of the encoder 202 consists of seven convolutional layers. The encoder 202 applies filters to the product of the person whole body image I212 and the binary mask M213 output by the multiplication unit 201. The encoder 202 outputs the filtered result to the reflectance map decoder 203, the light transfer map decoder 206, and the connecting unit 209.

反射率マップデコーダー２０３では、エンコーダ２０２が出力したフィルタを適用した結果は、ＲｅｓＮｅｔブロック２０４へ出力される。ＲｅｓＮｅｔブロック２０４は、２つのＲｅｓＮｅｔブロックを含み、２つのＲｅｓＮｅｔブロックの各々は畳み込み計算を行う。ＲｅｓＮｅｔブロック２０４は、畳み込み計算を行った結果を、連結部２０９と、逆畳み込み部２０５とへ出力する。
逆畳み込み部２０５は、ＲｅｓＮｅｔブロック２０４が出力した畳み込み計算を行った結果を取得し、取得した畳み込み計算を行った結果に対して７層の逆畳み込みを行うことによって、反射率マップ^～Λ２２１を導出する。このように構成することによって、反射率マップデコーダー２０３は、反射率マップΛを推定できる。 In the reflectance map decoder 203, the result of applying the filter output by the encoder 202 is output to the ResNet block 204. The ResNet block 204 includes two ResNet blocks, and each of the two ResNet blocks performs a convolution calculation. The ResNet block 204 outputs the result of the convolution calculation to the connecting unit 209 and the deconvolution unit 205.
The deconvolution unit 205 acquires the convolution result output by the ResNet block 204 and derives the reflectance map Λ̃ 221 by applying seven deconvolution layers to it. With this configuration, the reflectance map decoder 203 can estimate the reflectance map Λ.

光伝達マップデコーダー２０６では、エンコーダ２０２が出力したフィルタを適用した結果は、ＲｅｓＮｅｔブロック２０７へ出力される。ＲｅｓＮｅｔブロック２０７は、２つのＲｅｓＮｅｔブロックを含み、２つのＲｅｓＮｅｔブロックの各々は畳み込み計算を行う。ＲｅｓＮｅｔブロック２０７は、畳み込み計算を行った結果を、連結部２０９と、逆畳み込み部２０８とへ出力する。
逆畳み込み部２０８は、ＲｅｓＮｅｔブロック２０７が出力した畳み込み計算を行った結果を取得し、取得した畳み込み計算を行った結果に対して７層の逆畳み込みを行うことによって、光伝達マップ^～Ψ２２３を導出する。このように構成することによって、光伝達マップデコーダー２０６は、光伝達マップΨを推定できる。 In the light transfer map decoder 206, the filtered result output by the encoder 202 is passed to the ResNet block 207. The ResNet block 207 includes two ResNet blocks, each of which performs a convolution calculation. The ResNet block 207 outputs the convolution result to the connecting unit 209 and the deconvolution unit 208.
The deconvolution unit 208 acquires the convolution result output by the ResNet block 207 and derives the light transfer map Ψ̃ 223 by applying seven deconvolution layers to it. With this configuration, the light transfer map decoder 206 can estimate the light transfer map Ψ.

連結部２０９は、エンコーダ２０２が出力したフィルタを適用した結果と、ＲｅｓＮｅｔブロック２０４が出力した畳み込み計算を行った結果と、ＲｅｓＮｅｔブロック２０７が出力した畳み込み計算を行った結果とを取得し、取得したフィルタを適用した結果と、畳み込み計算を行った結果と、畳み込み計算を行った結果とを連結する。連結部２０９は、フィルタを適用した結果と、畳み込み計算を行った結果と、畳み込み計算を行った結果とを連結した結果を、畳み込み部２１０へ出力する。
畳み込み部２１０は、連結部２０９が出力した連結した結果を、畳み込む。具体的には、畳み込み部２１０は、空間解像度が１×１、チャンネル数が２７になるまで畳み込むことによって、光源情報^～Π２２２を導出する。 The connecting unit 209 acquires the filtered result output by the encoder 202, the convolution result output by the ResNet block 204, and the convolution result output by the ResNet block 207, and concatenates them. The connecting unit 209 outputs the concatenated result to the convolution unit 210.
The convolution unit 210 convolves the concatenated result output by the connecting unit 209. Specifically, the convolution unit 210 derives the light source information Π̃ 222 by convolving until the spatial resolution is 1 × 1 and the number of channels is 27.

逆畳み込み部２０５は、導出した反射率マップ^～Λ２２１を、乗算部２２４へ出力する。畳み込み部２１０は、導出した光源情報^～Π２２２を、乗算部２２４へ出力する。逆畳み込み部２０８は、導出した光伝達マップ^～Ψ２２３を、乗算部２２４へ出力する。乗算部２２４は、逆畳み込み部２０５が出力した反射率マップ^～Λと、畳み込み部２１０が出力した光源情報^～Π２２２と、逆畳み込み部２０８が出力した光伝達マップ^～Ψ２２３とを取得し、取得した反射率マップ^～Λ２２１と、光源情報^～Π２２２と、光伝達マップ^～Ψ２２３とを乗算することによって、人物全身画像^～Ｉを導出する。このように構成することによって、画像処理装置１００は、入力画像２１１を再現した画像を導出できる。 The deconvolution unit 205 outputs the derived reflectance map Λ̃ 221 to the multiplication unit 224. The convolution unit 210 outputs the derived light source information Π̃ 222 to the multiplication unit 224. The deconvolution unit 208 outputs the derived light transfer map Ψ̃ 223 to the multiplication unit 224. The multiplication unit 224 acquires the reflectance map Λ̃ 221, the light source information Π̃ 222, and the light transfer map Ψ̃ 223, and multiplies them to derive the whole-body person image Ĩ. With this configuration, the image processing apparatus 100 can derive an image that reproduces the input image 211.
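The multiplication performed by the multiplication unit 224 corresponds to I = Λ * (Ψ Π). The sketch below spells this out per pixel, assuming that each pixel of Ψ holds a 9-element transfer vector and that the 27 light source values are arranged as a 9 × 3 matrix (nine SH coefficients for each of R, G, B). That layout and the function names `shade` and `compose` are assumptions made for illustration.

```python
def shade(transfer, light):
    """Per-pixel shading: a 9-vector times a 9x3 light matrix gives RGB."""
    return [sum(t * row[ch] for t, row in zip(transfer, light))
            for ch in range(3)]

def compose(albedo_map, transfer_map, light):
    """I = albedo * (transfer . light), evaluated pixel by pixel."""
    return [[[a * s for a, s in zip(albedo, shade(transfer, light))]
             for albedo, transfer in zip(albedo_row, transfer_row)]
            for albedo_row, transfer_row in zip(albedo_map, transfer_map)]

# A 1x1 image: one albedo triple, one transfer vector, constant white light.
albedo_map = [[(0.5, 0.25, 1.0)]]
transfer_map = [[[1.0] + [0.0] * 8]]
light = [(2.0, 2.0, 2.0)] + [(0.0, 0.0, 0.0)] * 8
print(compose(albedo_map, transfer_map, light))   # [[[1.0, 0.5, 2.0]]]
```

Replacing `light` with the coefficients of a different environment while keeping the two maps fixed is what relighting amounts to in this formulation.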

（画像処理装置１００の動作）
本実施形態の画像処理装置１００の動作について、教師あり学習を行う場合と、教師あり学習の結果に基づいて、入力画像に対応する出力画像を生成する場合とに分けて説明する。
図５は、本実施形態の画像処理装置の動作の一例を示すフローチャートである。図５に示される例では、教師あり学習を行う処理について示される。
（ステップＳ１）
画像処理装置１００の通信部１０５は、サーバー装置７０が送信した教師画像を受信し、受信した教師画像を、情報処理部１３０へ出力する。情報処理部１３０の取得部１３１は、通信部１０５が出力した教師画像を取得し、取得した教師画像を、分析部１３２へ出力する。
（ステップＳ２）
分析部１３２は、取得部１３１が出力した教師画像を取得し、取得した教師画像から、画素ごとに反射率と、光伝達ベクトルと、光源情報とを推定する。
（ステップＳ３）
分析部１３２は、画素ごとに推定した反射率から反射率マップΛを作成し、画素ごとに推定した光伝達ベクトルから光伝達マップΨを作成する。分析部１３２は、作成した反射率マップΛと、光伝達マップΨと、推定した光源情報Πとを、機械学習部１３３へ出力する。
（ステップＳ４）
機械学習部１３３は、分析部１３２が出力した反射率マップΛと、光伝達マップΨと、光源情報Πとを取得し、取得した反射率マップΛと、光伝達マップΨと、光源情報Πとを特徴量とし、その特徴量が教師画像に対応することを学習する。 (Operation of image processing device 100)
The operation of the image processing device 100 of the present embodiment will be described separately for the case of performing supervised learning and the case of generating an output image corresponding to the input image based on the result of supervised learning.
FIG. 5 is a flowchart showing an example of the operation of the image processing apparatus of the present embodiment. In the example shown in FIG. 5, a process of performing supervised learning is shown.
(Step S1)
The communication unit 105 of the image processing device 100 receives the teacher image transmitted by the server device 70, and outputs the received teacher image to the information processing unit 130. The acquisition unit 131 of the information processing unit 130 acquires the teacher image output by the communication unit 105, and outputs the acquired teacher image to the analysis unit 132.
(Step S2)
The analysis unit 132 acquires the teacher image output by the acquisition unit 131, and estimates the reflectance, the light transfer vector, and the light source information for each pixel from the acquired teacher image.
(Step S3)
The analysis unit 132 creates a reflectance map Λ from the reflectance estimated for each pixel, and creates a light transfer map Ψ from the light transfer vector estimated for each pixel. The analysis unit 132 outputs the created reflectance map Λ, the light transfer map Ψ, and the estimated light source information Π to the machine learning unit 133.
(Step S4)
The machine learning unit 133 acquires the reflectance map Λ, the light transfer map Ψ, and the light source information Π output by the analysis unit 132, uses them as feature quantities, and learns that the feature quantities correspond to the teacher image.

図６は、本実施形態の画像処理装置の動作の一例を示すフローチャートである。図６に示される例では、教師あり学習の結果に基づいて、入力画像に対応する出力画像を生成する処理について示される。
（ステップＳ１１）
画像処理装置１００の通信部１０５は、サーバー装置７０が送信した入力画像を受信し、受信した入力画像を、情報処理部１３０へ出力する。情報処理部１３０の取得部１３１は、通信部１０５が出力した入力画像を取得し、取得した入力画像を、分析部１３２へ出力する。
（ステップＳ１２）
分析部１３２は、取得部１３１が出力した入力画像を取得し、取得した入力画像から、画素ごとに反射率と、光伝達ベクトルと、光源情報とを推定する。
（ステップＳ１３）
分析部１３２は、画素ごとに推定した反射率から反射率マップ^～Λを作成し、画素ごとに推定した光伝達ベクトルから光伝達マップ^～Ψを作成する。分析部１３２は、作成した反射率マップ^～Λと、光伝達マップ^～Ψと、推定した光源情報^～Πとを、生成部１３４へ出力する。
（ステップＳ１４）
生成部１３４は、分析部１３２が出力した反射率マップ^～Λと、光伝達マップ^～Ψと、光源情報^～Πとを取得し、取得した反射率マップ^～Λと、光伝達マップ^～Ψと、光源情報^～Πとに基づいて、入力画像に対応する出力画像を生成する。 FIG. 6 is a flowchart showing an example of the operation of the image processing apparatus of the present embodiment. In the example shown in FIG. 6, a process of generating an output image corresponding to an input image based on the result of supervised learning is shown.
(Step S11)
The communication unit 105 of the image processing device 100 receives the input image transmitted by the server device 70, and outputs the received input image to the information processing unit 130. The acquisition unit 131 of the information processing unit 130 acquires the input image output by the communication unit 105, and outputs the acquired input image to the analysis unit 132.
(Step S12)
The analysis unit 132 acquires the input image output by the acquisition unit 131, and estimates the reflectance, the light transfer vector, and the light source information for each pixel from the acquired input image.
(Step S13)
The analysis unit 132 creates a reflectance map Λ̃ from the reflectance estimated for each pixel, and creates a light transfer map Ψ̃ from the light transfer vector estimated for each pixel. The analysis unit 132 outputs the created reflectance map Λ̃, the light transfer map Ψ̃, and the estimated light source information Π̃ to the generation unit 134.
(Step S14)
The generation unit 134 acquires the reflectance map Λ̃, the light transfer map Ψ̃, and the light source information Π̃ output by the analysis unit 132, and generates an output image corresponding to the input image based on them.

（本実施形態の手法と従来の手法との比較）
光伝達ベクトルに、光の遮蔽情報を含めた場合（以下「本手法」という）と、含めない場合（以下「従来手法」という）とについて、比較を行った。
最初に、訓練データ（教師画像）およびテストデータ（入力画像）として用いる、３Ｄ人物モデルを用いたＣＧ人物画像データセットと環境光源データセットとを作成した。
ＣＧ人物画像データセットは、ＧＰＵベースレンダラによって生成した。ＣＧ人物画像データベースは、各３Ｄ人物モデルについて、二値マスク、反射率マップ、法線マップ、光伝達マップからなる。本手法では法線マップは必要ないが、法線マップは、従来手法での訓練データとテストデータとに使用する。
図７は、ＣＧ人物データセットの一例を示す図である。図７の（１）は反射率マップの一例を示し、図７の（２）は二値マスクの一例を示し、図７の（３）は法線マップの一例を示し、図７の（４）は光伝達マップの一例を示す。
データの取得先の一例は、ＢＵＦＦデータセットおよび商用ウェブサイトである。ここでは、３４５体分を用意し、そのうち２７６体分を訓練データ、６９体分をテストデータとした。なお、入手した３Ｄ人物モデルの多くは光沢反射成分のテクスチャが提供されていないため、ここでは、拡散反射を扱った場合について説明を続ける。ただし、３Ｄスキャン時に細かい皺などによる陰影が完全には除去されていないため、正解反射率マップにも遮蔽による陰影が含まれている。 (Comparison between the method of this embodiment and the conventional method)
A comparison was made between the case where the light transfer vector includes light occlusion information (hereinafter referred to as "this method") and the case where it does not (hereinafter referred to as "conventional method").
First, a CG person image dataset and an environmental light source dataset were created, using 3D person models, for use as training data (teacher images) and test data (input images).
The CG person image dataset was generated by a GPU-based renderer. For each 3D person model, it consists of a binary mask, a reflectance map, a normal map, and a light transfer map. This method does not require the normal map, but the normal map is used for the training data and test data of the conventional method.
FIG. 7 is a diagram showing an example of the CG person dataset. FIG. 7 (1) shows an example of a reflectance map, FIG. 7 (2) an example of a binary mask, FIG. 7 (3) an example of a normal map, and FIG. 7 (4) an example of a light transfer map.
Examples of data sources are the BUFF dataset and commercial websites. Here, 345 models were prepared, of which 276 were used as training data and 69 as test data. Since most of the obtained 3D person models do not provide textures for the glossy reflection component, the description here continues for the case of diffuse reflection. However, because shading caused by fine wrinkles and the like is not completely removed during 3D scanning, the ground-truth reflectance maps also contain shading due to occlusion.

本実施形態では、訓練データのばらつきを抑えて、ＣＮＮの推定精度を高めるため、３Ｄ人物モデルは、立った姿勢とし、座った姿勢のものは除外した。そして、描画位置は画像の中央とし、上下に画像の縦幅５％分の余白を空けて描画した。なお、推定を行う際も、入力画像の二値マスクを利用して、上下に画像の縦幅５％分の余白が空くように整形してからＣＮＮに入力する。
光源データセットは、ＬａｖａｌＩｎｄｏｏｒＨＤＲデータセットから取得したパノラマＨＤＲ形式の環境マップを、ＳＨの係数に変換することで作成した。環境マップによっては暗すぎたり明るすぎたりするものがあるため、明るさの調整を行った。まず、視点座標系の法線ベクトル（０，０，１）^Ｔから解析的に光伝達ベクトルを計算し、それと各環境マップのＳＨの係数を乗算して、参照用の輝度値を計算する。その参照用の輝度値が閾値０．２よりも暗すぎる環境マップは除外した上で、［０．７，０．９］の範囲に収まるように環境マップの輝度値をスケーリングした。さらに、バリエーションを増やすため、環境マップを、鉛直方向を軸として１０度ずつ３５回回転させた。その上で、冗長度を減らすため、ｋ－ｍｅａｎｓクラスタリングによって数を減らし、不自然な光源を取り除いた。最終的に、合計５０個の光源を用意し、そのうちランダムに選んだ４０個を訓練データ、１０個をテストデータとした。 In the present embodiment, in order to suppress the variation in the training data and improve the estimation accuracy of the CNN, the 3D person model is in a standing posture, and the sitting posture is excluded. Then, the drawing position was set to the center of the image, and the image was drawn with a margin of 5% in the vertical width of the image at the top and bottom. Also, when performing estimation, the binary mask of the input image is used to shape the image so that there is a margin of 5% in the vertical width of the image at the top and bottom, and then input to the CNN.
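The 5% margin rule can be sketched as a vertical scale and offset computed from the rows covered by the binary mask. This is one possible reading of the shaping step; horizontal centering is left out, and `fit_with_margin` and its parameters are hypothetical names.

```python
def fit_with_margin(mask_rows, image_height, margin=0.05):
    """Scale and vertical offset that leave `margin` of the height free above and below.

    `mask_rows` are the row indices that contain at least one mask pixel.
    A source row r maps to r * scale + offset in the output image.
    """
    top, bottom = min(mask_rows), max(mask_rows)
    person_height = bottom - top + 1
    scale = image_height * (1.0 - 2.0 * margin) / person_height
    offset = image_height * margin - top * scale
    return scale, offset

# A person occupying rows 100..299 of a 1024-pixel-high image.
scale, offset = fit_with_margin(range(100, 300), 1024)
print(scale, 100 * scale + offset, 300 * scale + offset)
```

After the mapping, the top of the person lands about 5% of the image height below the upper border and the bottom about 5% above the lower border.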
The light source dataset was created by converting panoramic HDR environment maps obtained from the Laval Indoor HDR dataset into SH coefficients. Because some environment maps are too dark or too bright, their brightness was adjusted. First, the light transfer vector is calculated analytically from the normal vector (0, 0, 1)^T of the viewpoint coordinate system and multiplied by the SH coefficients of each environment map to obtain a reference luminance value. Environment maps whose reference luminance value was darker than the threshold 0.2 were excluded, and the luminance values of the remaining environment maps were scaled to fall within the range [0.7, 0.9]. Furthermore, to increase variation, each environment map was rotated 35 times in steps of 10 degrees about the vertical axis. Then, to reduce redundancy, the number was reduced by k-means clustering and unnatural light sources were removed. Finally, a total of 50 light sources were prepared, of which 40 randomly selected ones were used as training data and 10 as test data.
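The brightness adjustment can be sketched as follows for a single luminance channel. The document does not say how a map is brought into [0.7, 0.9], so clamping the reference value to the nearest bound is an assumption, as are the common real SH constants and the names `T_REF` and `normalize_light`.

```python
import math

# Analytic transfer vector for the normal (0, 0, 1) without occlusion:
# clamped-cosine factor times the SH basis; only the m = 0 terms are non-zero.
T_REF = [math.pi * 0.282095, 0.0, (2.0 * math.pi / 3.0) * 0.488603, 0.0,
         0.0, 0.0, (math.pi / 4.0) * 0.315392 * 2.0, 0.0, 0.0]

def normalize_light(coeffs, dark=0.2, low=0.7, high=0.9):
    """Return scaled SH coefficients, or None if the map is too dark."""
    reference = sum(t * c for t, c in zip(T_REF, coeffs))
    if reference < dark:
        return None                               # excluded from the dataset
    target = min(max(reference, low), high)       # assumption: clamp into [low, high]
    scale = target / reference
    return [c * scale for c in coeffs]

bright = normalize_light([2.0] + [0.0] * 8)
print(sum(t * c for t, c in zip(T_REF, bright)))  # close to 0.9
print(normalize_light([0.1] + [0.0] * 8))         # None
```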

本手法を、Ｐｙｔｈｏｎおよびｃｈａｉｎｅｒを用いて実装し、教師あり学習を行い、教師あり学習の結果に基づいて、推論することによって、入力画像に対応する出力画像を生成した。最適化には、Ａｄａｍを使用し、学習率は０．０００２に固定し、バッチサイズは１とした。訓練に要した時間は、１つのＧＰＵを使用して、１０２４×１０２４画素のデータを入力した場合、１エポック当たり約３時間であった。本手法として、２０エポックまで学習したものを使用した。推論に要した時間は、１０２４×１０２４画素の入力１つ当たり０．４３秒程度である。 This method was implemented using Python and Chainer; supervised learning was performed, and an output image corresponding to the input image was generated by inference based on the result of the supervised learning. Adam was used for optimization, with the learning rate fixed at 0.0002 and a batch size of 1. Training took about 3 hours per epoch when 1024 × 1024 pixel data was input using one GPU. The model trained for 20 epochs was used as this method. Inference took about 0.43 seconds per 1024 × 1024 pixel input.

To clarify the effect of including light occlusion information in the light transfer vector, the present method was compared with a conventional method that estimates a normal map instead of a light transfer map.

The machine learning model of the conventional method differs from that of the present method. However, the model of the present method is larger than that of the conventional method in input image resolution, number of layers, and number of channels in the intermediate layers, so its estimation accuracy is assumed to be no worse.

As a second comparison target, a variant of the present method in which the number of optimized loss functions was reduced from 15 to 4 was also prepared. As a quantitative comparison, FIG. 8 shows the root mean square error (RMSE) within the mask of the estimation results of each method over all test data. The RMSE for the normal map is calculated from the results of the conventional method.
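The RMSE within the mask can be sketched as follows. The function name `masked_rmse` and the array layout (maps of shape (H, W) or (H, W, C), mask of shape (H, W)) are assumptions for illustration; the embodiment does not specify how channels are pooled.

```python
import numpy as np

def masked_rmse(pred, target, mask):
    """Root mean square error over the pixels where mask is True.

    pred and target have shape (H, W) or (H, W, C); mask has shape (H, W).
    All channels of the masked pixels are pooled into one error value."""
    mask = np.asarray(mask, dtype=bool)
    diff = (np.asarray(pred, dtype=np.float64)[mask]
            - np.asarray(target, dtype=np.float64)[mask])
    if diff.size == 0:
        raise ValueError("mask selects no pixels")
    return float(np.sqrt(np.mean(diff ** 2)))
```

Pixels outside the mask, such as the background, do not affect the value, so errors there cannot inflate or hide differences between methods.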

Except for the light source information, for every element common to the methods, the present method (4 loss functions) gives better results than the conventional method, and the present method (15 loss functions) gives the best results. As a qualitative comparison, FIG. 9 shows the estimation results of each method on the test data.

FIG. 9 is a diagram showing an example of estimation results by the image processing apparatus of the embodiment. The example in FIG. 9 compares the ground truth with the estimation result of each method when CG test data is used.

In FIG. 9, (1) is an example of an input image, (2) is the ground-truth reflectance map, (3) is the reflectance map by the conventional method, (4) is the reflectance map by the present method (4 loss functions), and (5) is the reflectance map by the present method (15 loss functions). (6) shows the light sources, from top to bottom: the ground truth, the conventional method, the present method (4 loss functions), and the present method (15 loss functions). (7) is the ground-truth normal map and (8) is the normal map by the conventional method. (9) is the ground-truth shading map, (10) is the shading map by the conventional method, (11) is the shading map by the present method (4 loss functions), and (12) is the shading map by the present method (15 loss functions).

FIG. 9 shows that, consistent with the quantitative comparison, the reflectance map and the shading map improve in the order of the conventional method, the present method (4 loss functions), and the present method (15 loss functions). With the conventional method, occlusion at the neck, armpits, and similar regions cannot be reproduced in the shading map from the normal map, so the corresponding regions of the reflectance map become dark instead. Comparing the present method (4 loss functions) with the present method (15 loss functions), the latter produces a sharper shading map.

Similarly, FIG. 10 shows the results estimated by the three methods described above with a live-action image of a person as the input.

FIG. 10 is a diagram showing an example of estimation results by the image processing apparatus of the embodiment. The example in FIG. 10 shows a comparison between the correct answer and the estimation result of each method when a live-action image is used. In FIG. 10, (1) is the input image, (2) is the reflectance map by the conventional method, (3) is the reflectance map by the present method (4 loss functions), and (4) is the reflectance map by the present method (15 loss functions). (5) shows the light sources, from top to bottom: the conventional method, the present method (4 loss functions), and the present method (15 loss functions). (6) is the shading map by the conventional method, (7) is the shading map by the present method (4 loss functions), and (8) is the shading map by the present method (15 loss functions).

As with the results on test data in FIG. 9, the shading map estimated by the conventional method in FIG. 10 looks like a flat relief with bumps on it, whereas shading caused by occlusion can be observed with the present method (4 loss functions) and the present method (15 loss functions).

In addition, simultaneous estimation was performed on each of two images of people, and the images were relit with their light source information exchanged, reproducing the appearance of each person under the other's lighting environment.

FIG. 11 is a diagram showing an example of estimation results by the image processing apparatus of the embodiment. The example in FIG. 11 shows the results of relighting with the estimated light source information exchanged between the two images. According to FIG. 11, because the estimated reflectance map, light transfer map, and light source information are more accurate than with the conventional method, the results differ little from those computed entirely from ground-truth data.

The above embodiment describes the case where an image is transmitted from the terminal device 10 to the image processing apparatus 100 via the server device 70, but the invention is not limited to this. For example, the image may be input directly to the image processing apparatus 100.

In the above embodiment, the terminal device 10 and the image processing apparatus 100 may be the same device, or the server device 70 and the image processing apparatus 100 may be the same device.

The above embodiment describes the case where the image processing apparatus 100 acquires the input image transmitted by the terminal device 10 (a masked full-body image of the subject to be relit), but the invention is not limited to this. For example, the terminal device 10 may transmit a full-body image of the subject to be relit, and the image processing apparatus 100 may acquire that full-body image and estimate the mask from it.

The above embodiment describes the case where the diffuse reflection component is treated as the reflectance, but the invention is not limited to this. For example, the method is also applicable to a specular reflection component, including gloss, in addition to the diffuse reflection component. In this case, supervised learning is performed on a reflectance map for specular reflection, and the specular reflection component of the input image is estimated based on the result of that learning.

The above embodiment describes the case where a convolutional neural network is used as an example of the machine learning model, but the invention is not limited to this example. For example, a general machine learning model such as a recurrent neural network may be used.

The above embodiment describes the case where the nine SH coefficients up to the second order are used to represent the light source and the light transfer vector, but the invention is not limited to this example. For example, SH of the first order or lower, or of the third order or higher, may be used, and either all of the coefficients or only some of them may be used.
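As a small worked check of the coefficient counts mentioned above: spherical harmonics up to order n have (n + 1)^2 basis functions, since band l contributes 2l + 1 of them. The helper name `num_sh_coeffs` is hypothetical.

```python
def num_sh_coeffs(order: int) -> int:
    """Number of SH coefficients when all bands 0..order are used."""
    if order < 0:
        raise ValueError("order must be non-negative")
    # Band l has 2*l + 1 functions; summing over l = 0..order gives (order + 1)**2.
    return (order + 1) ** 2
```

Second order therefore gives the 9 coefficients used in the embodiment, while first order gives 4 and third order gives 16, which changes the length of both the light source vector and each light transfer vector.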

The above embodiment describes SH as the representation of the light source and the light transfer vector, but the invention is not limited to this. For example, the light source and the light transfer vector may be represented by Haar wavelets, spherical Gaussians, or the like.

The above embodiment describes the case where the visibility function V(ωi) is derived, for each of the points p-1, p-2, p-3, p-4, and p-5 on the object surface, based on whether light from the light source direction LD-1 reaches that point, but the invention is not limited to this example. For example, the visibility function V(ωi) may be derived by determining, for each point (pixel) on the object surface and for each direction of light emitted by the light source LS, whether the light reaches the point.

The above embodiment describes the case where the light source information ^～Π is estimated based on the subject included in the input image, but the invention is not limited to this example. For example, the light source information ^～Π may be estimated based on background information (such as the shading of buildings). That is, the light source information ^～Π may be estimated based on an input image that includes the background.

The above embodiment describes the case where the machine learning model is trained by comparing the input image with an image obtained from any one of, or any combination of, the light source information Π, the reflectance map Λ, and the light transfer map Ψ that the model estimates from the input image, but the invention is not limited to this example. For example, the model may be trained by comparing such an image with a predetermined ground-truth image other than the input image. Specifically, multiplying the light source by the light transfer map yields a shading map (which contains none of the colors or patterns found in the reflectance map); such a shading map may be generated by CG and compared with the output of the machine learning model (the product of the similarly estimated light source and light transfer map).

The above embodiment describes the case where the image processing apparatus 100 estimates the reflectance, the light transfer vector, and the light source information for each pixel from the acquired teacher image, but the invention is not limited to this example. For example, the image processing apparatus 100 may estimate the reflectance, the light transfer vector, and the light source information for each group of multiple pixels from the acquired teacher image.

The above embodiment describes the case where the image processing apparatus 100 estimates the light source information from the acquired input image, but the invention is not limited to this example. For example, the image processing apparatus 100 may acquire light source information from an external source and generate the output image based on the acquired light source information. With this configuration, a relit image in which the subject included in the input image is relit can be generated.
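The per-point, per-direction visibility test described above can be turned into a light transfer vector by numerical integration. The following is a minimal Monte Carlo sketch, not the embodiment's procedure: the uniform sphere sampling, the clamped-cosine integrand with a 1/pi normalization, the real SH basis ordering, and the names `sh_basis` and `transfer_vector` are all assumptions, and `visible` is a hypothetical callback standing in for a ray-casting test against the 3D model.

```python
import numpy as np

def sh_basis(d):
    """Real SH basis up to order 2 for unit directions d of shape (N, 3); returns (N, 9)."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ], axis=1)

def transfer_vector(normal, visible, n_samples=100_000, seed=0):
    """Estimate the 9 SH coefficients of V(w) * max(n . w, 0) / pi at one surface point.

    visible(d) takes directions of shape (N, 3) and returns N values in {0, 1}:
    1 where light from that direction reaches the point, 0 where it is blocked."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n_samples, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)   # uniform directions on the sphere
    cos = np.clip(d @ np.asarray(normal, dtype=np.float64), 0.0, None)
    v = np.asarray(visible(d), dtype=np.float64)
    # Each sample represents a solid angle of 4*pi / n_samples.
    weights = v * cos * (4.0 * np.pi / n_samples) / np.pi
    return weights @ sh_basis(d)
```

With `visible` returning 1 for every direction, the estimate approaches the analytic unoccluded vector; blocking some directions lowers the coefficients, which is how occlusion at regions such as the neck or armpits would enter the light transfer map.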

The above embodiment describes 15 types of loss functions, but the invention is not limited to this example. For example, the following loss functions may be used in addition to, or instead of, those 15 types. Specifically, instead of the TV loss, loss functions that make the gradients match the ground truth may be used, as in equations (A) and (B), where the estimated light transfer map and reflectance map are marked with a prime (´) and ∇x and ∇y denote the gradient operators in the x and y directions.

$$\lvert \nabla_x \Lambda - \nabla_x \Lambda' \rvert + \lvert \nabla_y \Lambda - \nabla_y \Lambda' \rvert \tag{A}$$

$$\lvert \nabla_x \psi - \nabla_x \psi' \rvert + \lvert \nabla_y \psi - \nabla_y \psi' \rvert \tag{B}$$

Alternatively, a smoothing filter, such as a 3 × 3 filter, may be applied before ∇x and ∇y are computed. Specifically, with the smoothing filter denoted by H, equations (C) and (D) are obtained.

$$\lvert \nabla_x H \Lambda - \nabla_x H \Lambda' \rvert + \lvert \nabla_y H \Lambda - \nabla_y H \Lambda' \rvert \tag{C}$$

$$\lvert \nabla_x H \psi - \nabla_x H \psi' \rvert + \lvert \nabla_y H \psi - \nabla_y H \psi' \rvert \tag{D}$$
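Equations (A) to (D) can be sketched for a single-channel map as follows. This is an illustration under assumptions, not the embodiment's code: the forward-difference kernels `DX` and `DY`, the 3 × 3 box filter `BOX` as H, zero padding at the borders, and summation of absolute differences are all choices made here, and the function names are hypothetical. The gradient and smoothing kernels are merged into one kernel before the map is convolved.

```python
import numpy as np

def conv2d_full(img, kernel):
    """Plain 'full' 2D convolution with zero padding, NumPy only."""
    ih, iw = img.shape
    kh, kw = kernel.shape
    out = np.zeros((ih + kh - 1, iw + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + ih, j:j + iw] += kernel[i, j] * img
    return out

DX = np.array([[1.0, -1.0]])       # difference along x (assumed operator)
DY = np.array([[1.0], [-1.0]])     # difference along y (assumed operator)
BOX = np.full((3, 3), 1.0 / 9.0)   # 3x3 smoothing filter H (box filter assumed)

def gradient_loss(estimate, truth, smooth=None):
    """Sum of |grad(truth) - grad(estimate)| over x and y for one 2D channel.

    smooth=None corresponds to equations (A)/(B); passing a filter H
    corresponds to equations (C)/(D)."""
    total = 0.0
    for diff in (DX, DY):
        # Differentiate the smoothing kernel first, then convolve the maps once.
        k = diff if smooth is None else conv2d_full(smooth, diff)
        total += np.abs(conv2d_full(truth, k) - conv2d_full(estimate, k)).sum()
    return float(total)
```

Because convolution is associative, convolving a map with the pre-differentiated kernel gives the same result as smoothing the map and then differentiating it, while requiring only one pass over the map per direction.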

In this case, as the actual order of computation, operators such as ∇xH are combined first; that is, the derivatives of the smoothing kernel in the x and y directions are computed, and the result is then convolved with the map.

According to the image processing apparatus 100 of the present embodiment, for the new problem of relighting a full-body image of a person captured from a single viewpoint, light occlusion information can be estimated in a relightable form in addition to the reflectance and the light source information. Specifically, the light occlusion information is expressed as coefficients of spherical harmonics (SH), and these coefficients are estimated. With this configuration, more photorealistic relighting can be achieved than when light occlusion is not taken into account.

Relighting is also fast, because it requires only an inner product of 9-dimensional vectors per pixel and per color channel. Plausible relighting is possible even when a live-action image of a person is input.
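The per-pixel inner product can be sketched as follows. The array shapes are assumptions for illustration: a reflectance map of shape (H, W, 3), a light transfer map of shape (H, W, 9), and light source information stored as 9 SH coefficients for each of 3 color channels, shape (9, 3). The function name `relight` is hypothetical.

```python
import numpy as np

def relight(reflectance, transfer, light):
    """Return (shading, image) for one lighting condition.

    reflectance: (H, W, 3), transfer: (H, W, 9), light: (9, 3)."""
    # One 9-dimensional inner product per pixel and per color channel.
    shading = np.einsum("hwk,kc->hwc", transfer, light)
    # The relit image is the reflectance multiplied element-wise by the shading.
    return shading, reflectance * shading
```

Under these assumptions, exchanging lighting between two people amounts to calling `relight` on one person's reflectance and light transfer maps with the light source coefficients estimated from the other person's image.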

<Configuration example>

As one configuration example, an image processing apparatus includes: an acquisition unit that acquires an input image; a generation unit that generates an output image corresponding to the input image based on light source information (light source information Π in the embodiment), which indicates the illumination state from a light source, reflectance information (reflectance map Λ in the embodiment) estimated from the input image as the reflectance of the subject included in the input image, and light transmission information (light transfer map Ψ in the embodiment) estimated, as the state of light transfer from the subject included in the input image, based on the input image and visible information (visible information V(ωi) in the embodiment) indicating whether light from the light source reaches the subject; and an output unit that outputs the output image generated by the generation unit.

As one configuration example, the light source information is estimated based on the input image.

As one configuration example, the output image is a relit image in which the subject included in the input image is relit based on the illumination state indicated by the light source information.

As one configuration example, the reflectance information and the light transmission information are estimated from the input image by a machine learning model (a CNN in the embodiment).

As one configuration example, the machine learning model is trained by comparing a predetermined correct image with an image obtained from any one of, or any combination of, the light source information, the reflectance information, and the light transmission information that the model estimates from the input image.

As one configuration example, the visible information is created, for each of a plurality of points on the object surface and for each direction of light, based on whether the light reaches that point.

Although embodiments of the present invention have been described above, these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, replacements, changes, and combinations can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are likewise included in the invention described in the claims and its equivalents.

The image processing apparatus 100 described above contains a computer. The steps of each process of each device described above are stored in a computer-readable recording medium in the form of a program, and the processes are performed when the computer reads and executes this program. Here, the computer-readable recording medium refers to a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. The computer program may also be distributed to a computer via a communication line, and the computer that receives it may execute the program.

The program may also implement only some of the functions described above.

Furthermore, the program may be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

1 ... image processing system, 10, 10-1, 10-2, ..., 10-n ... terminal device, 50 ... network, 70 ... server device, 100 ... image processing apparatus, 105 ... communication unit, 110 ... storage unit, 111 ... program, 112 ... application, 113 ... reflectance map Λ, 114 ... light source information Π, 115 ... light transfer map Ψ, 120 ... operation unit, 130 ... information processing unit, 131 ... acquisition unit, 132 ... analysis unit, 133 ... machine learning unit, 134 ... generation unit, 135 ... output unit, 140 ... display unit, 202 ... encoder, 203 ... reflectance map decoder, 204 ... ResNet block, 205 ... deconvolution unit, 206 ... light transfer map decoder, 207 ... ResNet block, 208 ... deconvolution unit, 209 ... concatenation unit, 210 ... convolution unit, 211 ... input image, 212 ... full-body person image I, 213 ... binary mask M, 214 ... ground-truth data, 215 ... reflectance map Λ, 216 ... light source information Π, 217 ... light transfer map Ψ, 220 ... estimated values, 221 ... reflectance map Λ, 222 ... light source information Π, 223 ... light transfer map Ψ, 224 ... multiplication unit, 225 ... full-body person image I

Claims

An image processing apparatus comprising:
an acquisition unit that acquires an input image;
a generation unit that generates an output image corresponding to the input image based on light source information indicating an illumination state from a light source, reflectance information estimated based on the input image as a reflectance of a subject included in the input image, and light transmission information estimated, as a transmission state of light from the subject included in the input image, based on the input image and visible information indicating whether light from the light source reaches the subject; and
an output unit that outputs the output image generated by the generation unit,
wherein the light source and the light transmission information are expressed as coefficients of basis functions, and
the coefficients are estimated by a machine learning model.

The image processing apparatus according to claim 1, wherein the light source information is estimated based on the input image.

The image processing apparatus according to claim 1, wherein the output image is a reilluminated image in which the subject included in the input image is reilluminated based on the illumination state indicated by the light source information.

The image processing apparatus according to any one of claims 1 to 3, wherein the reflectance information and the light transmission information are estimated based on the input image by a machine learning model.

The image processing apparatus according to claim 4, wherein the machine learning model is trained by comparing a predetermined correct image with an image obtained from any one of, or any combination of, the light source information, the reflectance information, and the light transmission information estimated by the machine learning model based on the input image.

The image processing apparatus according to any one of claims 1 to 5, wherein the visible information is created, for each of a plurality of points on the surface of an object and for each direction of light, based on whether the light reaches that point.

An image processing method performed by a computer, the method comprising:
a step of acquiring an input image;
a step of generating an output image corresponding to the input image based on light source information indicating an illumination state from a light source, reflectance information estimated based on the input image as a reflectance of a subject included in the input image, and light transmission information estimated, as a transmission state of light from the subject included in the input image, based on the input image and visible information indicating whether light from the light source reaches the subject; and
a step of outputting the output image generated in the generating step,
wherein the light source and the light transmission information are expressed as coefficients of basis functions, and
the coefficients are estimated by a machine learning model.

An image processing program that causes a computer to execute:
a step of acquiring an input image;
a step of generating an output image corresponding to the input image based on light source information indicating an illumination state from a light source, reflectance information estimated based on the input image as a reflectance of a subject included in the input image, and light transmission information estimated, as a transmission state of light from the subject included in the input image, based on the input image and visible information indicating whether light from the light source reaches the subject; and
a step of outputting the output image generated in the generating step,
wherein the light source and the light transmission information are expressed as coefficients of basis functions, and
the coefficients are estimated by a machine learning model.