JP7504629B2

JP7504629B2 - IMAGE PROCESSING METHOD, IMAGE PROCESSING APPARATUS, IMAGE PROCESSING PROGRAM, AND STORAGE MEDIUM

Info

Publication number: JP7504629B2
Application number: JP2020040027A
Authority: JP
Inventors: 崇鬼木
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-03-09
Filing date: 2020-03-09
Publication date: 2024-06-24
Anticipated expiration: 2040-03-09
Also published as: JP2021140663A

Description

The present invention relates to an image processing method using deep learning.

Patent Document 1 discloses a technique that takes the effect of gamma correction into account when training a multi-layer neural network that takes RAW images as input, thereby suppressing the undershoot and ringing that accompany resolution enhancement and contrast enhancement (sharpening). Non-Patent Document 1 discloses a network configuration that is generally applicable to a variety of regression problems. Non-Patent Document 1 also discloses using the network to perform upsampling of an input image, JPEG deblocking (removal of compression noise), denoising, non-blind deblurring, or inpainting.

Patent Document 1: JP 2019-121252 A

Non-Patent Document 1: X. Mao, C. Shen, Y. Yang, "Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections", https://arxiv.org/abs/1606.08921

However, the method disclosed in Non-Patent Document 1 cannot perform appropriate estimation when the input image is a RAW image. Patent Document 1 trains with an error that takes the effect of gamma correction into account, and thereby realizes a neural network whose estimation accuracy is not easily affected by the luminance level of the developed image. However, when a user actually views a developed output image, white balance processing has been applied in addition to gamma correction. If training on RAW images ignores white balance processing, the color balance seen during training may therefore differ significantly from the color balance after development. For example, even when an achromatic subject is photographed under white light with no wavelength dependence, the luminance values of the acquired RAW image vary from color to color because of the sensitivity characteristics of the image sensor. If training is performed on such RAW images without adjusting the luminance of each color, the estimation accuracy may also vary from color to color.

An object of the present invention is therefore to provide an image processing method and the like capable of obtaining a neural network with reduced variation in estimation accuracy between colors.

An image processing method as one aspect of the present invention includes: an acquisition step of acquiring a training image, a ground-truth image, and information related to white balance; a generation step of generating color component images, each consisting of only one of the plurality of color components of the training image; and an update step of inputting the color component images to a neural network to generate output images, each consisting of only one of the plurality of color components, acquiring a first error for each color component between the output images and the ground-truth image, acquiring a second error based on the first error and the information related to white balance, and updating parameters of the neural network based on the second error.
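The relationship between the first and second errors in the update step can be sketched as follows. This is a minimal illustration, assuming that the first error is a per-color mean squared error and that the second error is the sum of those errors weighted by white balance gains; the passage above does not fix either choice. The function names, pixel values, and gain values are hypothetical.

```python
def per_channel_mse(output, target):
    """First error: mean squared error computed separately for each color plane.

    `output` and `target` map a color name to a flat list of pixel values.
    """
    errors = {}
    for color, out_plane in output.items():
        tgt_plane = target[color]
        squared = sum((o - t) ** 2 for o, t in zip(out_plane, tgt_plane))
        errors[color] = squared / len(out_plane)
    return errors


def wb_weighted_error(first_error, wb_gains):
    """Second error: per-color errors combined with white balance gains (assumed weighting)."""
    return sum(wb_gains[color] * err for color, err in first_error.items())


# Hypothetical two-pixel color planes and gains, for illustration only.
output = {"R": [0.10, 0.20], "G": [0.50, 0.50], "B": [0.30, 0.10]}
target = {"R": [0.10, 0.40], "G": [0.50, 0.50], "B": [0.10, 0.10]}
wb_gains = {"R": 2.0, "G": 1.0, "B": 1.5}

first_error = per_channel_mse(output, target)
second_error = wb_weighted_error(first_error, wb_gains)
```

With these values the R and B planes have the same first error, but the R plane contributes more to the second error because its gain is larger. A parameter update driven by the second error would therefore penalize R errors more heavily.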

Other objects and features of the present invention are described in the following embodiments.

According to the present invention, it is possible to provide an image processing method and the like capable of obtaining a neural network with reduced variation in estimation accuracy between colors.

FIG. 1 is a diagram showing a convolutional neural network in Embodiment 1.
FIG. 2 is an explanatory diagram regarding white balance in each embodiment.
FIG. 3 is a block diagram of an image processing system in Embodiment 1.
FIG. 4 is an external view of the image processing system in Embodiment 1.
FIG. 5 is a flowchart of a learning process in Embodiment 1.
FIG. 6 is an explanatory diagram regarding color components of an image in each embodiment.
FIG. 7 is an explanatory diagram regarding gamma correction in each embodiment.
FIG. 8 is a flowchart of an estimation process in each embodiment.
FIG. 9 is a block diagram of an image processing system in Embodiment 2.
FIG. 10 is an external view of the image processing system in Embodiment 2.
FIG. 11 is a diagram showing a convolutional neural network in Embodiment 2.
FIG. 12 is a flowchart of a learning process in Embodiment 2.

Embodiments of the present invention are described in detail below with reference to the drawings. In each drawing, the same members are given the same reference numerals, and duplicate descriptions are omitted.

First, the terms used in the embodiments are defined. Each embodiment relates to a method of solving a regression problem by deep learning and estimating various output images from an input image. Deep learning is machine learning that uses a multi-layer neural network. By learning the network parameters (weights and biases) from a large number of pairs of training images and corresponding ground-truth images (the desired outputs), highly accurate estimation becomes possible even for unknown input images.

Image processing using a multi-layer neural network involves two processes: one that updates the network parameters (weights and biases), and one that performs estimation on an unknown input using the updated parameters. Hereinafter, the former is called the learning process and the latter is called the estimation process.

Next, the names of the images used in the learning process and the estimation process are defined. An image input to the network is called an input image; an input image used in the learning process, for which the ground-truth image is known, is specifically called a training image. An image output from the network is called an output image; an output image produced in the estimation process is specifically called an estimated image. The input image to the network and the ground-truth image are RAW images. Here, a RAW image is undeveloped image data output from the image sensor, in which the light amount and the signal value of each pixel are in a substantially linear relationship. A RAW image is developed before the user views it, and gamma correction is performed at that time. Gamma correction is, for example, a process of raising the input signal value to a power, with an exponent such as 1/2.2. In each embodiment, the degradation-free equivalent image from which a ground-truth image or training image is generated is called the original image.
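The power-law form of gamma correction mentioned above can be illustrated with a short sketch. It assumes signal values normalized to the range 0 to 1; the function name and the sample values are hypothetical.

```python
def gamma_correct(signal, gamma=1.0 / 2.2):
    """Raise a normalized linear signal in [0, 1] to the given power."""
    clipped = min(max(signal, 0.0), 1.0)  # keep the value inside the valid range
    return clipped ** gamma


# Linear RAW-like values and their gamma-corrected counterparts.
linear_values = [0.0, 0.01, 0.18, 0.5, 1.0]
developed = [gamma_correct(v) for v in linear_values]
```

Because the exponent is smaller than 1, dark values are lifted far more than bright ones. This is why a fixed error in the linear RAW signal looks different in dark and bright regions once the image is developed.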

Because the output image is generated by estimation as an image that approximates the ground-truth image, it also has the properties of a RAW image. The estimation process covers a variety of tasks, for example upsampling, denoising, removal of compression noise, deblurring (blur correction), inpainting, demosaicing, dehazing, gradation enhancement, and relighting (changing the lighting environment).

Before the embodiments are described in detail, the gist of the present invention is outlined. The present invention takes the influence of white balance into account in the learning process of a multi-layer neural network that takes RAW images as input. In general, an imaging device that uses an image sensor, such as a digital camera, has a white balance control function that adjusts the color tone of the captured image. White balance processing applies a separate gain to each of the RGB components output by the image sensor and equalizes their luminance levels so that achromatic parts of the subject are rendered achromatic in the output image. Without white balance processing, the colors of the subject are not reproduced correctly because of the color characteristics of the image sensor, and the resulting image has colors that differ from those of the actual subject.

FIG. 2 is an explanatory diagram regarding white balance. FIG. 2(A) shows the spectral distribution characteristics of light sources, and FIG. 2(B) shows the spectral sensitivity characteristics of an image sensor having a color filter and an IR cut filter. In FIG. 2(A), the horizontal axis indicates wavelength and the vertical axis indicates light intensity; the solid line shows the spectral distribution of a white LED and the dashed line shows that of an incandescent bulb. In FIG. 2(B), the horizontal axis indicates wavelength and the vertical axis indicates spectral sensitivity; the solid line shows the spectral sensitivity of the B component, the dashed line that of the G component, and the dash-dot line that of the R component.

The signal output from the image sensor reflects the characteristics of the light source and the image sensor shown in FIGS. 2(A) and 2(B). In practice it also includes other effects, such as the transmittance of the optical system. The solid and dashed lines in FIG. 2(A) represent wavelength-dependent light sources, but even if an achromatic subject were photographed under light with no wavelength dependence and a completely flat intensity, the signal would still be affected by the spectral sensitivity characteristics of the image sensor shown in FIG. 2(B). Because the subject is achromatic, the RGB components of the captured image should ideally have equal luminance values. Owing to the spectral characteristics shown in FIG. 2(B), however, the R and B components are lower than the G component, and the output image has a green cast.
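The green cast described above, and its removal by per-color gains, can be sketched as follows. This is an illustration only: the raw response values are hypothetical, and normalizing the gains to the G level is an assumed convention rather than something stated in the text.

```python
def white_balance_gains(gray_response):
    """Per-color gains that bring the response to an achromatic subject onto the G level."""
    g_level = gray_response["G"]
    return {color: g_level / level for color, level in gray_response.items()}


def apply_white_balance(pixel, gains):
    """Multiply each color component of a pixel by its gain."""
    return {color: value * gains[color] for color, value in pixel.items()}


# Hypothetical RAW response to an achromatic subject: R and B are lower than G.
gray_response = {"R": 0.25, "G": 0.50, "B": 0.40}

gains = white_balance_gains(gray_response)
balanced = apply_white_balance(gray_response, gains)
```

After the gains are applied, the three components are equal, so the achromatic subject is rendered achromatic. The gains also show how much each color is amplified at development, which is the quantity the learning process needs to take into account.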

If training is performed on a collection of such RAW images, the network learns predominantly greenish subjects. The resulting network parameters therefore give high estimation accuracy for greener subjects and low estimation accuracy for red or blue subjects. In reality, light sources are wavelength dependent, and as shown in FIG. 2(A), the color tone changes further with the type of light source used at the time of shooting, which also affects the estimation accuracy. The present invention aims to reduce this variation in estimation accuracy between colors, and methods for achieving this are described in detail in the following embodiments.

First, an image processing system according to Embodiment 1 of the present invention is described. In this embodiment, a multi-layer neural network learns and executes blur correction. However, this embodiment is not limited to blur correction and is also applicable to other image processing.

FIG. 3 is a block diagram of the image processing system 100 in this embodiment. FIG. 4 is an external view of the image processing system 100. The image processing system 100 includes a learning device (image processing device) 101, an imaging device 102, an image estimation device (image processing device) 103, a display device 104, a recording medium 105, an output device 106, and a network 107.

The learning device 101 is an image processing device that executes the learning process, and includes a storage unit 101a, an acquisition unit 101b, a calculation unit 101c, an update unit 101d, and a generation unit 101e. The acquisition unit 101b acquires training images, ground-truth images, and information related to white balance. The generation unit 101e inputs a training image into the multi-layer neural network to generate an output image. The calculation unit 101c calculates the difference (error) between the output image and the ground-truth image, and the update unit 101d updates the network parameters of the neural network based on that error. Details of the learning process are described later with reference to a flowchart. The learned network parameters are stored in the storage unit 101a.

The imaging device 102 includes an optical system 102a and an image sensor 102b. The optical system 102a collects light that enters the imaging device 102 from the subject space. The image sensor 102b receives (photoelectrically converts) the optical image (subject image) formed via the optical system 102a and acquires a captured image. The image sensor 102b is, for example, a CCD (Charge Coupled Device) sensor or a CMOS (Complementary Metal-Oxide Semiconductor) sensor. The captured image acquired by the imaging device 102 contains blur caused by the aberration and diffraction of the optical system 102a and noise caused by the image sensor 102b.

The image estimation device 103 is a device that executes the estimation process, and includes a storage unit 103a, an acquisition unit 103b, and an estimation unit 103c. The image estimation device 103 performs blur correction on an acquired captured image to generate an estimated image. The blur correction uses a multi-layer neural network, and the network parameter information is read from the storage unit 103a. The network parameters are those learned by the learning device 101; the image estimation device 103 reads them in advance from the storage unit 101a via the network 107 and saves them in the storage unit 103a. The network parameters may be saved as raw numerical values or in an encoded form. Details of the learning of the network parameters and of the blur correction process using them are described later.

The output image is output to at least one of the display device 104, the recording medium 105, and the output device 106. The display device 104 is, for example, a liquid crystal display or a projector. Via the display device 104, the user can perform editing and similar work while checking the image being processed. The recording medium 105 is, for example, a semiconductor memory, a hard disk, or a server on a network. The output device 106 is, for example, a printer. The image estimation device 103 has a function of performing development processing and other image processing as necessary.

Next, the method of learning the network parameters (the method of manufacturing a trained model) executed by the learning device 101 in this embodiment is described with reference to FIG. 5. FIG. 5 is a flowchart of the learning of the network parameters. Each step in FIG. 5 is executed mainly by the acquisition unit 101b, the calculation unit 101c, the update unit 101d, and the generation unit 101e of the learning device 101.

First, in step S101 of FIG. 5, the acquisition unit 101b acquires ground-truth patches (ground-truth images) and training patches (training images). A ground-truth patch is an image with relatively little blur, and a training patch is an image with relatively more blur. A patch is an image with a predetermined number of pixels (for example, 64×64 pixels). The ground-truth patch and the training patch do not necessarily need to have the same number of pixels. In this embodiment, mini-batch learning is used to learn the network parameters of the multi-layer neural network, so multiple pairs of ground-truth patches and training patches are acquired in step S101. However, this embodiment is not limited to this, and online learning or batch learning may be used instead.

In this embodiment, the ground-truth patches and training patches are acquired by the following method, although the method is not limited to this. An imaging simulation is performed using a plurality of original images stored in the storage unit 101a as subjects, generating a plurality of high-resolution captured images that are substantially free of aberration and diffraction and a plurality of low-resolution captured images that include aberration and diffraction. A plurality of ground-truth patches and training patches are then acquired by extracting partial regions at the same position from each high-resolution captured image and the corresponding low-resolution captured image. In this embodiment, the original images are undeveloped RAW images, and the ground-truth patches and training patches are likewise RAW images; however, this is not a limitation, and developed images may be used. The position of a partial region refers to its center. The original images contain a variety of subjects, that is, edges of various strengths and orientations, textures, gradations, flat areas, and so on. An original image may be a photographed image or an image generated by CG (Computer Graphics).

Preferably, the original image has luminance values higher than the luminance saturation value of the image sensor 102b. This is because, for real subjects as well, some subjects exceed the luminance saturation value when photographed by the imaging device 102 under a particular exposure condition. A high-resolution captured image is generated by reducing the original image and clipping it at the luminance saturation value of the image sensor 102b. In particular, when a photographed image is used as the original image, it is already blurred by aberration and diffraction, so reducing it lessens the effect of that blur and yields a high-resolution (high-quality) image. If the original image contains sufficient high-frequency components, the reduction may be omitted. A low-resolution captured image is generated by reducing the original image in the same way as for the high-resolution captured image, adding blur due to the aberration and diffraction of the optical system 102a, and then clipping at the luminance saturation value. The aberration and diffraction of the optical system 102a vary with the lens state (zoom, aperture, and focus distance), the image height, and the azimuth. A plurality of low-resolution captured images is therefore generated by applying, to each original image, blur due to the aberration and diffraction of a different lens state, image height, and azimuth.
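The order of operations described above (reduce, optionally blur, then clip) can be sketched on a single scanline. This is a simplified illustration: the pairwise-average reduction, the three-tap kernel standing in for the aberration and diffraction blur, and all values are assumptions, not taken from the text.

```python
def reduce_by_two(row):
    """Halve the sample count by averaging neighboring pairs (assumed reduction method)."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]


def blur(row, kernel):
    """Convolve with a small symmetric kernel, replicating the edge samples."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), len(row) - 1)  # clamp index at the borders
            acc += weight * row[j]
        out.append(acc)
    return out


def clip(row, saturation):
    """Clip values at the luminance saturation value."""
    return [min(value, saturation) for value in row]


SATURATION = 1.0
KERNEL = [0.25, 0.5, 0.25]  # placeholder for the aberration/diffraction blur

# Original scanline with values above the saturation level.
original = [0.0, 0.0, 0.2, 0.2, 3.0, 3.0, 0.2, 0.2]

reduced = reduce_by_two(original)
high_res = clip(reduced, SATURATION)                 # basis for ground-truth patches
low_res = clip(blur(reduced, KERNEL), SATURATION)    # basis for training patches
```

Because the blur is applied before clipping, the bright sample at 3.0 spreads much more light into its neighbors than a sample already clipped to 1.0 would. This is the behavior around saturated regions that an original image with values above the saturation level makes it possible to simulate.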

The order of the reduction and the blur addition may be reversed. When the blur is added first, the blur must be sampled more finely to allow for the subsequent reduction: for a PSF (point spread function), the spatial sampling points are made finer, and for an OTF (optical transfer function), the maximum frequency is increased. If necessary, components such as the optical low-pass filter included in the imaging device 102 may be added to the blur. Distortion is not included in the blur added when generating the low-resolution captured images. This is because large distortion shifts the position of the subject, so the subject in the ground-truth patch may differ from that in the training patch. The neural network trained in this embodiment therefore does not correct distortion. Distortion is corrected separately after blur correction, using bilinear interpolation, bicubic interpolation, or the like.

Next, a partial region of the specified pixel size is extracted from a generated high-resolution captured image and used as a ground-truth patch. A partial region at the same position is extracted from the corresponding low-resolution captured image and used as a training patch. Because this embodiment uses mini-batch learning, a plurality of ground-truth patches and training patches are acquired from the generated high-resolution and low-resolution captured images. The original image may contain noise components. In that case, the ground-truth patch and training patch can be regarded as generated from a subject that includes the noise, so noise in the original image poses no particular problem.
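Extracting a ground-truth patch and a training patch from the same position can be sketched as follows. For brevity the sketch specifies a region by its top-left corner rather than its center, uses small 2-D lists in place of real images, and assumes both images have the same size; the function names are hypothetical.

```python
import random


def extract_patch(image, top, left, size):
    """Return a size x size sub-region of a 2-D list, given its top-left corner."""
    return [row[left:left + size] for row in image[top:top + size]]


def sample_patch_pair(high_res, low_res, size, rng):
    """Cut the ground-truth patch and the training patch from the same location."""
    height, width = len(high_res), len(high_res[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    truth = extract_patch(high_res, top, left, size)
    training = extract_patch(low_res, top, left, size)
    return truth, training, (top, left)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Stand-ins for one high-resolution / low-resolution captured image pair.
high_res = [[y * 8 + x for x in range(8)] for y in range(8)]
low_res = [[value * 0.5 for value in row] for row in high_res]

# Several pairs, as needed for mini-batch learning.
pairs = [sample_patch_pair(high_res, low_res, 4, rng) for _ in range(3)]
```

Sharing one randomly chosen position between the two images keeps the subject aligned across each pair, which is the property the text relies on when it excludes distortion from the simulated blur.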

For processing other than the correction of blur due to aberration and diffraction, the learning process can be executed in the same way by preparing pairs of training images and ground-truth images through simulation. For denoising, a training image can be generated by adding the expected noise to a low-noise ground-truth image. For upsampling, a training image can be prepared by downsampling the ground-truth image. For removal of compression noise, a training image can be generated by compressing a ground-truth image that is uncompressed or only lightly compressed. For deblurring of blur other than aberration and diffraction (such as defocus blur), a training image can be generated by convolving the expected blur with a ground-truth image that has little blur. Because defocus blur depends on distance, defocus blurs for different distances are convolved across the multiple pairs of training and ground-truth images. For inpainting, a training image can be generated by introducing defects into a defect-free ground-truth image. For demosaicing, a training image can be generated by resampling a ground-truth image captured with a three-chip image sensor or the like onto a Bayer array or similar pattern.

For dehazing, a training image can be generated by adding scattered light to a ground-truth image free of fog and haze. Because the intensity of the scattered light in fog or haze varies with density and distance, multiple training images are generated for scattered light of different densities and distances. For gradation enhancement, a training image can be obtained by reducing the gradation of a high-gradation ground-truth image. For relighting, if the distributions of surface normals, shape, and reflectance of the subject in the ground-truth image are known, training images under different light source environments can be generated by simulation. However, because the measurement burden in that case is large, pairs of ground-truth and training images may instead be generated by actually photographing the subject under different lighting environments.
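As one concrete instance of the simulated pairs listed above, the demosaicing case can be sketched as follows. The sketch assumes an RGGB Bayer layout and represents each full-color pixel as a small dictionary; both choices, and the pixel values, are illustrative only.

```python
def bayer_mosaic(rgb_image):
    """Resample a full-color image onto an RGGB Bayer pattern (one color per pixel)."""
    pattern = (("R", "G"), ("G", "B"))  # assumed 2x2 layout, repeated over the image
    mosaic = []
    for y, row in enumerate(rgb_image):
        mosaic.append([pixel[pattern[y % 2][x % 2]] for x, pixel in enumerate(row)])
    return mosaic


# Stand-in for a ground-truth image in which every pixel has all three colors.
ground_truth = [
    [{"R": 10 * y + x, "G": 100 + 10 * y + x, "B": 200 + 10 * y + x} for x in range(4)]
    for y in range(4)
]

# The training image keeps only the color that the Bayer filter passes at each pixel.
training = bayer_mosaic(ground_truth)
```

The other cases follow the same pattern: start from the ground-truth image and apply the degradation that the network is meant to undo.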

Next, in step S102, the acquisition unit 101b acquires the information related to white balance (learning condition information and white balance coefficients) used in the learning process. In this embodiment, the learning condition information is, for example, information about a setting such as "white balance setting at the time of shooting" or "auto white balance setting", or color temperature information of the light source. Digital cameras are usually equipped with a function called auto white balance, which automatically determines the type of light source and corrects for it. However, when the subject contains no white, the light source cannot easily be determined. For this reason, digital cameras are generally also equipped with a preset white balance function, which lets the user select the type of light source from a menu, and a manual white balance function, which lets the user directly specify the color temperature of the light source and the like.

The preset white balance function provides white balance coefficients (gain values for each color) suited to each shooting condition, such as incandescent light, clear sky, cloudy sky, and fluorescent light. These white balance coefficients correspond to color temperatures, for example 3000 K for incandescent light and 6000 K for a cloudy sky. A color temperature of 3000 K assumes a shooting environment in which the subject appears redder than it really is, so the white balance coefficient for the B component is larger than that for the R component. Conversely, a color temperature of 6000 K assumes a shooting environment in which the subject appears bluer than it really is, so the white balance coefficient for the R component is larger than that for the B component. Consequently, if training uses the white balance coefficients for 3000 K, the coefficient for the B component is the larger one, and the estimation accuracy is higher for the B component than for the R component. Conversely, if training uses the white balance coefficients for 6000 K, the coefficient for the R component is the larger one, and the estimation accuracy is higher for the R component than for the B component.

このように、学習条件情報を色温度として選択できるようにし、それぞれの色温度に対応したネットワークパラメータを用いることで、ユーザがＲＧＢのどの色の推定精度を優先するかを選択することができる。本実施例では、学習条件情報を「撮影時のホワイトバランスの設定」とし、ステップＳ１０２では正解パッチまたは訓練パッチの元となるＲＡＷ画像の撮影時に設定されたホワイトバランス係数を取得する。ホワイトバランス係数はＲＡＷ画像のヘッダー情報から取得してもよいし、撮像装置１０２から取得してもよい。なお、以降の説明において、ヘッダー情報と記載されている場合には画像の付加情報を表しており、フッター情報であってもよい。また本実施例では、撮影時に設定されたホワイトバランス係数を取得するが、学習条件情報を「オートホワイトバランス設定」として、撮像装置が自動判定して算出したホワイトバランス係数を用いてもよい。 In this way, by allowing the learning condition information to be selected as a color temperature and using network parameters corresponding to each color temperature, the user can select which RGB color estimation accuracy is to be prioritized. In this embodiment, the learning condition information is set to "white balance setting at the time of shooting", and in step S102, the white balance coefficient set at the time of shooting the RAW image that is the source of the correct patch or training patch is obtained. The white balance coefficient may be obtained from the header information of the RAW image, or from the imaging device 102. In the following description, when "header information" is mentioned, it represents additional information of the image, and may be footer information. In this embodiment, the white balance coefficient set at the time of shooting is obtained, but the learning condition information may be set to "auto white balance setting" and the white balance coefficient automatically determined and calculated by the imaging device may be used.

続いてステップＳ１０３において、生成部１０１ｅは、ステップＳ１０１にて取得された複数の訓練画像のうち少なくとも一の訓練画像を選択し、選択された訓練画像をネットワークへ入力して出力画像を生成する。複数の訓練画像の全てを選択する（訓練画像の全てをネットワークへ入力し、それら全ての出力を用いてネットワークパラメータを更新する）場合をバッチ学習と呼ぶ。この方法は、訓練画像の数が増えるにつれて、演算負荷が膨大になる。一枚の訓練画像のみを選択する場合（ネットワークパラメータの更新に一枚の訓練画像のみを用いて、更新ごとに異なる訓練画像を用いる）場合をオンライン学習と呼ぶ。この手法は、訓練画像の総数が増えても演算量が増大しないが、一枚の訓練画像に存在するノイズの影響を受けやすい。このため、複数の訓練画像から少数（ミニバッチ）を選択し、それらを用いてネットワークパラメータの更新を行なうミニバッチ法を用いることが好ましい。次の更新では、異なる少数の訓練画像を選択して用いる。この処理を繰り返すことにより、バッチ学習とオンライン学習の弱点を小さくすることができる。 Next, in step S103, the generation unit 101e selects at least one training image from the multiple training images acquired in step S101, inputs the selected training image to the network, and generates an output image. The case where all of the multiple training images are selected (all of the training images are input to the network, and the network parameters are updated using all of the outputs) is called batch learning. This method imposes a huge computational load as the number of training images increases. The case where only one training image is selected (only one training image is used to update the network parameters, and a different training image is used for each update) is called online learning. This method does not increase the amount of computation even if the total number of training images increases, but is susceptible to the influence of noise present in one training image. For this reason, it is preferable to use a mini-batch method in which a small number (mini-batch) of multiple training images is selected and used to update the network parameters. In the next update, a small number of different training images are selected and used. By repeating this process, the weaknesses of batch learning and online learning can be reduced.
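The mini-batch selection described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `iterate_minibatches`, the batch size, and the use of a random permutation are assumptions made for the example.

```python
import numpy as np

def iterate_minibatches(num_images, batch_size, rng):
    """Yield index arrays that visit every training image once, in random order.

    Each yielded array is one mini-batch; consecutive updates therefore use
    different training images, as the mini-batch method requires.
    """
    order = rng.permutation(num_images)
    for start in range(0, num_images, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(num_images=10, batch_size=4, rng=rng))
```

With `batch_size=num_images` this degenerates to batch learning, and with `batch_size=1` to online learning, which is why the mini-batch method sits between the two.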

ここで、図１を参照して、多層のニューラルネットワークで行われる処理に関して説明する。図１は、畳み込みニューラルネットワーク（ＣＮＮ）を示す図である。ただし本実施例は、これに限定されるものではなく、例えばＣＮＮに残差ネットワークを採用することができ、または、ＧＡＮ（ＧｅｎｅｒａｔｉｖｅＡｄｖｅｒｓａｒｉａｌＮｅｔｗｏｒｋ）などを用いてもよい。なお図１では、簡単のため、入力する訓練画像２０１を一枚だけ描画しているが、実際には選択された複数の訓練画像それぞれに対して、出力画像が生成される。訓練画像２０１は、ＲＡＷ画像を色成分ごとに三次元方向に配列した画像である。 Now, referring to FIG. 1, the processing performed in a multi-layered neural network will be described. FIG. 1 is a diagram showing a convolutional neural network (CNN). However, this embodiment is not limited to this, and for example, a residual network can be adopted for the CNN, or a GAN (generative adversarial network) or the like may be used. Note that in FIG. 1, for simplicity, only one training image 201 is drawn to be input, but in reality, an output image is generated for each of the multiple selected training images. The training image 201 is an image in which RAW images are arranged in a three-dimensional direction for each color component.

図６は、画像の色成分に関する説明図である。本実施例において、訓練画像は、図６（Ａ）に示されるようなＢａｙｅｒ配列の画像（ＲＡＷ画像）である。ここでＲＧＢは、それぞれ赤、緑、青を表す。図６（Ａ）のＢａｙｅｒ配列から、各色の成分だけを配列し直した構成が図６（Ｂ）である。Ｇは、Ｇ１とＧ２の２種類があるため、それぞれを抽出して配列する。図６（Ｂ）の四枚の画像を三次元方向に配列した４チャンネルの画像が、図１における訓練画像２０１である。この作業は必ずしも必要ではないが、収差・回折は波長によって変化するため、同一のぼけを持つ色成分を配列させた方が補正しやすい。また、ＲＧＢが同一次元内に配列されていると、局所的に異なる明るさを有する画素が混合されるため、推定精度が低下しやすい。このため、訓練画像を色成分ごとに分離することが好ましい。なお、ここではＢａｙｅｒ配列の場合を示しているが、その他の配列（ハニカム構造など）に関しても同様である。図１では描画を簡略化するため、訓練画像２０１を４×４の４チャンネル画像としているが、縦横の画像サイズはこれに限定されるものではない。 Figure 6 is an explanatory diagram of the color components of an image. In this embodiment, the training image is a Bayer array image (RAW image) as shown in Figure 6 (A). Here, RGB represents red, green, and blue, respectively. Figure 6 (B) shows a configuration in which only the components of each color are rearranged from the Bayer array of Figure 6 (A). Since there are two types of G, G1 and G2, each is extracted and arranged. The four-channel image in which the four images of Figure 6 (B) are arranged in a three-dimensional direction is the training image 201 in Figure 1. This work is not necessarily necessary, but since aberration and diffraction change depending on the wavelength, it is easier to correct by arranging color components with the same blur. In addition, if RGB are arranged in the same dimension, pixels with locally different brightness are mixed, so the estimation accuracy is likely to decrease. For this reason, it is preferable to separate the training image for each color component. Note that although the case of a Bayer array is shown here, the same applies to other arrays (such as a honeycomb structure). In FIG. 1, to simplify the drawing, the training image 201 is a 4x4 four-channel image, but the horizontal and vertical image sizes are not limited to this.
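The rearrangement from FIG. 6(A) to FIG. 6(B) can be sketched in NumPy as below. The sketch assumes an RGGB layout with R at the top-left pixel and the channel order R, G1, G2, B; the function name `pack_bayer` is hypothetical, and other sensor layouts would need different offsets.

```python
import numpy as np

def pack_bayer(raw):
    """Split an H x W Bayer mosaic into a (4, H/2, W/2) array ordered R, G1, G2, B.

    Assumes an RGGB layout (R at the top-left pixel); this is an assumption of
    the sketch, not something fixed by the description.
    """
    if raw.shape[0] % 2 or raw.shape[1] % 2:
        raise ValueError("height and width must both be even")
    r = raw[0::2, 0::2]
    g1 = raw[0::2, 1::2]
    g2 = raw[1::2, 0::2]
    b = raw[1::2, 1::2]
    return np.stack([r, g1, g2, b], axis=0)

# An 8 x 8 mosaic becomes a 4 x 4 four-channel image, as drawn in FIG. 1.
raw = np.arange(64, dtype=np.float32).reshape(8, 8)
packed = pack_bayer(raw)
```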

本実施例において、訓練画像および正解画像はそれぞれ、周期的に配列された複数の色成分を有し、訓練画像または正解画像の各色成分のみで構成される色成分画像を生成する生成工程を設けてもよい。ここで、色成分画像を生成する工程は、訓練画像に対してニューラルネットワークへの入力前に実行され、正解画像に対して誤差の算出前に実行される。 In this embodiment, the training images and the correct answer image each have a plurality of color components arranged periodically, and a generation step may be provided for generating color component images consisting of only each color component of the training images or the correct answer image. Here, the step of generating color component images is performed on the training images before inputting them into the neural network, and on the correct answer image before calculating the error.

ＣＮＮは複数の層構造になっており、各層で線型変換と非線型変換が実行される。線型変換は、入力された画像（または特徴マップ）とフィルタの畳み込み、およびバイアス（図１中のｂｉａｓ）との和で表現される。各層におけるネットワークパラメータ（フィルタのウエイトとバイアス）を学習工程によって更新する。非線形変換は、活性化関数（ＡｃｔｉｖａｔｉｏｎＦｕｎｃｔｉｏｎ）と呼ばれる非線型関数による変換である（図１中のＡＦ）。活性化関数の例としては、シグモイド関数やハイパボリックタンジェント関数などがあり、本実施例では以下の式（１）で表されるＲｅＬＵ（ＲｅｃｔｉｆｉｅｄＬｉｎｅａｒＵｎｉｔ）が用いられる。 A CNN has a multi-layer structure, and a linear transformation and a nonlinear transformation are performed in each layer. The linear transformation is expressed as the convolution of the input image (or feature map) with a filter, plus a bias (bias in FIG. 1). The network parameters (filter weights and biases) in each layer are updated through the learning process. The nonlinear transformation is a transformation by a nonlinear function called an activation function (AF in FIG. 1). Examples of activation functions include the sigmoid function and the hyperbolic tangent function; in this embodiment, the ReLU (Rectified Linear Unit) expressed by the following formula (1) is used.
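For reference, the ReLU of formula (1) has the following standard form. The symbol x, denoting the input to the activation function, is an assumption of this rendering.

```latex
f(x) = \max(x, 0) \tag{1}
```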

式（１）において、ｍａｘは、引数のうち最大値を出力するＭＡＸ関数を表す。 In formula (1), max represents the MAX function that outputs the maximum value among the arguments.

入力層に入力された訓練画像２０１は、第１畳み込み層で複数のフィルタ２０２それぞれとのコンボリューションと、バイアスとの和を取られる。フィルタ２０２それぞれのチャンネル数は、訓練画像２０１と一致し、訓練画像２０１のチャンネル数が２以上の場合、３次元フィルタとなる（三次元目がチャンネル数を表す）。なお、フィルタの縦横の大きさは任意である。コンボリューションと和の結果は、活性化関数によって非線形変換が施され、第１特徴マップ２０３が第１中間層に出力される。ここで、第１特徴マップ２０３のチャンネル数（三次元方向の配列数）は、フィルタ２０２の数と同じである。次に、第２畳み込み層へ第１特徴マップ２０３が入力され、前述と同様に複数のフィルタ２０４のそれぞれとのコンボリューションと、バイアスとの和が取られる。その結果を非線形変換し、以下同様に畳み込み層の数だけ繰り返す。一般に、畳み込み層が３層以上あるＣＮＮが、ディープラーニングに該当する。最後の畳み込み層から出力された結果が、ＣＮＮの出力画像２１１である。なお、最後の畳み込み層では、活性化関数による非線形変換を実行しなくてもよい。 The training image 201 input to the input layer is convolved with each of the multiple filters 202 in the first convolution layer, and summed with the bias. The number of channels of each filter 202 matches the training image 201, and if the number of channels of the training image 201 is two or more, it becomes a three-dimensional filter (the third dimension represents the number of channels). The length and width of the filter are arbitrary. The result of the convolution and sum is nonlinearly transformed by an activation function, and the first feature map 203 is output to the first hidden layer. Here, the number of channels (the number of arrays in the three-dimensional direction) of the first feature map 203 is the same as the number of filters 202. Next, the first feature map 203 is input to the second convolution layer, and convolution with each of the multiple filters 204 and summed with the bias are taken as described above. The result is nonlinearly transformed, and the same process is repeated for the number of convolution layers. In general, a CNN with three or more convolution layers corresponds to deep learning. The result output from the final convolutional layer is the CNN output image 211. Note that in the final convolutional layer, it is not necessary to perform nonlinear transformation using an activation function.

続いてステップＳ１０４において、生成部１０１ｅはステップＳ１０２で取得したホワイトバランスに関する情報（ホワイトバランス係数）を用いて、出力画像２１１および正解画像２２１を補正する。ここで、Ｒ、Ｇ、Ｂのホワイトバランス係数をそれぞれＷ_ｒ、Ｗ_ｇ、Ｗ_ｂ、調整前の画像をそれぞれＩ_ｒ０、Ｉ_ｇ０、Ｉ_ｂ０、調整後の画像をそれぞれＩ_ｒ、Ｉ_ｇ、Ｉ_ｂとする。このとき、ホワイトバランス係数による調整後の画像Ｉ_ｒ、Ｉ_ｇ、Ｉ_ｂはそれぞれ、式（２）～（４）のように表される。 Next, in step S104, the generation unit 101e corrects the output image 211 and the correct image 221 using the information on white balance (white balance coefficients) acquired in step S102. Here, the white balance coefficients of R, G, and B are W_r, W_g, and W_b, respectively, the images before adjustment are I_r0, I_g0, and I_b0, respectively, and the images after adjustment are I_r, I_g, and I_b, respectively. The images I_r, I_g, and I_b after adjustment using the white balance coefficients are then expressed by formulas (2) to (4), respectively.
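Since the text describes the adjustment as multiplying each color directly by its white balance coefficient, formulas (2) to (4) can be written as follows. This is a rendering inferred from the definitions of W_r, W_g, W_b, I_r0, I_g0, I_b0, I_r, I_g, and I_b given above.

```latex
\begin{align}
I_r &= W_r \, I_{r0} \tag{2} \\
I_g &= W_g \, I_{g0} \tag{3} \\
I_b &= W_b \, I_{b0} \tag{4}
\end{align}
```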

なお、式（２）～（４）のようにホワイトバランス係数を直接色ごとに掛けるのではなく、規格化してから係数を掛けてもよい。その場合、例えばＧの係数で規格化するのであれば、Ｒ、Ｂの係数に対して、Ｇの係数で割り、それぞれＲとＢの画像に対して掛ければよい。また、図６のようにＧがＧ１およびＧ２の二つに分かれる場合には、それぞれのホワイトバランス係数を掛けてもよいし、Ｇ１とＧ２の平均値を計算し、平均のホワイトバランス係数をＧの画像に掛けてもよい。なお、ＲＡＷ画像に含まれるオプティカルブラックは色成分に依存しないため、オプティカルブラックを考慮する場合は、式（２）～（４）の計算を実施する前にオプティカルブラックの値を各画像から差し引き、計算後に加算すればよい。 Note that instead of multiplying each color directly by its white balance coefficient as in formulas (2) to (4), the coefficients may be normalized before they are applied. For example, when normalizing by the G coefficient, the R and B coefficients are divided by the G coefficient, and the results are multiplied by the R and B images, respectively. When G is divided into G1 and G2 as in FIG. 6, each may be multiplied by its own white balance coefficient, or the average of the G1 and G2 coefficients may be calculated and the G images multiplied by that average coefficient. Because the optical black contained in a RAW image does not depend on the color component, when optical black is taken into account, its value is subtracted from each image before the calculations of formulas (2) to (4) and added back afterwards.
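The adjustment variants described above (direct multiplication, normalization by the G coefficient, and optical black handling) can be sketched together as follows. The function name `apply_white_balance`, the channel order R, G1, G2, B, the use of a single G coefficient for both G1 and G2, and the numeric values are all assumptions made for illustration.

```python
import numpy as np

def apply_white_balance(packed, gains, optical_black=0.0, normalize_by_g=False):
    """Apply per-color gains to a (4, H, W) array ordered R, G1, G2, B.

    gains is (W_r, W_g, W_b). The optical black level is subtracted before the
    multiplication and added back afterwards, since it is color independent.
    """
    w_r, w_g, w_b = gains
    if normalize_by_g:
        # Normalize so that the G gain becomes 1 and only R and B are scaled.
        w_r, w_g, w_b = w_r / w_g, 1.0, w_b / w_g
    per_channel = np.array([w_r, w_g, w_g, w_b], dtype=np.float64).reshape(4, 1, 1)
    return (packed - optical_black) * per_channel + optical_black

packed = np.full((4, 2, 2), 564.0)  # hypothetical RAW values
balanced = apply_white_balance(packed, gains=(2.0, 1.0, 1.5), optical_black=64.0)
normalized = apply_white_balance(packed, gains=(4.0, 2.0, 3.0),
                                 optical_black=64.0, normalize_by_g=True)
```

In this example the two calls give the same result, because the gains (4.0, 2.0, 3.0) normalized by the G coefficient equal (2.0, 1.0, 1.5).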

続いて、ホワイトバランス係数を適用した出力画像２１１および正解画像２２１に対して必要に応じてクリッピング処理を行う。本実施例において、クリッピング処理とは、指定した上限値以上の輝度値を上限値に置き換える処理である。ホワイトバランス係数による調整を行う前の出力画像２１１または正解画像２２１において、輝度飽和値（画素が取り得る上限値）に達した画素がある場合、クリッピング処理を実施する。例えば全色輝度飽和となっている場合、調整前においてＲＧＢで同じ輝度値であった画素は、ホワイトバランス係数をかけることにより、調整した分だけ逆に色付くことになる。よって、この対策として、ホワイトバランス係数による調整後の出力画像２１１と正解画像２２１に対して輝度飽和値でクリッピング処理を行う。なお、このクリッピング処理は輝度飽和部以外の画素について影響がないため、輝度飽和の有無に関わらず全画素実施してもよいし、輝度飽和の有無によって処理を分岐してもよい。また、画素ごとの輝度飽和のばらつきを考慮し、輝度飽和値よりも少し低めの値を設定してクリッピング処理を行ってもよい。また、この処理は輝度飽和に達していない場合は不要な処理となるため、必ずしも実施する必要はない。 Next, clipping is performed as necessary on the output image 211 and the correct image 221 to which the white balance coefficients have been applied. In this embodiment, clipping is a process of replacing any luminance value equal to or greater than a specified upper limit with that upper limit. Clipping is performed when the output image 211 or the correct image 221 before adjustment by the white balance coefficients contains a pixel that has reached the luminance saturation value (the upper limit that a pixel can take). For example, when all colors are saturated, a pixel that had the same luminance value in R, G, and B before adjustment instead acquires a color cast, by the amount of the adjustment, once the white balance coefficients are applied. As a countermeasure, the output image 211 and the correct image 221 after adjustment by the white balance coefficients are clipped at the luminance saturation value. Because this clipping has no effect on pixels outside the saturated region, it may be applied to all pixels regardless of whether saturation is present, or the processing may branch depending on whether saturation is present. In addition, to allow for pixel-to-pixel variation in the saturation level, clipping may be performed with a value set slightly below the luminance saturation value. This processing is unnecessary when luminance saturation has not been reached, so it does not always have to be performed.
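The effect of clipping on a fully saturated pixel can be illustrated with a short sketch. The 14-bit saturation value, the gains, and the function name `clip_to_saturation` are assumptions chosen for the example.

```python
import numpy as np

def clip_to_saturation(image, saturation, margin=0.0):
    """Replace values above (saturation - margin) with that limit.

    A small positive margin models the option of clipping slightly below the
    saturation value to absorb pixel-to-pixel variation.
    """
    return np.minimum(image, saturation - margin)

saturation = 16383.0  # assumed 14-bit sensor
gains = np.array([2.0, 1.0, 1.5]).reshape(3, 1, 1)  # hypothetical W_r, W_g, W_b

saturated_gray = np.full((3, 1, 1), saturation)  # R, G, B all saturated
tinted = saturated_gray * gains                  # R and B now exceed G: a color cast
restored = clip_to_saturation(tinted, saturation)
unsaturated = clip_to_saturation(np.array([100.0]), saturation)
```

After clipping, the three colors of the saturated pixel are equal again, while the unsaturated value passes through unchanged.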

続いてステップＳ１０５において、算出部１０１ｃは、ホワイトバランス係数による調整後の出力画像２１１と正解画像２２１との差（誤差）を算出する。このとき本実施例では、出力画像２１１および正解画像２２１に対してガンマ補正を実行してから誤差を算出する。ガンマ補正は、例えば入力の輝度値を冪乗する処理であり、その冪指数として１／２．２などが用いられる。正解画像２２１は訓練画像２０１と同様に、色成分ごとに配列してチャンネル方向にスタックされている。本実施例において、算出部１０１ｃは、以下の式（５）を用いて誤差Ｌを算出する。 Next, in step S105, the calculation unit 101c calculates the difference (error) between the output image 211 after adjustment using the white balance coefficient and the correct image 221. In this embodiment, gamma correction is performed on the output image 211 and the correct image 221, and then the error is calculated. Gamma correction is, for example, a process of raising an input luminance value to a power, and 1/2.2 or the like is used as the exponent. The correct image 221 is arranged for each color component and stacked in the channel direction, similar to the training image 201. In this embodiment, the calculation unit 101c calculates the error L using the following formula (5).
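One form of formula (5) that is consistent with the symbol definitions in the following paragraph is shown below. The normalization factor 1/(2N) is an assumption of this rendering; only the use of the Euclidean norm of the difference between the gamma-corrected images is stated in the text.

```latex
L = \frac{1}{2N} \sum_{j=1}^{N} \left\| g(t_j) - g(y_j) \right\|_2^2 \tag{5}
```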

式（５）において、ｔは正解画像２２１の輝度値、ｙは出力画像２１１の輝度値、ｊは画素の番号、Ｎは総画素数、ｇはガンマ補正を示す。式（５）ではユークリッドノルムを用いているが、正解画像と出力画像の差異を表す値であれば、他の指標を用いてもよい。なお本実施例では、出力画像２１１および正解画像２２１に対してガンマ補正を実行してから誤差を算出しているが、この処理は必須ではなく、ガンマ補正を行わずに誤差を算出してもよい。 In formula (5), t is the luminance value of the correct image 221, y is the luminance value of the output image 211, j is the pixel number, N is the total number of pixels, and g is gamma correction. Formula (5) uses the Euclidean norm, but other indices may be used as long as they are values that represent the difference between the correct image and the output image. Note that in this embodiment, gamma correction is performed on the output image 211 and the correct image 221 before the error is calculated, but this process is not essential, and the error may be calculated without performing gamma correction.
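A minimal NumPy sketch of an error computed after gamma correction is given below. It assumes luminance values normalized to the range 0 to 1, a pure power-law gamma with exponent 1/2.2, and a 1/(2N) normalization; the function names `gamma` and `gamma_loss` are hypothetical.

```python
import numpy as np

def gamma(x, exponent=1.0 / 2.2):
    """Power-law gamma correction; inputs are clipped to [0, 1] so that
    slightly negative network outputs do not produce NaN."""
    return np.power(np.clip(x, 0.0, 1.0), exponent)

def gamma_loss(target, output):
    """Sum of squared differences of gamma-corrected values, divided by 2N."""
    diff = gamma(np.asarray(target, dtype=np.float64)) - gamma(np.asarray(output, dtype=np.float64))
    return float(np.sum(diff ** 2) / (2 * diff.size))

loss_equal = gamma_loss([0.0, 1.0], [0.0, 1.0])
loss_max = gamma_loss([1.0], [0.0])
loss_dark = gamma_loss([0.01], [0.02])
```

Because the gamma curve is steep near zero, a given difference in a dark region contributes more to the error than the same difference would without gamma correction.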

続いてステップＳ１０６において、更新部１０１ｄは、ステップＳ１０５にて算出された誤差からネットワークパラメータの更新量を算出し、ネットワークパラメータを更新する。ここでは、誤差逆伝搬法（Ｂａｃｋｐｒｏｐａｇａｔｉｏｎ）が用いられる。誤差逆伝搬法では、誤差の微分に基づいて更新量を算出する。ただし、本実施例はこれに限定されるものではない。 Next, in step S106, the update unit 101d calculates the update amount of the network parameters from the error calculated in step S105, and updates the network parameters. Here, the backpropagation method is used. In the backpropagation method, the update amount is calculated based on the derivative of the error. However, this embodiment is not limited to this.

続いてステップＳ１０７において、更新部１０１ｄは、所定の終了条件を満たすか否か、すなわち、ネットワークパラメータの最適化が終了したか否かを判定する。ここで所定の終了条件とは、例えば、学習工程が既定の時間に達した場合、パラメータの更新回数が既定の回数に達した場合、パラメータ更新には用いない訓練画像と正解画像を用意しておき、その出力画像と正解画像の誤差が所定の値以下になった場合などである。または、ユーザが最適化終了を指示してもよい。所定の終了条件を満たさない場合、ステップＳ１０３に戻り、更新部１０１ｄは新たなミニバッチを取得してネットワークパラメータを更新する。一方、所定の終了条件を満たす場合、ステップＳ１０８へ進む。 Next, in step S107, the update unit 101d determines whether a predetermined termination condition is met, i.e., whether optimization of the network parameters is completed. Here, the predetermined termination condition is, for example, when the learning process reaches a predetermined time, when the number of parameter updates reaches a predetermined number, when training images and correct images that are not used for parameter update are prepared and the error between the output image and the correct image becomes equal to or less than a predetermined value, etc. Alternatively, the user may instruct the end of optimization. If the predetermined termination condition is not met, the process returns to step S103, and the update unit 101d obtains a new mini-batch and updates the network parameters. On the other hand, if the predetermined termination condition is met, the process proceeds to step S108.

ステップＳ１０８において、更新部１０１ｄは、更新したネットワークパラメータを記憶部１０１ａに出力して記憶させる。本実施例では、異なる学習条件情報（ホワイトバランスに関する情報）ごとにネットワークパラメータを学習するため、ネットワークパラメータとそれに対応する学習条件情報とを合わせて記憶部１０１ａに記憶する。以上の学習工程により、色ごとの推定精度のばらつきを低減した多層のニューラルネットワークを得ることができる。 In step S108, the update unit 101d outputs the updated network parameters to the storage unit 101a for storage. In this embodiment, in order to learn the network parameters for each different learning condition information (information related to white balance), the network parameters and the corresponding learning condition information are stored together in the storage unit 101a. Through the above learning process, it is possible to obtain a multi-layered neural network that reduces the variation in estimation accuracy for each color.

また本実施例では、図１に示されるように出力画像２１１および正解画像２２１に対してホワイトバランス係数で調整を行い、調整後の各画像に対してガンマ補正を実施するが、この順番でなくてもよい。例えば、ガンマ補正後にホワイトバランス係数による調整を行うこともできる。この場合、ガンマ補正による非線形変換後の出力画像２１１および正解画像２２１に対して、ホワイトバランス係数を用いた調整処理を実行する。なお、ガンマ補正とは、図７に示されるように、補正前後における輝度値の関係を示すカーブ（ガンマカーブ）が傾き１の直線（図７中の一点鎖線）以上の位置に存在する処理である。図７は、ガンマ補正に関する説明図である。図７において、横軸はガンマ補正前の輝度値、縦軸はガンマ補正後の輝度値をそれぞれ示す。 In this embodiment, as shown in FIG. 1, the output image 211 and the correct image 221 are adjusted using the white balance coefficients, and gamma correction is then performed on each adjusted image, but this order is not essential. For example, the adjustment using the white balance coefficients can be performed after gamma correction. In this case, the adjustment processing using the white balance coefficients is performed on the output image 211 and the correct image 221 after the nonlinear conversion by gamma correction. Note that gamma correction is processing in which the curve showing the relationship between luminance values before and after correction (the gamma curve) lies on or above the straight line of slope 1 (the dash-dotted line in FIG. 7), as shown in FIG. 7. FIG. 7 is an explanatory diagram of gamma correction. In FIG. 7, the horizontal axis indicates the luminance value before gamma correction, and the vertical axis indicates the luminance value after gamma correction.

先にガンマ補正を行い、その後にホワイトバランス係数による調整を行う場合、後にガンマ補正を実行する場合とは異なるネットワークパラメータを生成することができる。また、ガンマ補正後に実行する場合、式（２）～（４）のように画像に対して調整を行ってもよいし、出力画像２１１と正解画像２２１の誤差に対して調整を行ってもよい。誤差Ｌに対する調整を行う場合、Ｒ、Ｇ、Ｂのホワイトバランス係数をＷ_ｒ、Ｗ_ｇ、Ｗ_ｂ、正解画像２２１をｔ_ｒ、ｔ_ｇ、ｔ_ｂ、出力画像２１１をｙ_ｒ、ｙ_ｇ、ｙ_ｂとするとき、誤差Ｌは以下の式（６）のように表される。 When gamma correction is performed first and the adjustment using the white balance coefficients is performed afterwards, network parameters different from those obtained when gamma correction is performed last can be generated. When the adjustment is performed after gamma correction, it may be applied to the images as in formulas (2) to (4), or it may be applied to the error between the output image 211 and the correct image 221. When the adjustment is applied to the error L, the error L is expressed by the following formula (6), where the white balance coefficients of R, G, and B are W_r, W_g, and W_b, the correct image 221 is t_r, t_g, and t_b, and the output image 211 is y_r, y_g, and y_b.
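One form of formula (6) that matches this description (an error computed for each color, weighted by that color's white balance coefficient, and summed) is sketched below. The per-color normalization 1/(2N) and the pixel index j are assumptions carried over from the usual squared-error form, not details confirmed by the text.

```latex
L = \sum_{c \in \{r,\, g,\, b\}} W_c \cdot \frac{1}{2N} \sum_{j=1}^{N} \left\| g(t_{c,j}) - g(y_{c,j}) \right\|_2^2 \tag{6}
```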

このように、色ごとに誤差を計算し、計算した誤差に対して、ホワイトバランス係数による調整を行い合算することもできる。このような処理でも、色ごとの推定精度のばらつき低減した学習を行うことができる。また、図６のようにＧがＧ１およびＧ２の二つに分かれる場合には、それぞれのホワイトバランス係数を掛けてもよいし、Ｇ１とＧ２との平均値を計算し、平均のホワイトバランス係数をＧの画像に掛けてもよい。 In this way, the error can be calculated for each color, and the calculated errors can be adjusted using the white balance coefficient and then summed up. This type of processing can also perform learning that reduces the variation in estimation accuracy for each color. Also, when G is divided into two, G1 and G2, as in Figure 6, the white balance coefficients of each can be multiplied, or the average value of G1 and G2 can be calculated and the image of G can be multiplied by the average white balance coefficient.
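The per-color weighting of the error can be sketched as follows. The sketch assumes the inputs are already gamma-corrected R, G, B planes of shape (3, H, W) and that each per-color error is a halved mean squared difference; the function name `weighted_color_loss` and the numeric values are hypothetical.

```python
import numpy as np

def weighted_color_loss(target, output, gains):
    """Compute an error per color plane, weight it by (W_r, W_g, W_b), and sum.

    Returns the total error and the unweighted per-color errors.
    """
    per_color = np.mean((target - output) ** 2, axis=(1, 2)) / 2.0
    total = float(np.dot(np.asarray(gains, dtype=np.float64), per_color))
    return total, per_color

target = np.zeros((3, 2, 2))
output = np.stack([np.full((2, 2), v) for v in (0.1, 0.2, 0.3)])
total, per_color = weighted_color_loss(target, output, gains=(2.0, 1.0, 1.5))
```

Here the per-color errors are 0.005, 0.02, and 0.045, so the weighted sum is 2.0 × 0.005 + 1.0 × 0.02 + 1.5 × 0.045 = 0.0975.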

次に、図８を参照して、画像推定装置１０３で実行される推定工程に関して説明する。図８は、推定工程のフローチャートである。 Next, the estimation process executed by the image estimation device 103 will be described with reference to FIG. 8. FIG. 8 is a flowchart of the estimation process.

まず、ステップＳ２０１において、取得部１０３ｂは、撮像装置１０２または記録媒体１０５から、撮像画像を取得する。撮像画像は、未現像のＲＡＷ画像である。ＲＡＷ画像の輝度値が符号化されている場合、推定部１０３ｃは復号処理を実行する。また取得部１０３ｂは、撮像装置１０２または記録媒体１０５から、学習条件情報を取得する。なお、ステップＳ２０１における学習条件情報は、学習時のネットワークパラメータの選択に利用するパラメータであるため、撮影画像がオートホワイトバランス設定であっても、必ずしも「オートホワイトバランス設定」である必要はない。また、学習条件情報はユーザに自由に選択できるようにしてもよいし、撮像装置１０２が撮影シーンに応じて学習条件情報を自動で決めてもよい。 First, in step S201, the acquisition unit 103b acquires a captured image from the imaging device 102 or the recording medium 105. The captured image is an undeveloped RAW image. If the luminance values of the RAW image are encoded, the estimation unit 103c executes a decoding process. The acquisition unit 103b also acquires learning condition information from the imaging device 102 or the recording medium 105. Note that the learning condition information in step S201 is used to select among the network parameters obtained in the learning process; therefore, even if the captured image was shot with the auto white balance setting, the learning condition information does not necessarily have to be "auto white balance setting". The learning condition information may be freely selectable by the user, or the imaging device 102 may determine it automatically according to the shooting scene.

続いてステップＳ２０２において、推定部１０３ｃは、ステップＳ２０１にて取得した学習条件情報に対応するネットワークパラメータを取得する。ネットワークパラメータは、学習装置１０１の記憶部１０１ａから読み出される。または、画像推定装置１０３の記憶部１０３ａに複数のネットワークパラメータを保存しておき、記憶部１０３ａから読み出してもよい。取得するネットワークパラメータは、ステップＳ２０１にて取得した学習条件情報と学習工程で用いられた学習条件情報とが互いに一致するもの、または、最も近いものである。 Next, in step S202, the estimation unit 103c acquires the network parameters corresponding to the learning condition information acquired in step S201. The network parameters are read from the storage unit 101a of the learning device 101. Alternatively, a plurality of network parameters may be stored in the storage unit 103a of the image estimation device 103 and read from the storage unit 103a. The network parameters to be acquired are those for which the learning condition information used in the learning process matches, or is closest to, the learning condition information acquired in step S201.
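Choosing the network parameters whose learning condition matches or is closest to the requested one can be sketched as a nearest-key lookup. The sketch assumes the learning condition information is a color temperature in kelvin and that closeness is measured by absolute difference; the function name `select_parameters` and the stored identifiers are hypothetical.

```python
def select_parameters(stored, requested_kelvin):
    """Return (color temperature, parameters) for the stored entry nearest to
    the requested color temperature. An exact match has distance zero and is
    therefore always selected when it exists."""
    nearest = min(stored, key=lambda kelvin: abs(kelvin - requested_kelvin))
    return nearest, stored[nearest]

# Hypothetical parameter sets, keyed by the color temperature used in learning.
stored = {3000: "params_3000K", 6000: "params_6000K"}
exact = select_parameters(stored, 3000)
closest = select_parameters(stored, 5200)
```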

続いてステップＳ２０３において、推定部１０３ｃは、撮像画像からＣＮＮへ入力する入力画像を取得する。入力画像は、訓練画像と同様に、色成分ごとに配列して三次元方向にスタックされる。なお、推定工程の入力画像のサイズは、学習工程における訓練画像のサイズと、必ずしも一致する必要はない。 Next, in step S203, the estimation unit 103c acquires an input image to be input to the CNN from the captured image. The input image is arranged by color component and stacked in three dimensions, similar to the training image. Note that the size of the input image in the estimation process does not necessarily have to match the size of the training image in the learning process.

続いてステップＳ２０４において、推定部１０３ｃは、入力画像とネットワークパラメータに基づいて、推定画像を生成する。推定画像の生成には、学習工程と同様に、図１に示されるＣＮＮが用いられる。ただし、図１中の出力画像２１１が推定画像となり、それ以降の正解画像との誤差算出等の処理は行わない。 Next, in step S204, the estimation unit 103c generates an estimated image based on the input image and the network parameters. To generate the estimated image, the CNN shown in FIG. 1 is used, as in the learning process. However, the output image 211 in FIG. 1 becomes the estimated image, and no further processing such as error calculation with the correct image is performed.

続いてステップＳ２０５において、推定部１０３ｃは、撮像画像の所定の領域に対して推定が完了したか否かを判定する。推定が完了していない場合、ステップＳ２０３へ戻り、推定部１０３ｃは、撮像画像の所定の領域から新たな入力画像を取得する。推定に用いられるＣＮＮにおいて、出力画像のサイズが入力画像よりも小さくなる場合、所定の領域からオーバーラップして入力画像を取得する必要がある。所定の領域は、撮像画像の全体または一部である。撮像画像はＲＡＷ画像であるため、受光して得られた画像の他に、ヘッダー情報（画像の画素数や撮影時刻などの情報）や撮像素子のオプティカルブラックの情報が含まれていることがある。ヘッダー情報やオプティカルブラックは、収差・回折のぼけと無関係であるため、所定の領域からそれらを除いてもよい。 Next, in step S205, the estimation unit 103c determines whether estimation has been completed for a predetermined region of the captured image. If estimation has not been completed, the process returns to step S203, where the estimation unit 103c acquires a new input image from the predetermined region of the captured image. In the CNN used for estimation, if the size of the output image is smaller than the input image, it is necessary to acquire the input image by overlapping from the predetermined region. The predetermined region is the entire captured image or a part of it. Since the captured image is a RAW image, in addition to the image obtained by receiving light, header information (information such as the number of pixels of the image and the shooting time) and optical black information of the image sensor may be included. Since the header information and optical black are unrelated to aberration and diffraction blur, they may be excluded from the predetermined region.
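When the output of the CNN is smaller than its input, the input images have to overlap so that the outputs tile the region without gaps. A one-dimensional sketch of choosing the input offsets is given below; it assumes the output loses `border` pixels on each side, and the function name `tile_origins` and the sizes are hypothetical.

```python
def tile_origins(length, tile, border):
    """Start offsets of input tiles of width `tile` along one axis.

    Each tile yields an output of width tile - 2 * border, so stepping by that
    amount makes consecutive outputs adjacent. The last tile is shifted back
    if necessary so that it stays inside the region.
    """
    step = tile - 2 * border
    if step <= 0 or tile > length:
        raise ValueError("tile must fit in the region and exceed 2 * border")
    origins = list(range(0, length - tile + 1, step))
    if origins[-1] != length - tile:
        origins.append(length - tile)
    return origins

origins = tile_origins(length=100, tile=40, border=4)

# Pixels covered by the outputs of all tiles.
covered = set()
for origin in origins:
    covered.update(range(origin + 4, origin + 40 - 4))
```

For a two-dimensional image the same offsets can be computed independently for the vertical and horizontal axes.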

続いてステップＳ２０６において、推定部１０３ｃは、生成された複数の推定画像を合成して、収差・回折によるぼけが補正された撮像画像を出力する。必要に応じて、推定部１０３ｃは、ヘッダー情報やオプティカルブラックの情報を含めて出力する。 Next, in step S206, the estimation unit 103c combines the generated estimated images to output a captured image in which blurring due to aberration and diffraction has been corrected. If necessary, the estimation unit 103c outputs the captured image including header information and optical black information.

以上の推定処理により、色ごとの推定精度のばらつきが少ないネットワークパラメータ用いて推定を行うことができる。これにより、収差・回折によるぼけの補正効果も色によって推定精度がばらつくことなく、より高精度な補正を実現することができる。また、推定工程後、ユーザが任意で露出補正などの編集を行い、現像処理により最終的な現像画像を得る。本実施例では、学習条件情報によってネットワークパラメータを切り替えて補正を実施する方法について述べたが、複数のネットワークパラメータを取得して、入力画像をそれぞれのネットワークに入力することで複数の出力画像を生成してもよい。こうすることで、学習条件情報が異なる出力画像を複数生成することができるため、例えばそれらを補間することによって、中間の学習条件情報の出力画像を生成することができる。例えば、学習条件情報が色温度Ｋ３０００と色温度Ｋ６０００であったとき、それぞれに対応したネットワークパラメータを用いて推定画像を生成し、これらを補間することで色温度Ｋ５０００相当の推定画像を出力することもできる。また、逆に学習条件情報は１つだけでもよく、特定のネットワークパラメータのみ撮像装置１０２または記録媒体１０５に保持しておいてもよい。 The above estimation process allows estimation to be performed using network parameters with little variation in estimation accuracy between colors. As a result, the effect of correcting blur due to aberration and diffraction does not vary from color to color, and more accurate correction can be achieved. After the estimation process, the user may optionally perform editing such as exposure correction, and the final developed image is obtained by development processing. This embodiment has described a method of performing correction by switching network parameters according to the learning condition information, but a plurality of network parameters may instead be acquired and a plurality of output images generated by inputting the input image to each network. Because this yields a plurality of output images with different learning condition information, an output image for intermediate learning condition information can be generated, for example by interpolating them. For example, when the learning condition information is a color temperature of 3000 K and a color temperature of 6000 K, estimated images can be generated using the network parameters corresponding to each, and an estimated image equivalent to a color temperature of 5000 K can be output by interpolating them. Conversely, a single item of learning condition information may suffice, in which case only specific network parameters may be held in the imaging device 102 or the recording medium 105.
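The interpolation between two estimated images can be sketched as below. Linear interpolation in kelvin is an assumption of this sketch, since the text does not specify the interpolation method; the function name `interpolate_estimates` and the image values are hypothetical.

```python
import numpy as np

def interpolate_estimates(est_low, est_high, k_low, k_high, k_target):
    """Blend two estimated images linearly according to where the target
    color temperature lies between the two learning color temperatures."""
    weight = (k_target - k_low) / (k_high - k_low)
    return (1.0 - weight) * est_low + weight * est_high

est_3000 = np.zeros((4, 2, 2))      # estimate from the 3000 K parameters
est_6000 = np.full((4, 2, 2), 3.0)  # estimate from the 6000 K parameters
est_5000 = interpolate_estimates(est_3000, est_6000, 3000.0, 6000.0, 5000.0)
```

With these values the weight of the 6000 K estimate is 2/3, so every pixel of the 5000 K equivalent image equals 2.0.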

なお本実施例では、収差・回折によるぼけの補正について述べたが、アップサンプリングやデノイジング等の他の手法であっても、それらに対応した訓練画像と正解画像を用いてホワイトバランス係数による調整を行うことで同様の効果を得ることができる。 In this embodiment, we have described the correction of blur caused by aberration and diffraction, but the same effect can be obtained with other methods such as upsampling and denoising by adjusting the white balance coefficient using the corresponding training images and ground truth images.

次に、本発明の実施例２における画像処理システムに関して説明する。 Next, an image processing system according to a second embodiment of the present invention will be described.

図９は、本実施例における画像処理システム３００のブロック図である。図１０は、画像処理システム３００の外観図である。画像処理システム３００は、ネットワーク３０３を介して接続された学習装置３０１および撮像装置３０２を含む。 FIG. 9 is a block diagram of an image processing system 300 in this embodiment. FIG. 10 is an external view of the image processing system 300. The image processing system 300 includes a learning device 301 and an imaging device 302 connected via a network 303.

学習装置３０１は、記憶部３１１、取得部３１２、算出部３１３、更新部３１４、および生成部３１５を有し、ニューラルネットワークで収差・回折によるぼけを補正するためのネットワークパラメータを学習する。 The learning device 301 has a memory unit 311, an acquisition unit 312, a calculation unit 313, an update unit 314, and a generation unit 315, and learns network parameters for correcting blurring due to aberration and diffraction using a neural network.

撮像装置３０２は、被写体空間を撮像して撮像画像を取得し、読み出したネットワークパラメータを用いて撮像画像中の収差・回折によるぼけを補正する。撮像装置３０２は、光学系３２１および撮像素子３２２を有する。画像推定部３２３は、取得部３２３ａおよび推定部３２３ｂを有し、記憶部３２４に保存されたネットワークパラメータを用いて、撮像画像の補正を実行する。ネットワークパラメータは、学習装置３０１で事前に学習され、記憶部３１１に保存されている。撮像装置３０２は、記憶部３１１からネットワーク３０３を介してネットワークパラメータを読み出し、記憶部３２４に保存する。収差・回折によるぼけを補正した撮像画像（出力画像）は、記録媒体３２５に保存される。ユーザから出力画像の表示に関する指示が出された場合、保存された出力画像が読み出され、表示部３２６に表示される。なお、記録媒体３２５に既に保存された撮像画像を読み出し、画像推定部３２３でぼけ補正を行ってもよい。以上の一連の制御は、システムコントローラ３２７によって行われる。 The imaging device 302 captures an image of a subject space, and corrects blurring due to aberration and diffraction in the captured image using the read network parameters. The imaging device 302 has an optical system 321 and an imaging element 322. The image estimation unit 323 has an acquisition unit 323a and an estimation unit 323b, and performs correction of the captured image using the network parameters stored in the storage unit 324. The network parameters are learned in advance by the learning device 301 and stored in the storage unit 311. The imaging device 302 reads the network parameters from the storage unit 311 via the network 303 and stores them in the storage unit 324. The captured image (output image) in which blurring due to aberration and diffraction has been corrected is stored in the recording medium 325. When an instruction regarding display of the output image is issued by the user, the stored output image is read and displayed on the display unit 326. Note that the captured image already stored in the recording medium 325 may be read, and blurring correction may be performed by the image estimation unit 323. The above series of controls are performed by the system controller 327.

次に、図１１を参照して、本実施例における多層のニューラルネットワークで行われる処理に関して説明する。図１１は、本実施例における畳み込みニューラルネットワークを示す図である。図１１は、訓練画像４０１に対するホワイトバランス係数による調整方法の点で、実施例１における図１とは異なる。本実施例では、図１１に示されるように、訓練画像４０１に対して、まずホワイトバランス係数による調整処理を行い、その後にニューラルネットワークに入力される。そして、出力画像４１１に対してガンマ補正が実行される。正解画像４２１に関しては実施例１と同様であり、正解画像４２１にホワイトバランス係数による調整処理を実行し、その後にガンマ補正を実行する。そして、ガンマ補正後の出力画像４１１と正解画像４２１との差（誤差）を算出する。なお、ホワイトバランス後のクリッピング処理は必要に応じて実行する。また、フィルタ４０２、第１特徴マップ４０３、およびフィルタ４０４は、図１のフィルタ２０２、第１特徴マップ２０３、およびフィルタ２０４とそれぞれ同様であるため、それらの説明は省略する。 Next, referring to FIG. 11, the processing performed by the multi-layered neural network in this embodiment will be described. FIG. 11 is a diagram showing a convolutional neural network in this embodiment. FIG. 11 differs from FIG. 1 in the embodiment 1 in the adjustment method using the white balance coefficient for the training image 401. In this embodiment, as shown in FIG. 11, the training image 401 is first adjusted using the white balance coefficient, and then input to the neural network. Then, gamma correction is performed on the output image 411. The correct image 421 is the same as in the embodiment 1, and the correct image 421 is adjusted using the white balance coefficient, and then gamma correction is performed. Then, the difference (error) between the output image 411 after gamma correction and the correct image 421 is calculated. Note that clipping processing after white balance is performed as necessary. Also, the filter 402, the first feature map 403, and the filter 404 are the same as the filter 202, the first feature map 203, and the filter 204 in FIG. 1, respectively, and therefore their description will be omitted.

In this embodiment, the learning process is executed by the learning device 301, and the estimation process is executed by the image estimation unit 323. The estimation process in this embodiment is the same as the processing shown in the flowchart of FIG. 8 of the first embodiment, and its description is therefore omitted.

Next, the method by which the learning device 301 learns the network parameters in this embodiment (the learning process, that is, the method for manufacturing a trained model) will be described with reference to FIG. 12. FIG. 12 is a flowchart of the learning of the network parameters (learning process). Each step in FIG. 12 is executed mainly by the acquisition unit 312, the calculation unit 313, the update unit 314, and the generation unit 315 of the learning device 301. Steps S301 and S302 in FIG. 12 are the same as steps S101 and S102 in FIG. 5, respectively, and their description is therefore omitted.

Next, in step S303, the generation unit 315 corrects the training image 401 and the correct image 421 using the white balance coefficients acquired in step S302. As in the first embodiment, the adjustment using the white balance coefficients is performed using equations (2) to (4). The generation unit 315 then performs clipping, as necessary, on the training image 401 and the correct image 421 to which the white balance coefficients have been applied. In the clipping, any pixel that had reached the luminance saturation value in the training image 401 or the correct image 421 before the adjustment using the white balance coefficients is replaced with the luminance saturation value. The luminance saturation value may differ for each color or may be the same for all colors.
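The adjustment and clipping of step S303 can be sketched as follows. This is only an illustration under stated assumptions: equations (2) to (4) are not reproduced in this passage, so the sketch assumes the adjustment is a per-channel multiplication of a 4-channel (R, G1, G2, B) array by its coefficient. The function name `apply_wb_with_clipping` and its parameters are hypothetical.

```python
import numpy as np

def apply_wb_with_clipping(raw, wb_coeffs, saturation):
    """Assumed per-channel white balance adjustment followed by clipping.

    raw        : array of shape (4, H, W), channels in R, G1, G2, B order
    wb_coeffs  : four white balance coefficients, one per channel
    saturation : luminance saturation value, a scalar or one value per channel
    """
    raw = np.asarray(raw, dtype=np.float64)
    wb = np.asarray(wb_coeffs, dtype=np.float64).reshape(4, 1, 1)
    sat = np.asarray(saturation, dtype=np.float64).reshape(-1, 1, 1)
    sat = np.broadcast_to(sat, raw.shape)

    # Pixels that had already reached saturation BEFORE the adjustment.
    saturated = raw >= sat

    # Assumption: the adjustment is a plain per-channel gain.
    adjusted = raw * wb

    # Saturated pixels are replaced with the saturation value.
    return np.where(saturated, sat, adjusted)
```

The same function would be applied to both the training image and the correct image, since the text adjusts both with the same coefficients.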

Next, in step S304, the generation unit 315 inputs the training image 401 adjusted in step S303 to the neural network to generate the output image 411. In this embodiment, learning is performed by mini-batch learning as in the first embodiment, but batch learning or online learning may be used instead. ReLU is used as the activation function in this embodiment, but a sigmoid function or a hyperbolic tangent function may also be used.

The steps from step S305 onward are the same as those from step S105 onward in the first embodiment, and their description is therefore omitted. This completes the learning process executed in this embodiment. As described above, the white balance adjustment may be applied to the training image 401 before it is input to the neural network. As in the first embodiment, this makes it possible to generate network parameters that reduce the variation in estimation accuracy between colors.
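The order of operations in this learning process (white balance on the training image, forward pass, gamma correction, error) can be sketched as below. The sketch rests on several assumptions that the passage does not fix: the white balance adjustment is a per-channel gain, gamma correction is a power curve with exponent 1/2.2 on signals clipped to [0, 1], and the error is a mean squared error. `network` is a hypothetical callable standing in for the neural network.

```python
import numpy as np

def gamma_correct(x, gamma=2.2):
    # Assumed gamma curve: clip to [0, 1], then raise to the power 1/gamma.
    return np.clip(x, 0.0, 1.0) ** (1.0 / gamma)

def training_error(train_img, correct_img, wb_coeffs, network, gamma=2.2):
    """Error between the gamma-corrected output and correct images.

    train_img, correct_img : arrays of shape (C, H, W)
    wb_coeffs              : C white balance coefficients
    network                : callable mapping a (C, H, W) array to a (C, H, W) array
    """
    wb = np.asarray(wb_coeffs, dtype=np.float64).reshape(-1, 1, 1)

    # White balance is applied to the training image BEFORE the network.
    net_input = np.asarray(train_img, dtype=np.float64) * wb
    output = network(net_input)

    # The correct image receives the same white balance adjustment.
    target = np.asarray(correct_img, dtype=np.float64) * wb

    # Both images are gamma corrected before the difference is taken.
    diff = gamma_correct(output, gamma) - gamma_correct(target, gamma)
    return float(np.mean(diff ** 2))
```

In an actual training loop this error would be backpropagated to update the network parameters. NumPy is used here only to show the data flow.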

In this embodiment, the adjustment using the white balance coefficients is applied to the training image 401 using equations (2) to (4), but another method may be used. For example, instead of using equations (2) to (4), information on the white balance coefficients may be input to the neural network together with the training image 401. In this case, the white balance coefficients are converted into a map so that they can be input to the neural network. For example, if the training image 401 has the four channels R, G1, G2, and B, the map of the white balance coefficients (WB map) also has the four channels R, G1, G2, and B, and the number of elements (pixels) per channel is made equal to that of the training image 401. The training image 401 and the WB map are then concatenated in the channel direction in a prescribed order. The concatenated training image 401 and WB map are input to the neural network as input data to obtain the output image 411. Because the white balance coefficients are part of the input data to the neural network in this case, no adjustment using the white balance coefficients is needed for the output image 411 or the correct image 421. Gamma correction is then performed on the output image 411 and the correct image 421, and the error is calculated from the gamma-corrected output image 411 and correct image 421. If the estimation accuracy in high-luminance areas is prioritized over that in low-luminance areas, the gamma correction may be omitted.
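Building the WB map and concatenating it with the training image might look like the following sketch. The text only requires that the map have the same channels and the same number of elements per channel as the training image. Filling each channel with its constant coefficient is an assumption, as are the names `make_wb_map` and `concat_image_and_wb_map`.

```python
import numpy as np

def make_wb_map(wb_coeffs, height, width):
    """Expand per-channel coefficients into a (C, H, W) map.

    Assumption: each channel is filled with its own constant coefficient.
    """
    wb = np.asarray(wb_coeffs, dtype=np.float32).reshape(-1, 1, 1)
    return np.broadcast_to(wb, (wb.shape[0], height, width)).copy()

def concat_image_and_wb_map(train_img, wb_coeffs):
    """Concatenate a (C, H, W) image and its WB map along the channel axis.

    The order (image channels first, then map channels) is one possible
    "prescribed order"; it must match between training and estimation.
    """
    train_img = np.asarray(train_img, dtype=np.float32)
    _, height, width = train_img.shape
    wb_map = make_wb_map(wb_coeffs, height, width)
    return np.concatenate([train_img, wb_map], axis=0)
```

For an R, G1, G2, B image this yields an 8-channel input, so the first layer of the network would need to accept 8 input channels.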

As described above, one way to use the white balance coefficients is to input them to the neural network together with the training image 401, and this method can also generate network parameters that suppress the variation in estimation accuracy between colors. When estimation is performed using these network parameters, the captured image and the white balance coefficients are likewise concatenated in the channel direction and input to the neural network, which generates an estimated image with little variation in estimation accuracy between colors. Although the case where the training image 401 and the WB map are concatenated and input has been described, the method of input to the neural network is not limited to this. Only one of the training image 401 and the WB map may be input to the first layer of the neural network; the feature map output by the first layer, or by several layers, is then concatenated in the channel direction with the other one, which was not input to the first layer, and the result is input to a subsequent layer of the neural network. Alternatively, the input part of the neural network may be branched so that the training image 401 and the WB map are converted into feature maps by different layers, and those feature maps are concatenated and input to a subsequent layer. These methods can likewise generate network parameters that suppress the variation in estimation accuracy between colors.
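The branched-input variant can be illustrated with the sketch below. For brevity each branch is a single 1x1 convolution followed by ReLU. This is an assumed stand-in for whatever convolution layers an actual network would use, and the weights `w_img` and `w_wb` and the function names are hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def relu(x):
    return np.maximum(x, 0.0)

def branched_input(train_img, wb_map, w_img, w_wb):
    """Convert the image and the WB map to feature maps in separate
    branches, then concatenate the feature maps along the channel axis."""
    feat_img = relu(conv1x1(train_img, w_img))  # image branch
    feat_wb = relu(conv1x1(wb_map, w_wb))       # WB map branch
    return np.concatenate([feat_img, feat_wb], axis=0)
```

The concatenated feature map would then be passed to the subsequent layers. The other variant in the text corresponds to leaving one of the two inputs unprocessed before the concatenation.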

As described above, the image processing method of each embodiment includes an acquisition step of acquiring information related to white balance and an update step of updating the parameters of a neural network. In the update step, so that the learning result is adjusted based on the information related to white balance, a training image is input to the neural network to generate an output image, and the parameters of the neural network are updated based on the difference between the output image and the correct image.

Preferably, in the update step, white balance adjustment is performed on the correct image and the output image using the information related to white balance, and the difference is calculated using the correct image and the output image after the white balance adjustment. More preferably, in the update step, gamma correction is performed on the correct image and the output image after the white balance adjustment.

Preferably, in the update step, a difference between the training image or the output image and the correct image is calculated for each color component, and the difference between the output image and the correct image is calculated by weighting the per-color-component differences based on the information related to white balance and summing them.
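A weighted per-color error of this kind could be computed as in the sketch below. The text does not specify the per-color difference measure or how the weights are derived from the white balance information. The sketch assumes a mean squared difference per color and uses the white balance coefficients directly as weights, and `wb_weighted_error` is a hypothetical name.

```python
import numpy as np

def wb_weighted_error(output, correct, wb_coeffs):
    """Sum of per-color errors weighted by white balance coefficients.

    output, correct : arrays of shape (C, H, W)
    wb_coeffs       : C weights (assumed to be the coefficients themselves)
    """
    output = np.asarray(output, dtype=np.float64)
    correct = np.asarray(correct, dtype=np.float64)

    # Per-color difference: mean squared difference over each channel.
    per_color = np.mean((output - correct) ** 2, axis=(1, 2))

    # Weight each color's difference and add them up.
    weights = np.asarray(wb_coeffs, dtype=np.float64)
    return float(np.sum(weights * per_color))
```

With this weighting, a color with a larger coefficient contributes more to the error, so its estimation error is penalized more strongly during the parameter update.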

Preferably, in the update step, white balance adjustment is performed on the training image and the correct image using the information related to white balance, and the difference is calculated using the training image and the correct image after the white balance adjustment. More preferably, in the update step, the training image after the white balance adjustment is input to the neural network to generate the output image, and gamma correction is performed on the output image and on the correct image after the white balance adjustment.

(Other Embodiments)
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. The present invention can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

According to each embodiment, it is possible to provide an image processing method, an image processing device, an image processing program, a storage medium, and a method for manufacturing a trained model, each capable of obtaining a neural network with reduced variation in estimation accuracy between colors.

Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist.

101: Learning device (image processing device)
101b: Acquisition unit
101d: Update unit

Claims

An acquisition step of acquiring a training image, a correct image, and information regarding white balance;
A generating step of generating a color component image composed only of color components corresponding to the plurality of color components of the training image;
an update step of inputting the color component image into a neural network to generate an output image composed only of color components corresponding to the plurality of color components, obtaining a first error for each color component between the output image and the correct image, obtaining a second error based on the first error and the information regarding white balance, and updating parameters of the neural network based on the second error.

An acquisition step of acquiring a training image, a correct image, and information regarding white balance;
A generating step of generating a color component image composed only of color components corresponding to the plurality of color components of the training image;
an update step of inputting the color component image into a neural network to generate a first output image composed only of each color component corresponding to the plurality of color components, generating a second output image for each color component based on the first output image and information related to the white balance, and updating parameters of the neural network based on an error between the second output image and the correct image.

The image processing method according to claim 1, characterized in that the training image and the correct image each have a plurality of color components arranged periodically.

3. The image processing method according to claim 2, wherein in the updating step, gamma correction is performed on the correct image and the second output image.

5. The image processing method according to claim 2, wherein in the updating step, a clipping process is performed within a predetermined range on the luminance values of at least one of the correct image and the second output image.

6. The image processing method according to claim 1, wherein the information regarding the white balance is acquired from additional information of the training image or the correct image.

7. The image processing method according to claim 1, wherein the information regarding white balance is a white balance coefficient.

8. An image processing method comprising the step of generating an estimated image by inputting an input image to a neural network having the parameters updated using the image processing method according to claim 1.

9. The image processing method according to claim 8, wherein the estimated image is an image generated by subjecting the input image to at least one of upsampling, denoising, removal of compression noise, deblurring, inpainting, demosaicing, dehazing, high gradation, and relighting.

an acquisition unit that acquires a training image, a correct image, and information regarding white balance;
a generating unit for generating a color component image composed only of color components corresponding to the plurality of color components of the training image;
an update unit that generates an output image composed only of color components corresponding to the plurality of color components by inputting the color component image into a neural network, obtains a first error for each color component between the output image and the correct image, obtains a second error based on the first error and the information regarding white balance, and updates parameters of the neural network based on the second error.

an acquisition unit that acquires a training image, a correct image, and information regarding white balance;
a generating unit for generating a color component image composed only of color components corresponding to the plurality of color components of the training image;
an update unit that generates a first output image composed only of each color component corresponding to the plurality of color components by inputting the color component image into a neural network, generates a second output image of each color component based on the first output image and information related to the white balance, and updates parameters of the neural network based on an error between the second output image and the correct image.

10. An image processing program for causing a computer to execute the image processing method according to claim 1.

A storage medium storing the image processing program according to claim 12.

An acquisition step of acquiring a training image, a correct image, and information regarding white balance;
A generating step of generating a color component image composed only of color components corresponding to the plurality of color components of the training image;
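an update step of inputting the color component image into a neural network to generate an output image composed only of color components corresponding to the plurality of color components, obtaining a first error for each color component between the output image and the correct image, obtaining a second error based on the first error and the information regarding white balance, and updating parameters of the neural network based on the second error.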

An acquisition step of acquiring a training image, a correct image, and information regarding white balance;
A generating step of generating a color component image composed only of color components corresponding to the plurality of color components of the training image;
an update step of generating a first output image composed only of each color component corresponding to the plurality of color components by inputting the color component image into a neural network, generating a second output image for each color component based on the first output image and the information regarding white balance, and updating parameters of the neural network based on an error between the second output image and the correct image.