JP2018195069A - Image processing apparatus and image processing method

Info

Publication number: JP2018195069A
Application number: JP2017098231A
Authority: JP
Inventors: Yoshinori Kimura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-05-17
Filing date: 2017-05-17
Publication date: 2018-12-06
Anticipated expiration: 2037-05-17
Also published as: US20180336662A1; JP6957197B2

Abstract

PROBLEM TO BE SOLVED: To set network parameters of a convolutional neural network capable of restoring the high-frequency components of a high-resolution image with high accuracy.
SOLUTION: An image processing apparatus 103 calculates an error between an estimated image, obtained by giving an input image to a convolutional neural network, and a training image corresponding to the input image; weights the frequency components of the error; calculates a gradient from the weighted error; and sets the network parameters of the convolutional neural network using the gradient.
SELECTED DRAWING: Figure 2

Description

The present invention relates to an image processing technique for recovering high-frequency components with high accuracy in SRCNN, a super-resolution technique that uses a convolutional neural network (CNN).

SRCNN is a super-resolution (SR) technique that generates a high-resolution image from a low-resolution image using a CNN, as disclosed in Non-Patent Document 1. A CNN is an image processing method that generates a target output image by repeatedly convolving the input image with filters and then applying nonlinear processing.

The filters are generated by learning from training images, described later, and there are generally several of them. The images obtained by convolving the input image with the filters and then applying nonlinear processing are called feature maps. One round of filter convolution followed by nonlinear processing is treated as a unit called a layer; one speaks, for example, of the first-layer filters or the second-layer feature maps. A CNN that repeats filter convolution and nonlinear processing three times is said to have a three-layer network structure.

A CNN can be formulated as follows.
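Equation (1) itself is not reproduced in this text. Judging only from the symbol definitions in the next paragraph, a form consistent with them would be the following sketch; the sum over the input feature maps $k = 1, \dots, K$ and the double index on $W$ are assumptions of notation, not taken from the source.

```latex
X_n^{(l)} = f\!\left( \sum_{k=1}^{K} W_n^{(l,k)} * X_{n-1}^{(k)} + b_n^{(l)} \right) \qquad (1)
```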

In Equation (1), $W_n$ is the n-th layer filter and $b_n$ is the n-th layer bias. $f$ is the nonlinear processing operator, $X_n$ is the n-th layer feature map, and $*$ is the convolution operator. The $(l)$ on the right-hand side denotes the l-th filter or feature map. For the nonlinear processing, the conventional sigmoid function or the ReLU (Rectified Linear Unit), which has better convergence properties, is used. ReLU is given by

$$f(Z) = \max(Z, 0) \qquad (2)$$

That is, it is a nonlinear process that outputs 0 for the negative elements of the input vector $Z$ and passes the positive elements through unchanged.
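The layer operation described above (convolve, add a bias, apply ReLU) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: odd kernel sizes, zero padding so that the output keeps the input size, and the array layouts noted in the docstrings. The function names are hypothetical and do not come from the source.

```python
import numpy as np

def relu(z):
    """ReLU: negative elements become 0, positive elements pass through."""
    return np.maximum(z, 0.0)

def conv2d_same(image, kernel):
    """2-D convolution with zero padding (assumes odd kernel height and width)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    flipped = kernel[::-1, ::-1]  # a true convolution flips the kernel
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += flipped[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def cnn_layer(feature_maps, filters, biases, activation=relu):
    """One layer: feature_maps (K, H, W), filters (L, K, kh, kw), biases (L,).

    Returns the L output feature maps with shape (L, H, W).
    """
    outputs = []
    for l in range(filters.shape[0]):
        acc = np.zeros(feature_maps.shape[1:], dtype=float)
        for k in range(feature_maps.shape[0]):
            acc += conv2d_same(feature_maps[k], filters[l, k])
        outputs.append(activation(acc + biases[l]))
    return np.stack(outputs)
```

Stacking three such calls, with the last one using an identity activation, would give a three-layer structure of the kind described above.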

Super-resolution is image processing that generates (estimates) the original high-resolution image from a low-resolution image acquired by an image sensor with coarse pixels (a large pixel size). Super-resolution is required to recover, with high accuracy, the high-frequency components of the high-resolution image that are lost in the optical system forming the optical image and at the pixel aperture of the image sensor that photoelectrically converts it; in other words, to remove blur and sharpen the image.

In SRCNN, a set of training images is first prepared, consisting of low-resolution training images and the corresponding high-resolution training images (correct images). Next, the CNN network parameters (the filters and biases described above) are set by learning from this set so that a low-resolution input image can be converted into a high-resolution image with high accuracy. Learning of the CNN network parameters can be formulated as follows.
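Equation (3) is not reproduced in this text. Based on the description that follows (a gradient step on the loss function with learning rate $\eta$), a consistent form would be the standard update below; the iteration index $t$ is assumed notation.

```latex
W^{(t+1)} = W^{(t)} - \eta \, \frac{\partial L}{\partial W^{(t)}} \qquad (3)
```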

In Equation (3), $W$ is a filter, $L$ is the loss function, and $\eta$ is the learning rate. The loss function evaluates the error between the correct image and the high-resolution estimated image obtained when a low-resolution training image is input to the CNN. The learning rate $\eta$ plays the same role as the step size in the gradient descent method. The gradient of the loss function with respect to each filter is obtained by the chain rule of differentiation. Equation (3) shows learning for the filters, but the same applies to the biases.

Equation (3) represents a learning method that updates the network parameters so that the error between the estimated image and the correct image becomes small. This learning method is called backpropagation. The loss function is described in detail later in the embodiments of the present invention.

Next, SRCNN performs super-resolution processing that generates a high-resolution image from an arbitrary low-resolution image according to Equation (1), using the CNN network parameters obtained by learning.

Learning in SRCNN requires iterative computation and generally takes time. Once the network parameters have been learned, however, super-resolution processing using them is fast. SRCNN also has high generalization performance; that is, it super-resolves well even images that were not used for learning. As a result, SRCNN enables faster and more accurate super-resolution processing than other techniques.

Non-Patent Document 1: Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, USA, 2015, pp. 295-307

Japanese Patent Laid-Open No. 2014-195333 (JP 2014-195333 A)

SRCNN cannot recover the high-frequency components of a high-resolution image with high accuracy. This is apparent from the loss function that SRCNN uses, which is given below.
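Equation (4) is not reproduced in this text. The description that follows (an L2 norm and a sum of squared differences between the two images) suggests a form like the one below; the factor $1/2$ is an assumption that follows a common convention and is not confirmed by the source.

```latex
L(X, Y) = \frac{1}{2} \left\| X - Y \right\|_2^2 \qquad (4)
```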

In Equation (4), $X$ is the high-resolution estimated image obtained when a low-resolution training image is input to the CNN, and $Y$ is the high-resolution training image (correct image) corresponding to that low-resolution training image. $\|Z\|_2$ is the L2 norm, which is simply the square root of the sum of squares of the elements of a vector $Z$. Equation (4) uses the sum of squares of the difference between the two images as the error between the estimated image and the correct image, both of which are high-resolution.

In frequency terms, Equation (4) is equivalent to taking the difference between the high-resolution estimated image and the correct image with equal weights on all components, from low frequency to high frequency. However, natural images generally consist mainly of low-frequency components and contain comparatively few high-frequency components, so evaluating the error in this way does not adequately assess how well the high-frequency components of the estimated image are recovered. In other words, because the error becomes small as long as the low-frequency components are recovered, this loss function does not drive the recovery of high-frequency components.
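The equal-weight claim above can be checked numerically. The sketch below assumes an orthonormal DCT-II as the frequency decomposition (the helper `dct_matrix` is hypothetical, not from the source) and uses a random vector as a stand-in for the vectorized difference between the estimated and correct images.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix Phi of size n x n, so that Phi @ Phi.T is the identity."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    phi = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    phi[0, :] /= np.sqrt(2.0)   # rescale the DC row to make the matrix orthonormal
    return phi

rng = np.random.default_rng(0)
error = rng.standard_normal(64)        # stand-in for the difference X - Y
coeffs = dct_matrix(64) @ error        # error split into frequency components

pixel_loss = np.sum(error ** 2)        # sum of squared differences in pixel space
frequency_loss = np.sum(coeffs ** 2)   # same sum, every frequency with weight 1
```

Because the transform is orthonormal, the two sums agree, which is why an unweighted squared error treats a unit of low-frequency error and a unit of high-frequency error identically.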

For the above reasons, CNN network parameters learned with the loss function used by SRCNN cannot recover the high-frequency components of a high-resolution image with high accuracy.

The present invention provides an image processing apparatus, an image processing method, and the like that make it possible to set CNN network parameters capable of recovering the high-frequency components of a high-resolution image with high accuracy.

An image processing apparatus according to one aspect of the present invention includes: an error weighting unit that calculates an error between an estimated image, obtained by giving an input image to a convolutional neural network, and a training image corresponding to the input image, and that weights the frequency components of the error; and a parameter setting unit that calculates a gradient from the weighted error and sets the network parameters of the convolutional neural network using the gradient. An imaging apparatus provided with this image processing apparatus also constitutes another aspect of the present invention.

An image processing method according to another aspect of the present invention includes: a step of calculating an error between an estimated image, obtained by giving an input image to a convolutional neural network, and a training image corresponding to the input image, and weighting the frequency components of the error; and a step of calculating a gradient from the weighted error and setting the network parameters of the convolutional neural network using the gradient. An image processing program that causes a computer to execute this image processing method also constitutes another aspect of the present invention.

According to the present invention, high-frequency components can be recovered with high accuracy in SRCNN, a super-resolution technique that uses a CNN.

FIG. 1 is a block diagram illustrating the configuration of an imaging apparatus including an image processing apparatus that is an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the image processing method performed by the image processing apparatus.
FIG. 3 is a diagram explaining the step-function-shaped weighting coefficients used in Example 1 of the present invention.
FIG. 4 is a diagram showing numerical calculation results that explain the effect of Example 1.
FIG. 5 is a diagram showing numerical calculation results that explain the effect of the prior art.
FIG. 6 is a frequency-domain comparison of Example 1 and the prior art.
FIG. 7 is a diagram explaining the linear-function-shaped weighting coefficients used in Example 2 of the present invention.
FIG. 8 is a diagram showing numerical calculation results that explain the effect of Example 2.

Embodiments of the present invention will be described below with reference to the drawings.

Before the specific examples (numerical examples) of the present invention are described, a representative embodiment is described. FIG. 1 shows the configuration of an imaging apparatus 100 including an image processing apparatus 103 that is an embodiment of the present invention.

The imaging apparatus 100 includes an imaging optical system 101, an image sensor 102, and an image processing apparatus 103. The imaging optical system 101 forms an optical image (subject image) on the imaging surface of the image sensor 102. The imaging optical system 101 may consist of one or more lenses, or of optical elements such as a reflecting mirror, a gradient-index element, or a DMD (Digital Micromirror Device). The imaging characteristics of the imaging optical system 101 may be unknown, but are preferably known. The imaging characteristics express the blur of the optical image under conditions such as angle of view, subject distance, wavelength, and luminance as a point spread function (PSF). In image-processing terms, the action of the imaging optical system 101 is given by convolution with the PSF.

The image sensor 102 is a CMOS (Complementary Metal Oxide Semiconductor) image sensor that photoelectrically converts the subject image formed on its imaging surface and outputs an electrical signal corresponding to the light intensity of the subject image. The image sensor 102 is not limited to a CMOS image sensor; any device that can output an electrical signal corresponding to light intensity may be used, for example a CCD (Charge Coupled Device) image sensor. In image-processing terms, the action of the image sensor 102 is a downsampling in which multiple pixels obtained by photoelectric conversion of the high-resolution optical image are averaged over the extent (aperture) of one pixel to become one pixel of the low-resolution image.

The image processing apparatus 103 is a computing device such as a personal computer or a workstation, and performs the image processing described later using, as an input image, a captured image generated from the electrical signal output by the image sensor 102. The image processing apparatus 103 may execute an image processing program (application), which is a computer program stored in an internal storage unit (not shown), or may have a board or the like on which the program is implemented as a circuit. It may also read an image processing program stored in an external storage medium (such as a semiconductor memory or an optical disc) and execute the image processing.

The imaging apparatus 100 may be of an integrated type, in which the imaging optical system 101 is provided integrally with the image sensor 102, or of an interchangeable type, in which the imaging optical system 101 can be attached, detached, and exchanged. In the interchangeable type, the parameters used in the image processing described later (the CNN network parameters) must be ones suited to the imaging optical system 101 in use, because those parameters should be set according to the imaging characteristics of the imaging optical system 101.

Next, the image processing (method) executed by the image processing apparatus 103 is described with reference to the flowchart shown in FIG. 2, where S denotes a step or process. The image processing apparatus 103 functions as the error weighting unit and the parameter setting unit.

In step S201, the image processing apparatus 103 prepares a set of training images consisting of low-resolution training images, which serve as input images, and the corresponding high-resolution training images (correct images). When the imaging characteristics of the imaging optical system 101 are known, the low-resolution training images may be generated from the high-resolution training images by computer simulation. That is, a low-resolution training image may be generated by convolving a high-resolution training image with the PSF that characterizes the imaging optical system 101 and then applying the effect of the image sensor 102 (downsampling) to the resulting optical image.
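The simulation route in step S201 (blur with the PSF, then downsample by pixel averaging) might look like the following NumPy sketch. The Gaussian PSF is purely a hypothetical stand-in for the real imaging characteristics, and the function names, the edge-replicated border handling, and the default factor of 2 are all assumptions made for illustration.

```python
import numpy as np

def gaussian_psf(size=7, sigma=1.0):
    """Hypothetical stand-in PSF; a real system would use its own known PSF."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()  # normalise so the blur preserves mean brightness

def blur(image, psf):
    """Convolve the image with a square, odd-sized PSF (edge-replicated borders)."""
    r = psf.shape[0] // 2
    padded = np.pad(image, r, mode="edge")
    flipped = psf[::-1, ::-1]  # convolution flips the kernel
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(psf.shape[0]):
        for dx in range(psf.shape[1]):
            out += flipped[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def downsample(image, factor=2):
    """Average factor x factor blocks, modelling a pixel that integrates over its aperture."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def make_low_res(high_res, psf, factor=2):
    """Low-resolution training image from a high-resolution one."""
    return downsample(blur(high_res, psf), factor)
```

For example, `make_low_res(np.ones((32, 32)), gaussian_psf())` returns a 16 × 16 array; a training pair is then the high-resolution image together with this result.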

When the imaging characteristics of the imaging optical system 101 are unknown, the low-resolution training images may be generated by capturing a known high-resolution pattern (such as a bar chart) with the imaging apparatus 100.

Each training image may be a color image or a monochrome image; in the following description the training images are assumed to be monochrome. When the training images are color images, the image processing described below may be applied to each color channel, or only to the luminance component of the color image.

Furthermore, following Non-Patent Document 1, this embodiment applies bicubic interpolation to the low-resolution training images to bring them to the same size as the high-resolution training images. For example, for 2× super-resolution the low-resolution image is half the size of the high-resolution image, so interpolation doubles the size of the low-resolution image so that the two training images have the same size.

In step S202, the image processing apparatus 103 learns the network parameters of the convolutional neural network (CNN) from the training images. The function given by the following Equation (5) is used as the loss function.
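Equation (5) is not reproduced in this text. From the definitions of $X$, $Y$, and $\Psi$ that follow, and from the way the weighted correct image appears in the gradient, a consistent form would be the squared L2 norm of the weighted difference; as with Equation (4), the factor $1/2$ is an assumed convention.

```latex
L(X, Y) = \frac{1}{2} \left\| \Psi \left( X - Y \right) \right\|_2^2 \qquad (5)
```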

In Equation (5), $X$ is the high-resolution estimated image obtained by inputting a low-resolution training image to the CNN, and $Y$ is the high-resolution training image (correct image) corresponding to that low-resolution training image. $\Psi$ is a matrix that weights the high-frequency components (the high-frequency weighting matrix) and is given by the following Equation (6).
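Equation (6) is not reproduced in this text. The description that follows (DCT of the difference, weighting of the DCT coefficients by a diagonal matrix, then inverse DCT) corresponds to the product below, assuming an orthonormal DCT matrix so that the inverse transform is the transpose.

```latex
\Psi = \Phi^{\mathsf{T}} \, \Gamma \, \Phi \qquad (6)
```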

In Equation (6), $\Phi$ is the DCT matrix of the discrete cosine transform (DCT), which performs the frequency decomposition, and $\Gamma$ is the weighting coefficient matrix. $\Gamma$ is a diagonal matrix whose diagonal elements are the coefficients (weighting coefficients) that weight the DCT coefficients (discrete cosine transform coefficients) obtained with the DCT matrix. How the weighting coefficients are determined is described in detail in the examples below.

In Equation (6), the difference image, which represents the difference (error) between the high-resolution estimated image and the correct image, is first DCT-transformed to obtain a DCT coefficient (frequency coefficient) for each frequency component. The weighting coefficient matrix is then applied to those coefficients that correspond to predetermined high-frequency components (the high-frequency DCT coefficients), thereby weighting them. Finally, the weighted high-frequency DCT coefficients (weighted high-frequency coefficients) are inverse-DCT-transformed. In other words, Equation (6) weights the high-frequency components, of which natural images contain little, so that a large penalty is incurred unless the high-frequency components of the high-resolution estimated image are recovered well. Using CNN network parameters learned with this loss function makes it possible to recover high-frequency components with high accuracy. Learning uses the backpropagation method described for Equation (3). The gradient of the loss function used in backpropagation is given by the following Equation (7).
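Equation (7) is not reproduced in this text. Differentiating a loss of the form $\frac{1}{2}\|\Psi(X - Y)\|_2^2$ with respect to $X$ (the $1/2$ being an assumed convention) gives the expression below, which is consistent with the definition of $Y'$ in the next sentence.

```latex
\frac{\partial L}{\partial X} = \Psi^{\mathsf{T}} \left( \Psi X - Y' \right), \qquad Y' = \Psi Y \qquad (7)
```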

In Equation (7), $Y'$ is the high-resolution correct image $Y$ weighted by the high-frequency weighting matrix $\Psi$.
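The weighted loss and its gradient can be sketched on vectorized (1-D) data as follows. This is an illustration under assumptions, not the patented implementation: the DCT is taken to be orthonormal, the loss carries an assumed factor of 1/2, and all function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix Phi (rows are frequencies in ascending order)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    phi = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    phi[0, :] /= np.sqrt(2.0)
    return phi

def weighting_matrix(gamma_diag):
    """Psi: forward DCT, per-frequency weights on the diagonal, inverse DCT."""
    phi = dct_matrix(len(gamma_diag))
    return phi.T @ np.diag(gamma_diag) @ phi

def weighted_loss(x, y, psi):
    """Half the squared L2 norm of the frequency-weighted difference."""
    r = psi @ (x - y)
    return 0.5 * float(r @ r)

def weighted_loss_grad(x, y, psi):
    """Gradient of weighted_loss with respect to the estimated image x."""
    y_weighted = psi @ y               # the weighted correct image Y'
    return psi.T @ (psi @ x - y_weighted)

# Example: weight the upper half of the frequencies by 2.5.
gamma = np.array([1.0, 1.0, 1.0, 1.0, 2.5, 2.5, 2.5, 2.5])
psi = weighting_matrix(gamma)
```

In backpropagation this gradient with respect to the network output would be passed back through the layers by the chain rule. For 2-D images the same idea applies with a separable 2-D DCT; how frequencies are ordered in that case is a further design choice that the text here does not pin down.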

In this way, the present embodiment trains the network with a weight applied to the high-frequency components of the estimation error.

Super-resolution that weights the high-frequency components of an image has been performed before, but learning after such weighting has not. The loss function given by Equations (5) and (6) and the learning method that uses it (Equation (7)) are likewise new.

Japanese Patent Laid-Open No. 2014-195333 discloses a method that evaluates the quantization error of the prediction error signal of a video signal using a weighted measure in the frequency domain or in real space, and selects whether to quantize in frequency space or in real space. The prediction error signal is a signal that predicts the difference from the previous frame. However, the weighting disclosed in that publication tolerates error at edges and does not tolerate it in flat regions, which is the opposite of the purpose of the present embodiment. Moreover, it does not disclose training a network using a weighted measure in the frequency domain.

The CNN network parameters learned by performing the above processing in advance may be stored in a storage unit (not shown). Alternatively, the network parameters may be stored in a storage medium such as a semiconductor memory or an optical disc and read from that medium before the following processing is performed.

In step S203, the image processing apparatus 103 generates (estimates) a high-resolution image from an arbitrary low-resolution image (input image) acquired by the imaging apparatus 100 (image sensor 102), using the learned CNN network parameters. The super-resolution method given by Equation (1) is used here.

When the acquired low-resolution image is a color image, a high-resolution image may be generated from the low-resolution image for each color channel, using CNN network parameters learned for that channel, and the high-resolution images of the individual channels may then be combined. Alternatively, a high-resolution luminance image may be generated from the low-resolution luminance image using CNN network parameters learned from the luminance component of color images, and that high-resolution luminance image may be merged with interpolated color-difference images.

Furthermore, the image processing result may be stored in a storage unit (not shown) or displayed on a display unit (not shown).

Through the above processing, a high-resolution image can be generated from an arbitrary low-resolution image acquired by the imaging apparatus 100.

Next, specific examples will be described.

Example 1 presents numerical calculation results for a super-resolved image (high-resolution image) generated by the image processing described above.

The CNN has the three-layer network structure disclosed in Non-Patent Document 1. The first layer has 64 filters of size 9 × 9, the second layer's filter has size 64 × 1 × 1 × 32, and the third layer has 32 filters of size 5 × 5. With an input image of size Ny × Nx, the second-layer filter converts the Nx × Ny × 64 array output by the first layer into an Nx × Ny × 32 array.

The learning rates of the first- to third-layer filters are $10^{-4}$, $10^{-6}$, and $10^{-8}$, respectively, and the learning rates of the first- to third-layer biases are $10^{-5}$, $10^{-7}$, and $10^{-9}$. The initial values of the filters in each layer were drawn from a normal distribution, and the initial values of the biases in each layer were set to zero. The ReLU described above was used as the activation function of the first and second layers. The number of backpropagation iterations was $3 \times 10^{5}$.
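The architecture and hyperparameters above can be collected in a small configuration sketch. The dictionary layout and names are hypothetical; single-channel (monochrome) input and output and an identity activation on the third layer are assumptions inferred from the description, which names ReLU only for the first two layers.

```python
# Per-layer settings; filter_shape is (kernel_h, kernel_w, in_channels, out_channels).
LAYERS = [
    {"filter_shape": (9, 9, 1, 64), "filter_lr": 1e-4, "bias_lr": 1e-5, "activation": "relu"},
    {"filter_shape": (1, 1, 64, 32), "filter_lr": 1e-6, "bias_lr": 1e-7, "activation": "relu"},
    {"filter_shape": (5, 5, 32, 1), "filter_lr": 1e-8, "bias_lr": 1e-9, "activation": None},
]
NUM_ITERATIONS = 300_000  # number of backpropagation iterations

def count_parameters(layers):
    """Total number of learnable values: filter weights plus one bias per output channel."""
    total = 0
    for layer in layers:
        kh, kw, cin, cout = layer["filter_shape"]
        total += kh * kw * cin * cout + cout
    return total
```

Under these assumptions the network is small: `count_parameters(LAYERS)` adds 5,248, 2,080, and 801 values for the three layers.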

The optical system was an ideal, aberration-free lens with an F-number of 2.8, a wavelength of 0.55 μm, and unit magnification. Any optical system may be used as long as its imaging characteristics are known; aberrations were omitted in this example for simplicity. The image sensor had a pixel size of 1.5 μm and an aperture ratio of 100%. Image sensor noise was also omitted for simplicity.

The super-resolution magnification was set to 2×. Because the optical system has unit magnification and the pixel size of the image sensor is 1.5 μm, the pixel size of the high-resolution image is 0.75 μm.

The training set consists of 15,000 pairs of monochrome high-resolution and low-resolution training images of 32 × 32 pixels. The low-resolution training images were generated from the high-resolution training images by numerical calculation, assuming the optical conditions described above (F-number 2.8, wavelength 0.55 μm, unit magnification) and an image sensor with a pixel size of 1.5 μm and an aperture ratio of 100%. That is, a high-resolution training image with a pixel size of 0.75 μm was blurred under these optical conditions and then sampled by this image sensor to produce a low-resolution training image with a pixel size of 1.5 μm. As described above, bicubic interpolation was applied so that the high-resolution and low-resolution training images have the same size. Low-resolution images acquired by the imaging apparatus 100 were likewise bicubic-interpolated, and super-resolution processing was applied to the interpolated images. The high-resolution training images were normalized so that the maximum pixel value is 1.

The weighting coefficients of the loss function had the step-function shape shown in FIG. 3. Specifically, of the DCT coefficients calculated from the difference image between the high-resolution estimated image and the correct image, the high-frequency DCT coefficients, namely those in the upper half of the frequency range, were multiplied by 2.5.

The weighting coefficients may take any form as long as they apply a uniform weight to the high-frequency DCT coefficients. For example, they may have a step-function shape as in this example, or the shape of a sigmoid function, that is, a smoothed step. The high-frequency DCT coefficients that are uniformly weighted need not correspond exactly to the upper 1/2 of the frequency range; they may correspond to anywhere from the upper 1/2 to the upper 2/3. Likewise, the uniform weight applied to the high-frequency DCT coefficients need not be exactly 2.5; it may be anywhere from 1.5 to 2.5 inclusive.
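The two weighting shapes mentioned above can be sketched for a 1-D list of DCT coefficients ordered from low to high frequency. The function names, the `softness` parameter, and the exact placement of the sigmoid's midpoint are illustrative assumptions; the source specifies only the shapes and the allowed ranges.

```python
import numpy as np

def step_weights(n, high_fraction=0.5, gain=2.5):
    """Step-shaped weights: 1 for low frequencies, `gain` for the top `high_fraction`."""
    weights = np.ones(n)
    start = n - int(round(n * high_fraction))  # first index treated as high frequency
    weights[start:] = gain
    return weights

def sigmoid_weights(n, high_fraction=0.5, gain=2.5, softness=2.0):
    """Smoothed step: rises from about 1 towards `gain` around the same cut-off."""
    k = np.arange(n)
    cutoff = n - n * high_fraction - 0.5       # midpoint between the two regimes
    return 1.0 + (gain - 1.0) / (1.0 + np.exp(-(k - cutoff) / softness))
```

For instance, `step_weights(8)` gives four weights of 1 followed by four weights of 2.5; passing `high_fraction=2/3` or `gain=1.5` stays within the ranges stated above. These values would form the diagonal of the weighting coefficient matrix.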

FIGS. 4(a) to 4(c) show the image processing results of this embodiment. FIG. 4(a) shows the bicubic interpolated image of the low-resolution image, FIG. 4(b) the high-resolution estimated image of this embodiment, and FIG. 4(c) the correct image. Each is a monochrome image of size Nx = Ny = 256 pixels. These figures show that this embodiment yields an estimated image that is sharper (less degraded by blur) and closer to the correct image than the bicubic interpolated image.

The effect of this embodiment was evaluated quantitatively using the root mean square error (RMSE). RMSE is given by the following equation (8).

$$\mathrm{RMSE}(P, Q) = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(p_i - q_i\right)^2} \qquad (8)$$

In equation (8), P and Q are arbitrary M × 1 vectors, and p_i and q_i are the i-th elements of P and Q, respectively. The closer the RMSE is to zero, the more similar P and Q are. Accordingly, the closer the RMSE between the high-resolution estimated image and the correct image is to zero, the more accurately the estimated image has been super-resolved.
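As a small illustration, the RMSE between two images, taken as the square root of the mean squared element-wise difference after flattening each image to a vector, could be computed as follows. The function name `rmse` is chosen here for illustration only.

```python
import numpy as np

def rmse(p, q):
    """Root mean square error between two arrays of equal size."""
    p = np.asarray(p, dtype=float).ravel()   # flatten the image to an M-element vector
    q = np.asarray(q, dtype=float).ravel()
    return float(np.sqrt(np.mean((p - q) ** 2)))

# Identical inputs give an RMSE of exactly zero.
print(rmse([1, 2, 3], [1, 2, 3]))  # prints 0.0
```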

Table 1 shows the RMSE between the bicubic interpolated image of the low-resolution image and the correct image, and the RMSE between the high-resolution estimated image obtained by this embodiment and the correct image. Because the latter RMSE is closer to zero than the former, this embodiment achieves more accurate super-resolution.

Next, this embodiment is compared with the prior art. The SRCNN disclosed in Non-Patent Document 1 is used as the prior art. Because the prior art is identical to this embodiment except for the weighting of the loss function, its description is omitted.

FIG. 5 shows the high-resolution estimated image obtained by the prior art. Table 2 shows the RMSE between the high-resolution estimated image obtained by the prior art and the correct image. Because the RMSE between the high-resolution estimated image obtained by this embodiment and the correct image is closer to zero than this RMSE, this embodiment achieves more accurate super-resolution.

FIG. 6 compares this embodiment with the prior art in terms of one-dimensional spectra. The one-dimensional spectrum is obtained by taking the absolute value of the two-dimensional spectrum given by the two-dimensional Fourier transform of an image, integrating it in the radial direction, and expressing the result as a one-dimensional vector. In FIG. 6, the horizontal axis indicates the normalized spatial frequency, with higher frequencies toward the right. The vertical axis indicates the logarithm of the one-dimensional spectrum. The solid line shows the one-dimensional spectrum of the correct image, the dotted line that of the high-resolution estimated image obtained by the prior art, and the dash-dot line that of the high-resolution estimated image obtained by this embodiment.
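One possible way to compute such a one-dimensional spectrum is sketched below. It assumes that the integration collects the spectrum magnitude over rings of equal integer radius around the zero-frequency term and that frequencies are normalized by half the image size; both choices, and the function name `radial_spectrum`, are assumptions made for illustration.

```python
import numpy as np

def radial_spectrum(image):
    """Sum |FFT2(image)| over rings of equal integer radius around the DC term."""
    n_y, n_x = image.shape
    mag = np.abs(np.fft.fftshift(np.fft.fft2(image)))      # DC moved to the array center
    y, x = np.indices((n_y, n_x))
    r = np.hypot(x - n_x // 2, y - n_y // 2).astype(int)   # ring index of each coefficient
    spectrum = np.bincount(r.ravel(), weights=mag.ravel())
    max_r = min(n_x, n_y) // 2
    freq = np.arange(max_r + 1) / max_r                    # normalized spatial frequency
    return freq, spectrum[: max_r + 1]
```

For a plot like the one described, the returned `spectrum` would be shown on a logarithmic vertical axis, for example via `np.log10(spectrum + 1e-12)`.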

In this figure, the dash-dot line lies closer to the solid line than the dotted line does in the high-frequency region, which shows that this embodiment recovers more high-frequency components than the prior art. High-frequency content can also be increased simply by adding noise-like high-frequency components to an image. In that case, however, the image quality degrades and the RMSE between that image and the correct image moves away from zero. By contrast, the RMSE between the high-resolution estimated image obtained by this embodiment and the correct image is closer to zero than that of the prior art, which shows that the high-frequency components are recovered accurately.

From the above, this embodiment can recover high-frequency components more accurately than the prior art.

Embodiment 2 presents the results of numerical calculation using a linear-function-like weighting coefficient (more precisely, a piecewise linear function) for the loss function. Because it differs from Embodiment 1 only in the weighting coefficient of the loss function, the description of everything else is omitted.

FIG. 7 shows the linear-function-like weighting coefficient of this embodiment. Among the DCT coefficients calculated from the difference image between the high-resolution estimated image and the correct image, this weighting coefficient weights the high-frequency DCT coefficients, i.e., those in the high-frequency side 2/3 of the frequency range, linearly so that the maximum weight is 3.

The weighting coefficient may take any form as long as it applies a monotonically increasing weight to the high-frequency DCT coefficients. For example, it may be linear as in this embodiment, or curved, such as a power function or an exponential function. The high-frequency DCT coefficients that receive the monotonically increasing weight need not correspond strictly to the high-frequency side 2/3; they may correspond to any high-frequency portion from 2/3 to 4/5 of the high-frequency side. Likewise, the maximum of the monotonically increasing weight need not be exactly 3; it may lie in the range from 3 to 6. In other words, the maximum value of the weighting coefficient may be 3 or more and 6 or less.
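A piecewise linear weighting of this kind could be generated as in the following sketch. It assumes that the weight is 1 for the low-frequency DCT coefficients and rises linearly from 1 to the maximum value across the weighted high-frequency portion, and that the two-dimensional frequency of a coefficient is measured by the larger of its two normalized indices. The function name `ramp_weights`, the starting weight of 1, and the frequency measure are all illustrative assumptions.

```python
import numpy as np

def ramp_weights(n, fraction=2.0 / 3.0, max_gain=3.0):
    """Weights of 1 at low frequencies, rising linearly to `max_gain` at the highest frequency."""
    u = np.arange(n) / (n - 1)                     # normalized 1-D frequency index
    freq = np.maximum(u[:, None], u[None, :])      # assumed 2-D frequency measure
    start = 1.0 - fraction                         # where the weighted band begins
    ramp = np.clip((freq - start) / fraction, 0.0, 1.0)
    return 1.0 + (max_gain - 1.0) * ramp

weights = ramp_weights(32)                         # high-frequency side 2/3, maximum weight 3
wider = ramp_weights(32, fraction=0.8, max_gain=6.0)   # other end of the stated ranges
```

The resulting array is multiplied element-wise with the DCT coefficients of the error image before the inverse transform; a power or exponential profile could be obtained by applying such a function to `ramp`.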

FIG. 8 shows the high-resolution estimated image obtained by this embodiment. The low-resolution image (its bicubic interpolated image) and the correct image are the same as in Embodiment 1. Table 3 shows the RMSE between the high-resolution estimated image obtained by this embodiment and the correct image. This RMSE is closer to zero than the RMSE between the high-resolution estimated image obtained by the prior art and the correct image. The frequency-domain evaluation using the one-dimensional spectrum, although not shown in detail, gives the same result as in Embodiment 1. Therefore, this embodiment yields a high-resolution estimated image that is sharper (less degraded by blur) and closer to the correct image than the prior art.

Embodiment 3 describes a case in which noise removal is performed instead of super-resolution. Accurate recovery of high-frequency components is also important in noise removal. The reason is that, in a noise-degraded image, it is difficult to distinguish the high-frequency components of the original image from high-frequency noise, and therefore difficult to remove high-frequency noise cleanly from the noise-degraded image.

For example, in the field of image processing, a median filter is used to remove spike noise from a noise-degraded image. The median filter replaces the pixel value of a pixel of interest in the noise-degraded image with the median of the pixel values within a neighborhood of that pixel. With this filter, pixel values that are markedly larger or smaller than those of the surrounding pixels are treated as noise and removed. At the same time, however, high-frequency components of the image, such as edges, are also smoothed and blunted. It is therefore necessary to recover the high-frequency components of the image accurately.
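The behavior described above can be illustrated with a minimal 3 × 3 median filter. The window size, the edge-replicating border handling, and the function name `median_filter3` are assumptions chosen for this sketch.

```python
import numpy as np

def median_filter3(image):
    """Replace each pixel with the median of its 3 x 3 neighborhood (edges replicated)."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    # Stack the nine shifted copies of the image, one per position in the window.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)

spike = np.zeros((5, 5))
spike[2, 2] = 100.0            # isolated spike noise
line = np.zeros((7, 7))
line[:, 3] = 1.0               # one-pixel-wide line, i.e. genuine fine detail

print(median_filter3(spike).max())  # prints 0.0: the spike is removed
print(median_filter3(line).max())   # prints 0.0: the fine line is removed as well
```

The second result shows the drawback noted in the text: the filter cannot tell genuine fine detail from noise, so high-frequency image content is lost along with the spike.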

To apply the image processing described in Embodiments 1 and 2 to noise removal, only the training images used for learning need to be changed. Specifically, the network parameters of the CNN are learned using training noise-degraded images and training sharp images that are less degraded by noise, in place of the low-resolution training images (input images) and the high-resolution training images. Everything else is the same as in Embodiments 1 and 2, so its description is omitted.

Embodiment 4 describes a case in which blur removal is performed instead of super-resolution. Accurate recovery of high-frequency components is also important in blur removal, because the purpose of blur removal is to recover the high-frequency components of the image that were lost due to the optical system and the aperture of the image sensor.

To apply the image processing described in Embodiments 1 and 2 to blur removal, only the training images used for learning need to be changed. Specifically, the network parameters of the CNN are learned using training blurred images and training sharp images that are less degraded by blur, in place of the low-resolution training images (input images) and the high-resolution training images. Everything else is the same as in Embodiments 1 and 2, so its description is omitted.

(Other embodiments)

The present invention can also be realized by a process in which a program implementing one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The embodiments described above are merely representative examples, and various modifications and changes can be made to each embodiment in carrying out the present invention.

103 Image processing apparatus

Claims

An image processing apparatus comprising:
error weighting means for calculating an error between an estimated image, obtained by giving an input image to a convolutional neural network, and a training image corresponding to the input image, and for weighting a frequency component of the error; and
parameter setting means for calculating a gradient from the weighted error and setting a network parameter of the convolutional neural network using the gradient.

The image processing apparatus according to claim 1, wherein the error is an image indicating a difference between the input image and the training image.

The image processing apparatus according to claim 1, wherein the error weighting means:
performs frequency decomposition of the error to calculate a frequency coefficient for each frequency component;
calculates a weighted high-frequency coefficient by applying a weighting coefficient to a high-frequency coefficient that, among the frequency coefficients, corresponds to a predetermined high-frequency component; and
performs an inverse transform of the frequency decomposition on the weighted high-frequency coefficient.

The image processing apparatus according to claim 3, wherein the frequency decomposition is a discrete cosine transform, and the frequency coefficient is a discrete cosine transform coefficient.

The image processing apparatus according to claim 3, wherein the weighting coefficient is set so as to uniformly apply weight to the high-frequency coefficient.

The image processing apparatus according to claim 5, wherein the weighting coefficient has a size of 1.5 or more and 2.5 or less.

The image processing apparatus according to claim 5, wherein the predetermined high-frequency component is a high-frequency component of 1/2 or more and 2/3 or less on the high-frequency side.

The image processing apparatus according to claim 3, wherein the weighting coefficient is set so as to add a monotonically increasing weight to the high-frequency coefficient.

The image processing apparatus according to claim 8, wherein the maximum value of the weighting coefficient is 3 or more and 6 or less.

The image processing apparatus according to claim 8, wherein the predetermined high-frequency component is a high-frequency component of 2/3 or more and 4/5 or less on the high-frequency side.

The image processing apparatus according to claim 1, wherein the input image is a degraded image that is degraded relative to a correct image, and
the training image is the correct image.

The image processing apparatus according to claim 1, wherein the input image is a low-resolution image,
the estimated image is an estimated image having a higher resolution than the low-resolution image, and
the training image is a training image having a higher resolution than the low-resolution image.

The image processing apparatus according to claim 1, wherein the input image is a noise-degraded image degraded by noise,
the estimated image is an estimated image that is less degraded by noise than the noise-degraded image, and
the training image is a training image that is less degraded by noise than the noise-degraded image.

The image processing apparatus according to claim 1, wherein the input image is a blurred image including blur,
the estimated image is an estimated image that is less degraded by blur than the blurred image, and
the training image is a training image that is less degraded by blur than the blurred image.

An imaging apparatus comprising:
an image sensor; and
the image processing apparatus according to claim 1, which uses an image obtained through the image sensor as the input image.

An image processing method comprising:
a step of calculating an error between an estimated image, obtained by giving an input image to a convolutional neural network, and a training image corresponding to the input image, and weighting a frequency component of the error; and
a step of calculating a gradient from the weighted error and setting a network parameter of the convolutional neural network using the gradient.

An image processing program, being a computer program for causing a computer to execute image processing, the image processing comprising:
a process of calculating an error between an estimated image, obtained by giving an input image to a convolutional neural network, and a training image corresponding to the input image, and weighting a frequency component of the error; and
a process of calculating a gradient from the weighted error and setting a network parameter of the convolutional neural network using the gradient.