JP2021039559A

JP2021039559A - Image processing device

Info

Publication number: JP2021039559A
Application number: JP2019160691A
Authority: JP
Inventors: Eiichi Sasaki (瑛一佐々木); Takeshi Nakajo (中條　健); Tomohiro Igai (知宏猪飼)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2019-09-03
Filing date: 2019-09-03
Publication date: 2021-03-11

Abstract

PROBLEM TO BE SOLVED: To provide an image processing device capable of achieving more suitable image quality. SOLUTION: An image processing device (1) includes a neural network (21) that receives first input image data and outputs output image data, and a learning part (22) that trains the neural network (21). The learning part (22) trains the neural network (21) using a first loss and a second loss. SELECTED DRAWING: Figure 1

Description

Embodiments of the present invention relate to an image processing apparatus.

Techniques for performing image processing such as super-resolution processing and filtering with a neural network are known. Among these, supervised learning takes an input image and a teacher image as inputs and learns the network parameters (NN parameters) so as to minimize the loss between the two images.

For example, Patent Document 1 discloses a super-resolution method using a neural network. Patent Document 1 uses a conditional generative adversarial network (GAN) composed of two neural networks, a generator and a discriminator.

Patent Document 1: JP-A-2019-79436

However, conventional techniques such as that of Patent Document 1 leave room for improvement in achieving a desired image quality.

An object of one aspect of the present invention is to realize an image processing apparatus capable of achieving more suitable image quality.

An image processing apparatus according to one aspect of the present invention includes a neural network that receives first input image data and outputs output image data, and a learning unit that trains the neural network. The learning unit trains the neural network using a first loss and a second loss. The first loss uses the output image data and first teacher image data corresponding to the output image data. The second loss uses the output image data and second teacher image data that corresponds to the first teacher image data and is obtained by applying image processing to the first teacher image data.

An image processing apparatus according to one aspect of the present invention includes a neural network that receives first input image data and third input image data whose pixel values are a processing intensity parameter indicating the degree of image processing, and that outputs first output image data.

According to one aspect of the present invention, it is possible to provide an image processing apparatus capable of achieving more suitable image quality.

FIG. 1 is a schematic diagram showing the configuration of the image processing device according to Embodiment 1 of the present invention.
FIG. 2 is a conceptual diagram showing an example of the input and output of the CNN filter according to Embodiment 1 of the present invention.
FIG. 3 is a conceptual diagram showing an example of the input and output of the CNN filter according to a modification of Embodiment 1 of the present invention.
FIG. 4 is a conceptual diagram showing an example of the input and output of the CNN filter according to Embodiment 2 of the present invention.
FIG. 5 is a conceptual diagram showing an example of the input and output of the CNN filter according to a modification of Embodiment 2 of the present invention.
FIG. 6 is a conceptual diagram showing an example of the input and output of the CNN filter according to Embodiment 3 of the present invention.

[Embodiment 1]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a schematic diagram showing the configuration of the image processing device 1 according to the present embodiment.

The image processing device 1 functions, for example, as an image filter device that filters images. The application of the image processing device 1 does not limit the present embodiment; examples of the image filter device include super-resolution processing, resolution conversion processing, definition enhancement processing, noise reduction processing, a loop filter, and a post filter.

Here, a loop filter is a filter that acts on the decoded image inside the decoding loop of an image decoding process or image coding process; the filtered image is referred to as a reference picture and used to generate predicted images. A post filter is a filter that acts on the output image after the decoding loop of an image decoding process, and is used to improve the quality of the output image.

As an example, the image processing device 1 includes an input/output interface 10, a control unit 20, and a memory 30. The input/output interface 10 acquires first input image data A1 from an external input device and outputs output image data B to an external output device. A CPU (Central Processing Unit) or a GPU (Graphics Processing Unit) is used as the control unit 20. The memory 30 stores various data. When the image processing device 1 is provided inside an image decoding device or an image coding device, the input/output interface 10 and the memory 30 can be replaced with an interface and a picture buffer of that image decoding device or image coding device.

The control unit 20 includes a CNN (Convolutional Neural Network) filter 21 (neural network), which is a network computed from NN parameters including weights and biases. The control unit 20 may further include a learning unit 22 that trains the CNN filter 21.

Here, CNN is a general term for neural networks that have at least a convolution layer (a layer in which the weighting coefficients and the bias/offset of the product-sum operation do not depend on the position within the picture). The weighting coefficients are also called a kernel. In addition to convolution layers, the CNN filter 21 can include a layer called a full connection layer (FCN), in which the weight calculation depends on the position within the picture. The CNN filter 21 can also include an LCN (Locally Connected Networks) layer, in which each neuron belonging to the layer is connected to only some of the inputs of that layer (in other words, each neuron has a spatial position and is connected only to inputs near that position).

In the CNN filter 21, the input size and the output size of a convolution layer may differ. That is, the CNN filter 21 can include a layer whose output size is smaller than its input size, obtained by setting the movement amount (step size, or stride) of the position at which the convolution filter is applied to a value larger than 1. It can also include a deconvolution layer, whose output size is larger than its input size. The deconvolution layer is sometimes called a transposed convolution. The resolution may also be converted by an operation called DepthToSpace (pixel shuffler), which expands multiple channels spatially. For example, four channels may be converted to twice the vertical and twice the horizontal resolution by placing their values at the upper-left, upper-right, lower-left, and lower-right positions.

The CNN filter 21 can further include a pooling layer, a dropout layer, and the like. A pooling layer divides a large image into small windows and obtains a representative value, such as the maximum or the average, for each window. A dropout layer adds randomness by setting outputs to a fixed value (for example, 0) with a given probability. The number of layers included in the CNN filter 21 does not limit the present embodiment.
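The DepthToSpace rearrangement described above can be illustrated with a minimal sketch. The function name `depth_to_space_2x`, the nested-list image representation, and the channel order (upper left, upper right, lower left, lower right) are assumptions made for illustration and are not taken from the specification.

```python
def depth_to_space_2x(channels):
    """Rearrange 4 channels of size H x W into one plane of size 2H x 2W.

    channels: list of 4 equally sized 2-D lists, assumed to be ordered
    upper-left, upper-right, lower-left, lower-right.
    """
    if len(channels) != 4:
        raise ValueError("expected exactly 4 channels")
    height = len(channels[0])
    width = len(channels[0][0])
    out = [[0] * (2 * width) for _ in range(2 * height)]
    for c, plane in enumerate(channels):
        # c = 0 -> (0, 0), 1 -> (0, 1), 2 -> (1, 0), 3 -> (1, 1)
        dy, dx = divmod(c, 2)
        for y in range(height):
            for x in range(width):
                out[2 * y + dy][2 * x + dx] = plane[y][x]
    return out
```

With four 1x1 channels holding 1, 2, 3, and 4, the result is the 2x2 plane `[[1, 2], [3, 4]]`, that is, twice the height and twice the width of each input channel.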

The image at the input layer of the CNN filter 21, the feature maps of the intermediate layers, and the image at the output layer may each be a tensor of width W × height H × number of channels C. A four-dimensional tensor may be used instead of a three-dimensional tensor. The input layer may have three channels, such as R, G, B or Y, U, V, or may have 3 + NF + 1 channels, in which coded data (for example, NF channels) and a processing intensity parameter (for example, 1 channel) are added. The output layer may likewise have three channels, such as R, G, B or Y, U, V. When the R, G, B or Y, U, V color components are processed one at a time, the input layer may have 1 + NF + 1 channels and the output layer may have 1 channel. In addition to the processed image, the output layer may include, for example, an image of a segmentation result. The resolutions (width and/or height) of the first input image data A1 and the output image data B of the CNN filter 21 may differ; that is, resolution conversion may be performed. Among resolution conversions, a conversion that increases the resolution is generally called super-resolution.

Next, the specific processing of the CNN filter 21 according to the present embodiment will be described with reference to FIG. 2. FIG. 2 is a conceptual diagram showing an example of the input and output of the CNN filter 21 according to the present embodiment.

As shown in FIG. 2, the first input image data A1 is input to the CNN filter 21 via the input/output interface 10. The CNN filter 21 applies filter processing to the first input image data A1 and outputs the output image data B.

(Learning phase)
The learning unit 22 trains the CNN filter 21. The learning unit 22 includes a loss derivation unit and an NN parameter update unit. The loss derivation unit calculates a combined loss from the teacher image and the output image data B. The NN parameter update unit derives the gradient of the NN parameters in the direction that decreases the loss, and updates the NN parameters by adding the product of the learning rate and that gradient to them. When the loss GANLoss of a network that derives a loss (a discriminator) is additionally used as part of the combined loss, the gradient of the discriminator's NN parameters is derived, and the discriminator's NN parameters are updated by adding the product of the learning rate and the gradient to them. When a GAN is used, the generator (the CNN filter 21) and the discriminator are preferably trained together, with their NN parameters updated alternately.
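The update rule of the NN parameter update unit can be sketched on a toy model. The one-weight, one-bias filter `y = weight * x + bias`, the function names, and the choice of MSE as the loss are assumptions for illustration only; the CNN filter 21 has many more parameters and its gradients would normally be obtained by backpropagation.

```python
def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def update_step(weight, bias, inputs, targets, learning_rate):
    """One parameter update of the toy filter y = weight * x + bias."""
    n = len(inputs)
    outputs = [weight * x + bias for x in inputs]
    # Partial derivatives of the MSE loss with respect to each parameter.
    d_weight = sum(2 * (o - t) * x
                   for o, t, x in zip(outputs, targets, inputs)) / n
    d_bias = sum(2 * (o - t) for o, t in zip(outputs, targets)) / n
    # The direction that decreases the loss is the negative derivative.
    descent_w, descent_b = -d_weight, -d_bias
    # Add (learning rate x descent direction) to each parameter.
    return (weight + learning_rate * descent_w,
            bias + learning_rate * descent_b)
```

Repeating `update_step` over many input and teacher pairs corresponds to the sequential updating of the NN parameters described in the text.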

The learning unit 22 trains the CNN filter 21 by having the loss derivation unit calculate the combined loss from a first loss Loss1 and a second loss Loss2, and having the NN parameter update unit update the NN parameters. The first loss is a loss computed from the output image data B and first teacher image data C1 corresponding to the output image data B. The second loss is a loss computed from the output image data B and second teacher image data C2 corresponding to the first teacher image data C1.

Here, a loss (error, cost) computed from one set of image data and another is a value that quantitatively indicates the difference between the two images. A function for calculating the loss from two images, image A and image B, is called a loss function. For example, the first loss and the second loss may each be the MSE (Mean Squared Error), which is based on the L2 norm, or the MAE (Mean Absolute Error), which is based on the L1 norm. There is also a loss called perceptual loss or content loss, which is the difference (for example, MSE or MAE) between the feature PA (feature map) obtained by passing image A through a neural network and the feature PB obtained by passing image B through the same neural network. For the perceptual loss, a network such as VGG may be used. In the MSE case, the formula is as follows.

PerceptualLoss = Σ|PA - PB|^2

There is also a loss called GAN loss (generative adversarial network loss), which is calculated from two images using a neural network called a discriminator, and losses called Gram loss or style loss. For the Gram loss, the Gram matrix G, which is the inner product between two feature maps, is derived as follows.

G_ij = ΣP_ik * P_jk

Here, Σ denotes the sum over k. The loss may be derived from the difference between the Gram matrix GA of image A and the Gram matrix GB of image B.

GramLoss = Σ|GA - GB|^2

Losses can be derived using these loss functions. For example, an example of the loss between teacher image data C and output image data B is shown below.
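As an illustrative sketch (not the example from the original publication), the pixel-wise and Gram-matrix losses named above can be written in a few lines. Images and feature maps are represented as flat Python lists, and all function names are assumptions. The perceptual loss and GAN loss are omitted because they require a trained network.

```python
def mse(a, b):
    """Mean squared error between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mae(a, b):
    """Mean absolute error between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def gram_matrix(features):
    """G_ij = sum over k of P_ik * P_jk.

    features: list of channels, each a flat list of K feature values.
    """
    return [[sum(pi * pj for pi, pj in zip(fi, fj)) for fj in features]
            for fi in features]


def gram_loss(features_a, features_b):
    """Sum of squared differences between the two Gram matrices."""
    ga, gb = gram_matrix(features_a), gram_matrix(features_b)
    return sum((x - y) ** 2
               for row_a, row_b in zip(ga, gb)
               for x, y in zip(row_a, row_b))
```

Here `mse(output_b, teacher_c)` or `mae(output_b, teacher_c)` would play the role of a loss between the output image data B and the teacher image data C.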

The learning unit 22 trains the CNN filter 21 by sequentially updating the NN parameters (weights and biases) of the CNN filter 21 so that the first loss and the second loss become smaller over a variety of input image data. As an example, the learning unit 22 trains the CNN filter 21 by sequentially updating the NN parameters so that the combined loss, given by the sum of the first loss and the second loss, becomes smaller. Losses other than the first loss and the second loss may also be included. For example, the CNN filter 21 may be trained by sequentially updating the NN parameters so that the combined loss given by the sum of the first loss, the second loss, a third loss Loss3, ..., and an Nth loss LossN becomes smaller.

As a result, the CNN filter 21 becomes able to output output image data B that is similar to both the first teacher image data C1 and the second teacher image data C2.

(Description of image data)
The first input image data A1 used in the learning phase is, for example, image data obtained by applying predetermined processing to the first teacher image data C1. For example, the first input image data A1 is image data obtained by coding the first teacher image data C1, image data obtained by reducing the resolution of the first teacher image data C1, or the like.

The second teacher image data C2 is teacher image data obtained by applying image processing to the first teacher image data C1. The image processing is processing of at least one of sharpness, refinement, tone, blur, noise removal amount, and touch.
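The relationship among C1, A1, and C2 can be sketched as follows. Both processing functions are stand-ins chosen for illustration: 2x2 averaging represents the resolution-reducing processing that yields A1, and a simple horizontal unsharp mask represents a sharpness process that yields C2. Neither is claimed to be the processing used in the specification, and the sharpened values are not clipped to a valid pixel range.

```python
def downsample_2x(image):
    """Average each 2x2 block (stand-in for low-resolution processing)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def sharpen_rows(image, amount=1.0):
    """Horizontal unsharp mask: pixel + amount * (pixel - 3-tap mean)."""
    out = []
    for row in image:
        new_row = []
        for x, p in enumerate(row):
            left = row[max(x - 1, 0)]               # clamp at the border
            right = row[min(x + 1, len(row) - 1)]   # clamp at the border
            blurred = (left + p + right) / 3
            new_row.append(p + amount * (p - blurred))
        out.append(new_row)
    return out


# First teacher image data C1: a vertical edge between 10 and 50.
c1 = [[10, 10, 50, 50] for _ in range(4)]
a1 = downsample_2x(c1)   # first input image data A1 (lower resolution)
c2 = sharpen_rows(c1)    # second teacher image data C2 (sharpened edge)
```

In this sketch the flat regions of `c2` equal those of `c1`, while the pixels on either side of the edge are pushed apart, so a network trained against both `c1` and `c2` is pulled toward a somewhat sharper output.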

(Other examples of loss combinations)
The above description covered, as an example, the case in which the learning unit 22 trains the CNN filter 21 so that the combined loss, which is the sum of the first loss and the second loss, becomes smaller, but this does not limit the present embodiment.

Other examples in which the learning unit 22 trains the CNN filter 21 using the first loss and the second loss are shown below.

The learning unit 22 may train the CNN filter 21 using a combined loss in which at least one of the first loss and the second loss is weighted. That is, only the first loss may be weighted, only the second loss may be weighted, or both the first loss and the second loss may be weighted.

Specifically, the learning unit 22 uses a weighted linear sum in which at least one of the first loss and the second loss is weighted. The learning unit 22 then trains the CNN filter 21 by sequentially updating the NN parameters of the CNN filter 21 so that the first loss and the second loss become smaller. As a result, the CNN filter 21 can output output image data B that is appropriate for the chosen weights.

For example, when the weights are set so that the effect of the first loss is greater than that of the second loss, the CNN filter 21 can output output image data B that is more similar to the first teacher image data C1 than to the second teacher image data C2. Conversely, when the weights are set so that the effect of the second loss is greater than that of the first loss, the CNN filter 21 can output output image data B that is more similar to the second teacher image data C2 than to the first teacher image data C1.

As a specific example of weighting, when the learning unit 22 trains the CNN filter 21 using a weighted linear sum in which the second loss is weighted, the following may be used as the weighted linear sum.

Loss = Loss1 + w * Loss2

Here, w is the weighting coefficient for the second loss.
Of course, the combined loss may be calculated using multiple weights as follows.

Loss = w1 * Loss1 + w2 * Loss2

The loss may also include GANLoss, PerceptualLoss, and the like. Loss1 and Loss2 may each use multiple losses (Loss1L1, Loss1L2 / Loss2L1, Loss2L2). For example, the loss may be

Loss = w1 * Loss1 + w2 * Loss2 + w3 * GANLoss + w4 * PerceptualLoss

In general, the combined loss is a weighted linear combination of multiple losses identified by the subscript i, as in the following formula.

Loss = Σwi * Loss_i

Here, the losses Loss_i include the first loss and the second loss. For example, Loss_0 may be the first loss and Loss_1 the second loss.
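The weighted combination can be sketched directly. The function names are assumptions. The second function is an illustrative special case that is not in the specification: for a single scalar output b with squared-error losses against scalar teachers c1 and c2, the value minimizing w1 * (b - c1)^2 + w2 * (b - c2)^2 is the weighted mean of the two teachers, which shows how the weights pull the output toward one teacher or the other.

```python
def combined_loss(losses, weights):
    """Weighted linear combination: Loss = sum over i of w_i * Loss_i."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    return sum(w * l for w, l in zip(weights, losses))


def best_scalar_output(c1, c2, w1, w2):
    """Minimizer of w1 * (b - c1)**2 + w2 * (b - c2)**2 over scalar b.

    Setting the derivative to zero gives the weighted mean of c1 and c2.
    """
    return (w1 * c1 + w2 * c2) / (w1 + w2)
```

For instance, `combined_loss([loss1, loss2], [1.0, w])` corresponds to Loss = Loss1 + w * Loss2, and `best_scalar_output(0, 10, 3, 1)` lies closer to the first teacher value because w1 > w2.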

To make the effect of the first loss greater than that of the second loss, the value of w may be set to less than 1 (w1 > w2). To make the effect of the second loss greater than that of the first loss, the value of w may be set to greater than 1 (w1 < w2).

(Operation phase)
In the operation phase, the image that is actually to be processed is input to the CNN filter 21. The image processing device 1 then performs filter processing using the CNN filter 21 trained as described above. As a result, the image processing device 1 can output output image data to which image processing has been suitably applied.

Although the learning phase and the operation phase have been described above as operations of the image processing device 1, the two phases may be carried out by independent devices. That is, the image processing device 1 may be a device that includes the learning unit 22 (the loss derivation unit and the NN parameter update unit) and performs only the learning phase, or a device that includes only the CNN filter 21, without the learning unit 22, and performs only the operation phase.

(Modification of Embodiment 1)
A modification of Embodiment 1 is described below. The configuration of the image processing device 1 is the same as in Embodiment 1. For the learning phase and the operation phase, the points that differ from Embodiment 1 are described below; everything else is the same as in Embodiment 1.

FIG. 3 is a conceptual diagram showing an example of the input and output of the CNN filter 21 according to this modification.

As shown in FIG. 3, in addition to the first input image data A1, second input image data A2, whose pixel values are a coding parameter related to the first input image data A1, may also be input to the CNN filter 21. The second input image data A2 may be an image in which each pixel is the value of the coding parameter (for example, the quantization parameter qP(x, y)) at the corresponding position (x, y) of the image. For example, when a coded image quantized with qP = 22 is used as the first input image data A1, A2 may be an image whose pixel values are 22. That is, a set of A1 and A2 may be, for instance, a coded image encoded with the quantization parameter qP = 27 together with an image in which qP(x, y) = 27. Such sets are prepared for a large number of images and input.

As an example, the first input image data A1 and the second input image data A2 may be concatenated and input to the CNN filter 21. For example, when the resolutions of the first input image data A1 and the second input image data A2 differ, the second input image data A2 may be enlarged to match the resolution of the first input image data A1 and then concatenated and input.
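A minimal sketch of building the second input image data A2 and concatenating it with A1 along the channel axis is given below. The function names, the list-of-planes representation, and the use of nearest-neighbour enlargement are assumptions; the specification only states that A2 is enlarged to match A1 before concatenation.

```python
def constant_plane(value, height, width):
    """Plane whose every pixel holds the same coding-parameter value."""
    return [[value] * width for _ in range(height)]


def upscale_nearest(plane, height, width):
    """Nearest-neighbour enlargement of a 2-D plane to height x width."""
    src_h, src_w = len(plane), len(plane[0])
    return [[plane[y * src_h // height][x * src_w // width]
             for x in range(width)] for y in range(height)]


def concatenate_channels(a1_channels, a2_plane):
    """Append the A2 plane to the A1 channels, resizing A2 if needed."""
    height, width = len(a1_channels[0]), len(a1_channels[0][0])
    if (len(a2_plane), len(a2_plane[0])) != (height, width):
        a2_plane = upscale_nearest(a2_plane, height, width)
    # A new list is returned; the caller's channel list is not modified.
    return a1_channels + [a2_plane]
```

For a coded image quantized with qP = 22, `concatenate_channels(a1_channels, constant_plane(22, h, w))` yields the input with one extra channel in which every pixel is 22.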

In what is called delayed input, the second input image data A2 may also be concatenated and input not only at the input layer of the CNN filter 21 but at the features of an intermediate layer of the CNN filter 21.

The second input image data A2 may be concatenated and input at both the input layer of the CNN filter 21 and the feature map of an intermediate layer of the CNN filter 21. The second input image data A2 may also be concatenated and input at the features of multiple intermediate layers of the CNN filter 21.

Inputting the second input image data A2, whose pixel values are a coding parameter related to the first input image data A1, to the CNN filter 21 makes it possible to generate output image data B that reflects the coding parameter.

(Description of coding parameters)
A coding parameter is one of, or a set of, the following: a quantization parameter (QP), a partitioning parameter (multi-tree partitioning parameter, Multi Type Tree parameter, MTT parameter), a prediction parameter, and other parameters that are generated in connection with coding and are themselves subject to coding.

The quantization parameter is a parameter that controls the compression ratio and the image quality of an image. The larger its value, the lower the image quality and the smaller the code amount; the smaller its value, the higher the image quality and the larger the code amount. As the quantization parameter, for example, a parameter from which the quantization width of the prediction residual is derived can be used. The quantization parameter is not limited to the value transmitted in the coded data itself; a quantization step derived from the quantization parameter transmitted in the coded data, or the like, may be used.

As a picture-level quantization parameter, a single quantization parameter representative of the frame being processed can be input. For example, the quantization parameter can be specified by the parameter set applied to the target picture. The quantization parameter can also be calculated from the quantization parameters applied to the components of the picture. Specifically, the quantization parameter can be calculated from the average of the quantization parameters applied to the slices.

As a quantization parameter for units into which the picture is divided, the quantization parameter of each unit obtained by dividing the picture according to a predetermined criterion can be input. For example, the quantization parameter can be applied per slice. The quantization parameter can also be applied to blocks within a slice. The quantization parameter can also be specified per region, where the regions are independent of the existing coding units (for example, the regions obtained by dividing the picture into 16x9 pieces). In this case, because the quantization parameters depend on the number of slices and the number of transform units, the value of the quantization parameter corresponding to a region would be indeterminate and the CNN filter could not be constructed. One method is therefore to use the average of the quantization parameters within the region as the representative value. Another method is to use the quantization parameter at one position within the region as the representative value. Yet another method is to use the median or the mode of the quantization parameters at multiple positions within the region as the representative value.
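The choice of a representative value for a region can be sketched with Python's standard `statistics` module. The function name, the `method` argument, and the use of the first listed position for the single-position case are assumptions for illustration.

```python
from statistics import mean, median, mode


def representative_qp(region_qps, method="mean"):
    """Reduce the quantization parameters found in one region to one value.

    region_qps: non-empty list of QP values sampled inside the region.
    method: "mean", "median", "mode", or "first" (QP at one position).
    """
    if method == "mean":
        return mean(region_qps)
    if method == "median":
        return median(region_qps)
    if method == "mode":
        return mode(region_qps)
    if method == "first":
        return region_qps[0]
    raise ValueError("unknown method: " + method)
```

Whichever method is used, every region yields exactly one value, so the plane of quantization parameters fed to the CNN filter has a fixed layout regardless of how many slices or transform units fall inside each region.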

When a specific number of quantization parameters is to be input, a list of quantization parameters may be generated so that their number is constant, and a predetermined number of quantization parameters from that list may be input to the CNN filter. For example, a list of per-slice quantization parameters can be created and then reduced to a list of three quantization parameters (the maximum, the minimum, and the median) for input.
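As a minimal sketch of this reduction, assuming the per-slice quantization parameters are available as a plain Python list (the function name `fixed_qp_list` is hypothetical):

```python
from statistics import median

def fixed_qp_list(slice_qps):
    """Reduce a variable-length list of per-slice QPs to [max, min, median]."""
    if not slice_qps:
        raise ValueError("at least one slice QP is required")
    return [max(slice_qps), min(slice_qps), median(slice_qps)]

three_qps = fixed_qp_list([30, 34, 28, 32, 36])
```

The result always has three entries, so the number of values passed to the CNN filter does not change with the number of slices.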

As the component-level quantization parameter, the quantization parameter applied to the component being processed can be input. Examples include the luminance quantization parameter (QPL) and the chrominance quantization parameter (QPC).

When the CNN filter is applied block by block, the quantization parameter of the target block and the quantization parameters around the block (QPN) may be input.

The partitioning parameter (MTT partitioning parameter) is a parameter for the scheme by which a coding tree unit (CTU: Coding Tree Unit) constituting a slice is split into coding units (CU: Coding Unit): in addition to QT splitting, which splits into a quad tree, BT splitting, which splits into a binary tree, or TT splitting, which splits into a ternary tree, is performed. The parameter may be an image whose values are the split depth at each position. This partitioning increases the selectable CU sizes and shapes and allows adaptive partitioning matched to the texture of the image. Compared with HEVC, which has only quad-tree (square) splitting, it is easier to select a region with homogeneous properties as a single coding unit (CU). The prediction parameters are a reference picture index, a motion vector, and the like.

The prediction parameters may also be PredMode, which distinguishes between the intra prediction mode and the inter prediction mode; IntraPredMode, which indicates the prediction mode or prediction direction of intra prediction; a merge flag or affine flag, which indicates the motion prediction method; a value indicating the precision of the motion vector; a value classifying the motion vector by magnitude; a value classifying the motion vector difference by magnitude; or the motion vector difference between the target pixel and an adjacent pixel.

The coding parameters include, for example, a reference value of the quantization step size used for decoding the picture, a flag indicating whether weighted prediction is applied, and a scaling list (quantization matrix).

(Learning phase)
The learning unit 22 trains the CNN filter 21 by sequentially updating the NN parameters of the CNN filter 21 over a large number of input images so that the first loss and the second loss decrease. As an example, the learning unit 22 updates the NN parameters of the CNN filter 21 so that the combined loss, which is the sum of the first loss and the second loss, decreases.

In this example as well, as in the embodiment described above, the learning unit 22 may train the CNN filter 21 using a weighted linear sum in which at least one of the first loss and the second loss is weighted. That is, only the first loss may be weighted, only the second loss may be weighted, or both the first loss and the second loss may be weighted.
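The combined loss and its weighted variant can be sketched as follows. This is an illustration only: the use of mean squared error for both losses, the flat pixel lists, and the names `mse` and `combined_loss` are assumptions made for the example, since this section does not fix the form of the individual losses.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists (assumed loss)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combined_loss(output, teacher1, teacher2, w1=1.0, w2=1.0):
    """Weighted linear sum of the first and second losses.

    The first loss compares the output with the first teacher image data,
    the second loss compares it with the second teacher image data.
    With w1 == w2 == 1 this is the plain sum of the two losses.
    """
    loss1 = mse(output, teacher1)
    loss2 = mse(output, teacher2)
    return w1 * loss1 + w2 * loss2

output = [1.0, 2.0]
teacher1 = [1.0, 4.0]
teacher2 = [3.0, 2.0]
plain_sum = combined_loss(output, teacher1, teacher2)
second_loss_halved = combined_loss(output, teacher1, teacher2, w2=0.5)
```

Leaving one weight at 1 and changing the other corresponds to weighting only one of the two losses; changing both corresponds to weighting both.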

(Operation phase)
In the operation phase, the image to which image processing is actually to be applied, and an image whose pixel values are the coding parameters of that image, are input to the CNN filter 21. The image processing device 1 then performs filtering using the CNN filter trained as described above. As a result, the image processing device 1 can output output image data to which image processing suited to the coding parameters has been applied.

[Embodiment 2]
Another embodiment of the present invention is described below. For convenience of explanation, members having the same functions as members described in the above embodiment are given the same reference signs, and their description is not repeated.

The configuration of the image processing device 1 according to this embodiment is the same as in Embodiment 1.

FIG. 4 is a conceptual diagram showing an example of the inputs and outputs of the CNN filter 21 according to this embodiment.

As shown in FIG. 4, in addition to the first input image data A1, third input image data A3, whose pixel values are a processing intensity parameter indicating the degree of image processing (for example, the fineness of the image), is further input to the CNN filter 21. As an example, the first input image data A1 and the third input image data A3 may be combined and input to the CNN filter 21. The processing intensity parameter is, for example, an image quality parameter indicating the fineness of the image.

(Description of image processing)
The processing intensity parameter is a parameter used in at least one of the following processes: sharpness, fineness, tone, blur, noise removal amount, and touch. The processing intensity parameter indicating the fineness of the image may be a single parameter for the first input image data A1, or there may be as many parameters as there are pixels (the resolution). In the latter case, each parameter can be adjusted pixel by pixel.
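The two cases (one parameter per image, or one parameter per pixel) can both be expressed as an image-sized plane. The sketch below is illustrative only; the function name `strength_plane` and the nested-list representation are assumptions, not part of the specification.

```python
def strength_plane(param, height, width):
    """Return an H x W plane of processing-intensity values.

    param is either a single number (one parameter for the whole image)
    or an H x W nested list (one parameter per pixel).
    """
    if isinstance(param, (int, float)):
        # One parameter for the image: replicate it at every pixel.
        return [[float(param)] * width for _ in range(height)]
    if len(param) != height or any(len(row) != width for row in param):
        raise ValueError("per-pixel parameter map must match the image size")
    # Per-pixel parameters: copy them so the caller's map is not modified.
    return [[float(v) for v in row] for row in param]

uniform = strength_plane(0.5, 2, 3)
per_pixel = strength_plane([[0, 1], [1, 0]], 2, 2)
```

With the per-pixel form, the intensity can differ between areas of the same picture, for example stronger processing in one area and weaker processing elsewhere.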

(Learning phase)
The learning unit 22 trains the CNN filter 21 by sequentially updating the NN parameters of the CNN filter 21 so that the first loss decreases.

As a result, the CNN filter 21 becomes able to output output image data B similar to the first teacher image data C1.

(Operation phase)
In the operation phase, the image to which image processing is actually to be applied, and an image whose pixel values are the processing intensity parameter indicating the degree of image processing, are input to the CNN filter 21. The image processing device 1 then performs filtering using the CNN filter trained as described above. As a result, the image processing device 1 can output output image data to which image processing suited to the processing intensity parameter has been applied.

As an example, user input concerning the processing intensity parameter is accepted via the input/output interface 10, and a third input image that uses the processing intensity parameter corresponding to the user input is input to the CNN filter 21. This makes it possible to output output image data to which the image processing desired by the user has been applied.

(Modification of Embodiment 2)
A modification of Embodiment 2 is described below. The configuration of the image processing device 1 is the same as in Embodiment 2. For the learning phase and the operation phase, the points that differ from Embodiment 2 are described below; everything else is the same as in Embodiment 2.

FIG. 5 is a conceptual diagram showing an example of the inputs and outputs of the CNN filter 21 according to this modification.

As shown in FIG. 5, in addition to the first input image data A1 and the third input image data A3, second input image data A2, whose pixel values are coding parameters relating to the first input image data A1, may be further input to the CNN filter 21. As an example, the first input image data A1, the second input image data A2, and the third input image data A3 may be combined and input to the CNN filter 21. As described in the preceding embodiment, the second input image data A2 and the third input image data A3 may instead be combined with a feature map of an intermediate layer. They may also be combined at both the input layer and an intermediate layer, or at a plurality of intermediate layers.
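One way to read "combined and input" is channel-wise concatenation, sketched below. This reading is an assumption, as are the channels-first nested-list layout and the name `concat_channels`; the sample values are made up for illustration.

```python
def concat_channels(*inputs):
    """Concatenate inputs along the channel axis.

    Each input is a list of H x W planes (channels-first layout, assumed).
    All planes must share the same height and width.
    """
    if not inputs or not inputs[0]:
        raise ValueError("at least one non-empty input is required")
    height = len(inputs[0][0])
    width = len(inputs[0][0][0])
    stacked = []
    for planes in inputs:
        for plane in planes:
            if len(plane) != height or any(len(row) != width for row in plane):
                raise ValueError("all planes must share the same height and width")
            stacked.append(plane)
    return stacked

a1 = [[[0.1, 0.2], [0.3, 0.4]]]  # first input image data A1 (1 channel, 2x2)
a2 = [[[32, 32], [30, 30]]]      # A2: coding parameters as pixel values (QPs here)
a3 = [[[0.5, 0.5], [0.5, 0.5]]]  # A3: processing intensity parameter as pixel values
network_input = concat_channels(a1, a2, a3)
```

The same operation applies when A2 and A3 are combined with an intermediate-layer feature map instead: the feature map's channels take the place of `a1`, provided the planes have the same spatial size.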

The second input image data A2, whose pixel values are coding parameters relating to the first input image data A1, and the third input image data A3, whose pixel values are the processing intensity parameter, are input to the CNN filter 21. This makes it possible to generate output image data B that reflects both the coding parameters and the processing intensity parameter.

The coding parameters are the quantization parameter, the partitioning parameter, the prediction parameters, and parameters to be coded that are generated in connection with the prediction parameters. Details of the coding parameters are as described above in Embodiment 1.

As a result, the CNN filter 21 becomes able to output output image data B similar to the first teacher image data C1.

(Operation phase)
In the operation phase, the image to which image processing is actually to be applied, an image whose pixel values are the coding parameters of that image, and an image whose pixel values are the processing intensity parameter are input to the CNN filter 21. The image processing device 1 then performs filtering using the CNN filter trained as described above. As a result, the image processing device 1 can output output image data to which image processing suited to the processing intensity parameter and the coding parameters has been applied.

As an example, user input concerning the processing intensity parameter is accepted via the input/output interface 10, and a third input image that uses the processing intensity parameter corresponding to the user input is input to the CNN filter 21. This makes it possible to output output image data to which the image processing desired by the user, adapted to the coding parameters, has been applied.

[Embodiment 3]
Another embodiment of the present invention is described below. For convenience of explanation, members having the same functions as members described in the above embodiments are given the same reference signs, and their description is not repeated.

FIG. 6 is a conceptual diagram showing an example of the inputs and outputs of the CNN filter 21 according to this embodiment.

As shown in FIG. 6, in addition to the first input image data A1, second input image data A2, whose pixel values are coding parameters relating to the first input image data A1, and third input image data A3, whose pixel values are the processing intensity parameter, are further input to the CNN filter 21. As an example, the first input image data A1, the second input image data A2, and the third input image data A3 may be combined and input to the CNN filter 21. As described in the preceding embodiment, the second input image data A2 and the third input image data A3 may instead be combined with a feature map of an intermediate layer. They may also be combined at both the input layer and an intermediate layer, or at a plurality of intermediate layers.

Further, in addition to the first input image data A1 and the second input image data A2, third input image data A3 whose pixel values are a processing intensity parameter weighted in correspondence with the weighted linear sum may be input to the CNN filter 21.

The second input image data A2, whose pixel values are coding parameters relating to the first input image data A1, and the third input image data A3, whose pixel values are the processing intensity parameter, are input to the CNN filter 21. This makes it possible to generate output image data B that reflects both the coding parameters and the processing intensity parameter.

(Learning phase)
The learning unit 22 trains the CNN filter 21 by sequentially updating the NN parameters of the CNN filter 21 so that the first loss and the second loss decrease. As an example, the learning unit 22 updates the NN parameters of the CNN filter 21 so that the combined loss, which is the sum of the first loss and the second loss, decreases.

In this embodiment as well, as in Embodiment 1, the learning unit 22 may train the CNN filter 21 using a weighted linear sum in which at least one of the first loss and the second loss is weighted. That is, only the first loss may be weighted, only the second loss may be weighted, or both the first loss and the second loss may be weighted.

Loss = w1 * Loss1 + w2 * Loss2

Here, Loss1 is the first loss and Loss2 is the second loss. In the following, w can be understood as the weight of the second loss relative to the first loss (w = w2 / w1; for example, w1 = 1 and w2 = w). To make the influence of the first loss greater than that of the second loss, the value of w is set to less than 1. To make the influence of the second loss greater than that of the first loss, the value of w is set to greater than 1.

The learning unit 22 may also train the CNN filter so that the cost weight of the processing intensity parameter and w are linked.

When the cost weight of the processing intensity parameter is small, the value of w is made small. When the cost weight of the processing intensity parameter is large, the value of w is made large.

As a specific example, when the cost weight of the processing intensity parameter is 0.1, w is set to 0.25. When the cost weight is 0.2, w is set to 0.5. When the cost weight is 0.3, w is set to 0.75. When the cost weight is 0.4, w is set to 1. These values of the cost weight and of w do not limit this embodiment.
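The example pairing of cost weight and w can be written as a lookup table. The sketch below is illustrative only: the names `COST_WEIGHT_TO_W`, `w_for_cost_weight`, and `linked_loss` are hypothetical, and applying w to the second loss (with the first loss unweighted) is an assumption consistent with the statement that w below 1 favors the first loss.

```python
# Example pairs from the text: cost weight of the processing intensity parameter -> w.
COST_WEIGHT_TO_W = {0.1: 0.25, 0.2: 0.5, 0.3: 0.75, 0.4: 1.0}

def w_for_cost_weight(cost_weight):
    """Look up the loss weight w linked to a given cost weight."""
    try:
        return COST_WEIGHT_TO_W[cost_weight]
    except KeyError:
        raise ValueError(f"no w defined for cost weight {cost_weight}") from None

def linked_loss(loss1, loss2, cost_weight):
    """Combine the two losses with w chosen from the cost weight.

    Assumption: w scales the second loss and the first loss has weight 1.
    """
    return loss1 + w_for_cost_weight(cost_weight) * loss2

w_small = w_for_cost_weight(0.1)
total = linked_loss(2.0, 4.0, 0.1)
```

In the four example pairs, w grows in proportion to the cost weight, so a smaller cost weight gives the second loss less influence during training.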

As an example, user input concerning the processing intensity parameter is accepted via the input/output interface 10, and a third input image that uses the processing intensity parameter corresponding to the user input is input to the CNN filter 21. This makes it possible to output output image data to which the image processing desired by the user, in a form better suited to the coding parameters, has been applied.

(Hardware implementation and software implementation)
Each block of the image processing device 1 described above may be implemented in hardware by logic circuits formed on an integrated circuit (IC chip), or in software using a CPU (Central Processing Unit).

In the latter case, each of the above devices includes a CPU that executes the instructions of a program implementing each function, a ROM (Read Only Memory) that stores the program, a RAM (Random Access Memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data. The object of the embodiments of the present invention can also be achieved by supplying each of the above devices with a recording medium on which the program code (executable program, intermediate code program, or source program) of the control program of each device, which is software implementing the functions described above, is recorded in a computer-readable form, and having the computer (or a CPU or MPU) read and execute the program code recorded on the recording medium.

Examples of the recording medium include: tapes such as magnetic tapes and cassette tapes; disks, including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical discs such as CD-ROM (Compact Disc Read-Only Memory), MO disc (Magneto-Optical disc), MD (Mini Disc), DVD (Digital Versatile Disc: registered trademark), CD-R (CD Recordable), and Blu-ray Disc (registered trademark); cards such as IC cards (including memory cards) and optical cards; semiconductor memories such as mask ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable and Programmable Read-Only Memory: registered trademark), and flash ROM; and logic circuits such as PLD (Programmable Logic Device) and FPGA (Field Programmable Gate Array).

Each of the above devices may also be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited as long as it can transmit the program code. For example, the Internet, an intranet, an extranet, a LAN (Local Area Network), ISDN (Integrated Services Digital Network), a VAN (Value-Added Network), a CATV (Community Antenna Television/Cable Television) communication network, a virtual private network (Virtual Private Network), a telephone network, a mobile communication network, or a satellite communication network can be used. The transmission medium constituting the communication network is likewise not limited to a specific configuration or type as long as it can transmit the program code. For example, wired media such as IEEE (Institute of Electrical and Electronic Engineers) 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (Asymmetric Digital Subscriber Line) lines can be used, as can wireless media such as infrared (for example, IrDA (Infrared Data Association) or a remote control), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance: registered trademark), mobile phone networks, satellite links, and terrestrial digital broadcasting networks. The embodiments of the present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.

The embodiments of the present invention are not limited to the embodiments described above, and various modifications are possible within the scope of the claims. That is, embodiments obtained by combining technical means modified as appropriate within the scope of the claims are also included in the technical scope of the present invention.

1 Image processing device
10 Input/output interface
20 Control unit
21 CNN filter (neural network)
22 Learning unit
30 Memory

Claims

1. An image processing device comprising:
a neural network that receives first input image data and outputs output image data; and
a learning unit that trains the neural network,
wherein the learning unit trains the neural network using a first loss, which uses the output image data and first teacher image data corresponding to the output image data, and a second loss, which uses the output image data and second teacher image data, the second teacher image data corresponding to the first teacher image data and being obtained by applying image processing to the first teacher image data.

2. The image processing device according to claim 1, wherein the learning unit trains the neural network using a weighted linear sum in which at least one of the first loss and the second loss is weighted.

3. The image processing device according to claim 1 or 2, wherein the image processing is at least one of sharpness, fineness, tone, blur, noise removal amount, and touch processing.

4. The image processing device according to any one of claims 1 to 3, wherein second input image data whose pixel values are coding parameters relating to the first input image data is further input to the neural network.

5. The image processing device according to any one of claims 1 to 4, wherein third input image data whose pixel values are a processing intensity parameter indicating the degree of the image processing is further input to the neural network.

6. The image processing device according to any one of claims 1 to 4, wherein the learning unit trains the neural network using a weighted linear sum in which at least one of the first loss and the second loss is weighted, and
third input image data whose pixel values are a processing intensity parameter indicating the degree of the image processing, weighted in correspondence with the weighted linear sum, is further input to the neural network.

7. An image processing device comprising a neural network that receives first input image data and third input image data whose pixel values are a processing intensity parameter indicating the degree of image processing, and that outputs first output image data.

8. The image processing device according to claim 7, wherein second input image data whose pixel values are coding parameters relating to the first input image data is further input to the neural network.

9. The image processing device according to claim 7 or 8, wherein the image processing is at least one of sharpness, fineness, tone, blur, noise removal amount, and touch processing.