JP7122155B2

JP7122155B2 - Image super-resolution device and its program, and parameter learning device and its program

Info

Publication number: JP7122155B2
Application number: JP2018097195A
Authority: JP
Inventors: 俊枝三須; 敦郎市ヶ谷
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2018-05-21
Filing date: 2018-05-21
Publication date: 2022-08-19
Anticipated expiration: 2038-05-21
Also published as: JP2019204167A

Description

The present invention relates to an image super-resolution device and its program, which increase the resolution of an image using a convolutional neural network, and to a parameter learning device and its program, which learn the parameters of the convolutional neural network used in the image super-resolution device.

Conventionally, as a method for improving the resolution of an image, a technique has been disclosed in which an orthogonal transform such as a wavelet transform is applied to the input image, the spatial high-frequency spectrum of the resolution-enhanced image is estimated, and an inverse orthogonal transform is applied together with the input image to increase the resolution of the input image (see Patent Document 1).

This method assumes that self-similarity exists between the input image and the resolution-enhanced image, and interpolates the spatial high-frequency spectrum of the band added by resolution enhancement from the spatial high-frequency spectrum obtained by orthogonally transforming the input image.
In this method, the input image is an image obtained in advance by reducing the resolution of an original image, and representative spectral power values for each band, obtained by octave decomposition of the original image, are supplied externally as known information. Alternatively, on the premise of self-similarity, the representative spectral power value of each band obtained by octave decomposition of the input image is used, as is, as the representative spectral power value of the band at twice the frequency in the horizontal and vertical directions.
The method then corrects the spectrum of the input image and the spatial high-frequency spectrum so that they match the representative spectral power values supplied externally or those obtained per band from the input image.
Finally, the method applies an inverse orthogonal transform to the corrected input-image spectrum and spatial high-frequency spectrum to generate a resolution-enhanced image.

Patent Document 1: JP 2012-59138 A

To estimate the spatial high-frequency spectrum, the conventional method described in Patent Document 1 uses, as known information, the representative spectral power values for each band obtained by octave decomposition of the original image from which the input image was generated.
However, such information about the original image is not always available. This method therefore has the problem that a high-resolution image cannot be generated from an image for which no original image exists.

As an alternative, the conventional method estimates the spatial high-frequency spectrum on the premise of self-similarity, using the representative spectral power values for each band obtained by octave decomposition of the input image.
In this case, however, the conventional method can estimate the spatial high-frequency spectrum only by adjusting spectral power. Adjusting spectral power with representative values alone limits how finely the spectrum can be adjusted in units of spatial frequency. There has therefore been a demand for higher image quality than the conventional method provides.

The present invention has been made in view of these problems and demands. Its object is to provide an image super-resolution device and its program capable of generating a high-quality, high-resolution image using a trained convolutional neural network even when no original image is available, and a parameter learning device and its program for learning the parameters of that convolutional neural network.

To solve the above problems, the image super-resolution device according to the present invention improves the resolution of an input image using a convolutional neural network whose parameters have been learned in advance to estimate, from the low-frequency component of a wavelet-decomposed image (low-frequency in both the horizontal and vertical directions), the high-frequency components (high-frequency in one or both of the horizontal and vertical directions). The device comprises block extraction means, convolutional neural network means, wavelet reconstruction means, and block placement means.

In this configuration, the image super-resolution device uses the block extraction means to sequentially extract blocks of a predetermined size from the input image whose resolution is to be increased.
The image super-resolution device then uses the convolutional neural network means to treat each extracted block as the low-frequency component and to estimate the high-frequency components corresponding to that block with the convolutional neural network.

The image super-resolution device then uses the wavelet reconstruction means to perform wavelet reconstruction on the block, which serves as the low-frequency component, and the high-frequency components estimated by the convolutional neural network means, generating a super-resolution block in which the block is super-resolved. This produces an image (super-resolution block) with twice the resolution of the block in both the horizontal and vertical directions.

The image super-resolution device then uses the block placement means to place each super-resolution block according to the position from which its block was extracted. The image super-resolution device thereby generates a high-resolution image (super-resolution image) in which super-resolution blocks are arranged over the entire image.
Note that the image super-resolution device can be operated by an image super-resolution program that causes a computer to function as each of the means described above.

Further, to solve the above problems, the parameter learning device according to the present invention learns the parameters of the convolutional neural network used in the image super-resolution device, and comprises block extraction means, wavelet decomposition means, learning convolutional neural network means, and error calculation means.

In this configuration, the parameter learning device uses the block extraction means to sequentially extract, from an input image, blocks having twice the resolution, in both the horizontal and vertical directions, of the image that is input to the convolutional neural network.
The parameter learning device then uses the wavelet decomposition means to wavelet-decompose each block into a low-frequency component, which is low-frequency in both the horizontal and vertical directions, and high-frequency components, which are high-frequency in one or both of the horizontal and vertical directions.

The parameter learning device then uses the learning convolutional neural network means to take the low-frequency component generated by the wavelet decomposition means as input and propagate it forward through the convolutional neural network, thereby estimating the high-frequency components.
Further, the parameter learning device uses the error calculation means to calculate the error between the high-frequency components generated by the wavelet decomposition means and the high-frequency components estimated by the learning convolutional neural network means.

The parameter learning device then uses the learning convolutional neural network means to propagate the error calculated by the error calculation means backward through the convolutional neural network by error backpropagation, thereby learning the connection weight coefficients of the convolutional neural network.
In this way, the parameter learning device learns the connection weight coefficients, which are the parameters of the convolutional neural network used by the image super-resolution device.
Note that the parameter learning device can be operated by a parameter learning program that causes a computer to function as each of the means described above.

The present invention provides the following advantageous effects.
According to the image super-resolution device of the present invention, a super-resolution image can be generated by using a convolutional neural network to synthesize high-frequency components for the input image. These high-frequency components reflect what has been learned from the high-frequency components of a variety of waveforms. Because the present invention does not rely solely on power adjustment of high-frequency components as the conventional method does, it can generate a high-quality super-resolution image.
According to the parameter learning device of the present invention, the parameters of the convolutional neural network used by the image super-resolution device can be learned from training images. The training images can therefore be chosen to suit the images targeted by the image super-resolution device, and the convolutional neural network used by the image super-resolution device can be optimized accordingly.

FIG. 1 is a schematic diagram for explaining the outline of the present invention, in which (a) shows an outline of the processing of the image super-resolution device and (b) shows an outline of the processing of the parameter learning device.
FIG. 2 is a block diagram showing the configuration of the image super-resolution device according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the wavelet reconstruction means for color images in the image super-resolution device according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the image super-resolution device according to the embodiment of the present invention.
FIG. 5 is a block diagram showing a specific example (part 1) of the image super-resolution device according to the embodiment of the present invention.
FIG. 6 is a block diagram showing a specific example (part 2) of the image super-resolution device according to the embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the parameter learning device according to the embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of the wavelet decomposition means for color images in the parameter learning device according to the embodiment of the present invention.
FIG. 9 is a flowchart showing the operation of the parameter learning device according to the embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.
<Overview of the invention>
First, an overview of the present invention is given with reference to FIG. 1. FIG. 1(a) shows an overview of the processing of the image super-resolution device 1 (FIG. 2) of the present invention, which uses a convolutional neural network (CNN). FIG. 1(b) shows an overview of the processing of the parameter learning device 2 (FIG. 7), which learns the CNN parameters used by the image super-resolution device 1 (FIG. 2).

The image super-resolution device 1 (FIG. 2) increases the resolution of an image L (low-resolution image) to produce an image H (super-resolution image) with twice the resolution in both the horizontal and vertical directions.
As shown in FIG. 1(a), the image super-resolution device 1 sequentially extracts blocks B (for example, 8×8 pixels) from the image L and treats each block B as the LL image (LL_1), that is, the component of a two-dimensional wavelet decomposition that is low-frequency both horizontally and vertically. Using a CNN whose parameters Pa have been learned in advance, the image super-resolution device 1 then estimates from the LL image (LL_1) the three corresponding high-frequency components: the HL image (HL_1^), which is high-frequency horizontally and low-frequency vertically; the LH image (LH_1^), which is low-frequency horizontally and high-frequency vertically; and the HH image (HH_1^), which is high-frequency both horizontally and vertically. Here a trailing ^ denotes an estimated quantity.

The image super-resolution device 1 then performs wavelet reconstruction on the LL image (LL_1), the HL image (HL_1^), the LH image (LH_1^), and the HH image (HH_1^) to generate the super-resolution block S (LL_0^) corresponding to block B.
In this way, the image super-resolution device 1 generates a high-resolution (super-resolution) image H from a low-resolution image L by applying CNN-based resolution enhancement to each block B.
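The reconstruction step described above can be illustrated with a short sketch. The text up to this point does not say which wavelet is used, so the orthonormal Haar wavelet is assumed here, and the function name `haar_reconstruct` is hypothetical. The sketch takes a low-frequency block and three same-sized high-frequency estimates and returns a block with twice the width and height.

```python
def haar_reconstruct(ll, hl, lh, hh):
    """Inverse 2-D Haar transform (assumed wavelet; not specified by the source).

    Each argument is a list of rows of equal size (Q rows, P columns).
    Returns a block with 2Q rows and 2P columns.
    """
    rows, cols = len(ll), len(ll[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for y in range(rows):
        for x in range(cols):
            a, h, v, d = ll[y][x], hl[y][x], lh[y][x], hh[y][x]
            # Each coefficient quadruple expands into one 2x2 pixel group.
            out[2 * y][2 * x] = (a + h + v + d) / 2          # top-left
            out[2 * y][2 * x + 1] = (a - h + v - d) / 2      # top-right
            out[2 * y + 1][2 * x] = (a + h - v - d) / 2      # bottom-left
            out[2 * y + 1][2 * x + 1] = (a - h - v + d) / 2  # bottom-right
    return out


# With all high-frequency estimates at zero, each input sample is simply
# spread over a 2x2 group (scaled by 1/2 under this normalization).
zeros = [[0.0, 0.0], [0.0, 0.0]]
block = haar_reconstruct([[2.0, 4.0], [6.0, 8.0]], zeros, zeros, zeros)
```

Under this normalization the LL value is twice the local mean of the 2×2 group, so a real pipeline would need a matching scale convention between pixel values and LL coefficients; the source does not state one at this point.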

The parameter learning device 2 (FIG. 7) learns the CNN parameters Pa used by the image super-resolution device 1.
As shown in FIG. 1(b), the parameter learning device 2 sequentially extracts blocks E (for example, 16×16 pixels) from a training image D. By two-dimensional wavelet decomposition, the parameter learning device 2 then decomposes each block E (LL_0′) into an LL image (LL_1′), an HL image (HL_1′), an LH image (LH_1′), and an HH image (HH_1′).
The parameter learning device 2 then inputs the LL image (LL_1′) to the CNN and learns the CNN parameters Pa by error backpropagation so as to eliminate the error between the CNN outputs, namely the HL image (HL_1^), the LH image (LH_1^), and the HH image (HH_1^), and the ground-truth data, namely the wavelet-decomposed HL image (HL_1′), LH image (LH_1′), and HH image (HH_1′).
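How one training pair is formed can be sketched as follows. As assumptions not taken from the source, the sketch uses the orthonormal Haar wavelet and a sum-of-squares error; the names `haar_decompose` and `squared_error` are hypothetical.

```python
def haar_decompose(block):
    """One-level 2-D Haar decomposition (assumed wavelet).

    `block` is a list of rows with even height and width.
    Returns (ll, hl, lh, hh), each half the size in both directions.
    """
    rows, cols = len(block) // 2, len(block[0]) // 2
    ll = [[0.0] * cols for _ in range(rows)]
    hl = [[0.0] * cols for _ in range(rows)]
    lh = [[0.0] * cols for _ in range(rows)]
    hh = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            a = block[2 * y][2 * x]          # top-left
            b = block[2 * y][2 * x + 1]      # top-right
            c = block[2 * y + 1][2 * x]      # bottom-left
            d = block[2 * y + 1][2 * x + 1]  # bottom-right
            ll[y][x] = (a + b + c + d) / 2   # low horizontally and vertically
            hl[y][x] = (a - b + c - d) / 2   # high horizontally, low vertically
            lh[y][x] = (a + b - c - d) / 2   # low horizontally, high vertically
            hh[y][x] = (a - b - c + d) / 2   # high horizontally and vertically
    return ll, hl, lh, hh


def squared_error(estimate, target):
    """Sum of squared differences between two equally sized bands (assumed error measure)."""
    return sum((e - t) ** 2
               for e_row, t_row in zip(estimate, target)
               for e, t in zip(e_row, t_row))


# One training pair: the LL band is the CNN input, the other three bands
# are the targets that the CNN output is compared against.
ll, hl, lh, hh = haar_decompose([[1.0, 2.0], [3.0, 4.0]])
```

In a training loop, the error between each estimated band and its target would be the quantity propagated backward through the network.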

Note that by using, as the training image D, an image that contains the features (patterns and the like) of the images whose resolution the image super-resolution device 1 is to increase, the parameter learning device 2 can optimize the CNN used by the image super-resolution device 1. For example, the parameter learning device 2 may perform parameter learning using, as the training image D, the low-resolution image L whose resolution is to be increased.
If general-purpose images are used as the training image D, the parameter learning device 2 can instead learn a CNN with which the image super-resolution device 1 increases the resolution of general-purpose images.
The training image D need not be a single image; a plurality of images may be used.
The configurations and operations of the image super-resolution device 1 and the parameter learning device 2 are described in detail below.

<Configuration of the image super-resolution device>
The configuration of the image super-resolution device 1 is described with reference to FIG. 2. Here, the resolution of the image L input to the image super-resolution device 1 is A_x pixels horizontally and A_y pixels vertically. The resolution of the image H output by the image super-resolution device 1 is 2A_x pixels horizontally and 2A_y pixels vertically, that is, twice that of the image L in each direction.
As shown in FIG. 2, the image super-resolution device 1 comprises block extraction means 10, block scanning means 11, convolutional neural network means 12, wavelet reconstruction means 13, and block placement means 14.

The block extraction means 10 extracts blocks, which are partial images of the input image (image L). Hereinafter, the pixel value of the c-th color component at image coordinates (x, y) of the image L is written L(x, y, c). When the image L is a monochrome image, c = 0; when the image L is a color image with C primary colors, c is an integer from 0 to C − 1 (C is an integer of 2 or more; for example, C = 3 for an RGB image).
The block extraction means 10 extracts from the image L a block that is a rectangular region of P pixels horizontally by Q pixels vertically (P×Q pixels). Here, P and Q are both natural numbers and P×Q is 2 or more. For example, P = 8 and Q = 8.

The block extraction means 10 extracts each block relative to the extraction coordinates (p, q) specified by the block scanning means 11, described later. For example, when the block scanning means 11 specifies extraction coordinates (p, q), the block extraction means 10 extracts, as a partial image (block), the pixel values of the image L inside the rectangle (boundary included) whose diagonal corners are the image coordinates (p, q) and (p+P−1, q+Q−1).
The block extraction means 10 outputs the extracted block to the convolutional neural network means 12 and the wavelet reconstruction means 13.
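The inclusive rectangle described above maps directly onto half-open slices. The following is a minimal sketch; the row-major layout `image[y][x]`, the function name `extract_block`, and the choice to raise an error for out-of-range blocks are assumptions made for illustration.

```python
def extract_block(image, p, q, P, Q):
    """Return the P x Q block whose top-left corner is (p, q).

    `image[y][x]` holds one pixel (a scalar or a per-channel tuple).
    The block spans x = p..p+P-1 and y = q..q+Q-1, boundary included.
    """
    if p < 0 or q < 0 or q + Q > len(image) or p + P > len(image[0]):
        raise ValueError("block lies outside the image")
    return [row[p:p + P] for row in image[q:q + Q]]


# 4x4 test image whose pixel value encodes its position as 10*y + x.
image = [[10 * y + x for x in range(4)] for y in range(4)]
block = extract_block(image, p=1, q=2, P=2, Q=2)
```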

Note that, together with block extraction, the block extraction means 10 may normalize the pixel values for each color component c (coefficient α_c, offset β_c).
Specifically, the block extraction means 10 performs normalization according to the following equation (1) to obtain the pixel value B(x, y, c) of block B.

For example, suppose the image L is a color image (C = 3) in a luminance/color-difference representation, in which pixel values range from 16 to 235 for c = 0 (luminance) and from 16 to 240 for c = 1 and c = 2 (color difference). In that case, α_0 = 1/219, β_0 = −16/219, α_1 = α_2 = 1/224, and β_1 = β_2 = −16/224.
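Equation (1) is not reproduced in this text. A form consistent with the coefficients in the example above is B(x, y, c) = α_c · L(p+x, q+y, c) + β_c, which maps the nominal range of each component onto 0 to 1. The sketch below checks that reading with exact fractions; the affine form and the name `normalize` are assumptions, not the source's stated equation.

```python
from fractions import Fraction

# Coefficients from the example in the text (luminance / color-difference image).
ALPHA = [Fraction(1, 219), Fraction(1, 224), Fraction(1, 224)]
BETA = [Fraction(-16, 219), Fraction(-16, 224), Fraction(-16, 224)]


def normalize(value, c):
    """Assumed form of equation (1): alpha_c * value + beta_c."""
    return ALPHA[c] * value + BETA[c]


luma_black = normalize(16, 0)    # lower end of the luminance range
luma_white = normalize(235, 0)   # upper end of the luminance range
chroma_max = normalize(240, 1)   # upper end of the color-difference range
```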

The block scanning means 11 sequentially generates the extraction coordinates (p, q) that the block extraction means 10 uses as the reference for extracting blocks. For example, at time u (u is an integer of 0 or more), the block scanning means 11 generates the coordinates (p, q) in raster-scan order at intervals of P pixels horizontally and Q pixels vertically, according to the following equation (2).

Here, the binary operator % is defined so that a % b is the remainder when the non-negative integer a is divided by the positive integer b. B_x is the number of blocks extracted in the horizontal direction.
Alternatively, at time u (u is an integer of 0 or more), the block scanning means 11 may generate the coordinates (p, q) in raster-scan order at intervals of P/2 pixels horizontally and Q/2 pixels vertically, according to the following equation (3), so that blocks extracted at successive times overlap.
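Equations (2) and (3) are not reproduced in this text. Given the definitions of % and B_x, a natural reading is p = step_x · (u % B_x) and q = step_y · ⌊u / B_x⌋, with a step of (P, Q) in the non-overlapping case and (P/2, Q/2) in the overlapping case. The sketch below follows that reading; the way B_x is derived from the image size and the stopping condition are assumptions.

```python
def scan_coordinates(A_x, A_y, P, Q, overlap=False):
    """Yield extraction coordinates (p, q) in raster-scan order.

    Assumed reading of equations (2) and (3):
        p = step_x * (u % B_x),  q = step_y * (u // B_x)
    with step = (P, Q) without overlap and (P/2, Q/2) with overlap.
    """
    step_x = P // 2 if overlap else P
    step_y = Q // 2 if overlap else Q
    B_x = (A_x - P) // step_x + 1   # blocks per row that fit in the image
    B_y = (A_y - Q) // step_y + 1   # block rows that fit in the image
    for u in range(B_x * B_y):
        yield step_x * (u % B_x), step_y * (u // B_x)


plain = list(scan_coordinates(16, 16, 8, 8))
overlapped = list(scan_coordinates(16, 16, 8, 8, overlap=True))
```

For a 16×16 image and 8×8 blocks, the non-overlapping scan visits four positions and the half-step scan visits nine.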

The block scanning means 11 outputs the generated extraction coordinates (p, q) to the block extraction means 10 and the block placement means 14.

The convolutional neural network means 12 takes the block extracted by the block extraction means 10 as input and executes processing by a convolutional neural network trained in advance. The convolutional neural network means 12 generates three channels of blocks, each with the same number of samples as the input block. That is, the convolutional neural network means 12 outputs data with three times as many samples as the block.
For example, when a rectangular block of P×Q pixels is input from the block extraction means 10, the convolutional neural network means 12 outputs three channels of P×Q-pixel images.
The convolutional neural network means 12 can be configured, for example, as one or more convolution means 120 and one or more activation function application means 121 connected alternately in cascade.
As shown in FIG. 2, the convolutional neural network means 12 comprises L convolution means 120 (120_1, 120_2, …, 120_L) and L activation function application means 121 (121_1, 121_2, …, 121_L).

The convolution means 120 performs a convolution operation using kernels of a predetermined size that hold trained connection weight coefficients (parameters).
The convolution means 120_i (i is an integer from 1 to L) has K_i kinds of convolution operators (kernels; not shown), each a third-order tensor of kernel size M_i×N_i×K_{i−1} (T_i^(0)(r, s, t) to T_i^(K_i−1)(r, s, t)). It performs a convolution operation on an input third-order tensor I_{i−1}(r, s, t) of size P×Q×K_{i−1} and outputs the result as a third-order tensor J_i(r, s, t) of size P×Q×K_i.
Specifically, the convolution means 120_i calculates J_i(r, s, t) according to the following equation (4).
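Equation (4) itself did not survive in this text. A form consistent with the surrounding definitions (K_i kernels indexed by the output channel, kernel offsets ρ and σ, and the reference to I_{i−1}(r−ρ, s−σ, τ) in the boundary-handling paragraph) would be the following. It is a reconstruction offered as an assumption; the original may contain further terms, such as a bias.

```latex
J_i(r, s, t) = \sum_{\rho = r_i^{(0)}}^{r_i^{(1)}} \; \sum_{\sigma = s_i^{(0)}}^{s_i^{(1)}} \; \sum_{\tau = 0}^{K_{i-1} - 1} T_i^{(t)}(\rho, \sigma, \tau) \, I_{i-1}(r - \rho,\; s - \sigma,\; \tau)
```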

The tensor T_i^(k)(ρ, σ, τ) (k is an integer of 0 or more and less than K_i) is defined for integers ρ from r_i^(0) to r_i^(1), integers σ from s_i^(0) to s_i^(1), and integers τ of 0 or more and less than K_{i−1}.

When the convolution means 120_i refers to I_{i−1}(r−ρ, s−σ, τ) in equation (4) and r−ρ < 0, r−ρ ≥ P, s−σ < 0, or s−σ ≥ Q (that is, the reference falls outside the domain of the tensor), it uses, for example, the value defined by I_{i−1}(r−ρ, s−σ, τ) = 0 (zero padding). Alternatively, the convolution means 120_i may use the value of the nearest element within the domain (zero-order extrapolation).
For r_i^(0), r_i^(1), s_i^(0), and s_i^(1), the values defined by, for example, the following equation (5) or equation (6) are used.
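The convolution and its two boundary rules can be sketched in plain Python as follows. The sketch assumes that equation (4) is a sum of kernel weights times I_{i−1}(r−ρ, s−σ, τ) with no bias term; the memory layout, the function name `conv_layer`, and the `pad` argument are hypothetical.

```python
def conv_layer(inp, kernels, r0, s0, pad="zero"):
    """Assumed form of equation (4), without a bias term.

    inp[tau][s][r]         : input tensor I(r, s, tau), size P x Q x K_in
    kernels[k][tau][j][i]  : T^(k)(r0 + i, s0 + j, tau)
    Returns out[k][s][r]   : J(r, s, k), size P x Q x K_out
    """
    Q, P = len(inp[0]), len(inp[0][0])

    def sample(tau, r, s):
        if 0 <= r < P and 0 <= s < Q:
            return inp[tau][s][r]
        if pad == "zero":                      # zero padding
            return 0.0
        r = min(max(r, 0), P - 1)              # nearest in-domain element
        s = min(max(s, 0), Q - 1)
        return inp[tau][s][r]

    out = []
    for kernel in kernels:
        plane = [[0.0] * P for _ in range(Q)]
        for s in range(Q):
            for r in range(P):
                acc = 0.0
                for tau, k_plane in enumerate(kernel):
                    for j, k_row in enumerate(k_plane):
                        for i, w in enumerate(k_row):
                            rho, sigma = r0 + i, s0 + j
                            acc += w * sample(tau, r - rho, s - sigma)
                plane[s][r] = acc
        out.append(plane)
    return out


# One channel, one 3x1 kernel with T(+1, 0, 0) = 1: output(r) = input(r - 1).
x = [[[1.0, 2.0, 3.0]]]
shift = [[[[0.0, 0.0, 1.0]]]]
zero_padded = conv_layer(x, shift, r0=-1, s0=0)
edge_padded = conv_layer(x, shift, r0=-1, s0=0, pad="edge")
```

The two results differ only at the left edge, where the reference falls outside the block: zero padding yields 0 there, while nearest-element extrapolation repeats the first sample.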

For example, when M_i = 5 and N_i = 5, both equation (5) and equation (6) give r_i^(0) = −2, r_i^(1) = +2, s_i^(0) = −2, and s_i^(1) = +2.
When M_i = 4 and N_i = 4, equation (5) gives r_i^(0) = −1, r_i^(1) = +2, s_i^(0) = −1, and s_i^(1) = +2, whereas equation (6) gives r_i^(0) = −2, r_i^(1) = +1, s_i^(0) = −2, and s_i^(1) = +1.
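Equations (5) and (6) are not reproduced in this text, but the worked values above pin down their behavior for kernel sizes 4 and 5. One pair of floor-based formulas that reproduces those values is sketched below. The formulas and the names `offsets_eq5` and `offsets_eq6` are assumptions; the original equations may be written differently, for example with ceilings.

```python
def offsets_eq5(size):
    """Assumed equation (5): the extra tap of an even kernel goes on the + side."""
    return -((size - 1) // 2), size // 2


def offsets_eq6(size):
    """Assumed equation (6): the extra tap of an even kernel goes on the - side."""
    return -(size // 2), (size - 1) // 2


odd = (offsets_eq5(5), offsets_eq6(5))
even = (offsets_eq5(4), offsets_eq6(4))
```

In both variants the range covers exactly `size` taps, so the two conventions differ only in where an even-sized kernel is centered.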

The input to the first-stage convolution means 120_1 is a third-order tensor I_0(r, s, t) of size P×Q×K_0, where K_0 = 1 when the input image L is a monochrome image and K_0 = C when it is a C-channel color image (C is the number of primary colors; for example, C = 3 for a typical color image such as an RGB image).
The block B(r, s, t) input from the block extraction means 10 is set as I_0(r, s, t), the input to the convolution means 120_1, as shown in the following equation (7).

On the other hand, the number of kinds K_L of convolution operators (not shown) in the final-stage convolution means 120_L is defined as K_L = 3 when the input image L is a monochrome image and K_L = 3C when it is a C-channel color image (C is the number of primary colors, typically C = 3).

The activation function application means 121 applies an activation function to the output of the convolution means 120.
As shown in the following equation (8), the activation function application means 121_i (i is an integer from 1 to L) applies an activation function φ to each component of the third-order tensor J_i(r, s, t) of size P×Q×K_i input from the convolution means 120_i, and outputs the result as a third-order tensor I_i(r, s, t) of size P×Q×K_i.

The activation functions φ_{i,t} applied to the components of the tensor J need not all be the same with respect to i, t, or both; they may also be the same for every combination of i and t. Typically, either the same function is used for every combination of i and t, or an activation function that may differ from layer to layer is set for each i.
For example, ReLU (Rectified Linear Unit) may be used for i = 1, 2, …, L−1 (see equation (9)), with no activation function applied for i = L (see equation (13)). Specific examples of the function φ applied by the activation function application means 121 are given below.
For example, the ReLU shown in the following equation (9) can be used as the function φ.

Alternatively, the sigmoid function shown in the following equation (10) can be used as the function φ.

The hyperbolic tangent function shown in the following equation (11) can also be used as the function φ.

また、関数φは、以下の式（１２）に示すソフトサイン（softsign）関数を用いることができる。 Also, the function φ can use the softsign function shown in the following equation (12).

また、関数φは、以下の式（１３）に示す恒等写像（活性化関数を適用しない）を用いても構わない。 Also, as the function φ, the identity map (without applying the activation function) shown in the following equation (13) may be used.

この式（１３）に示すように、テンソルＪの全成分について活性化関数を適用しない場合、活性化関数適用手段１２１_ｉそのものを構成から省略しても構わない。
なお、最終段の畳込手段１２０_Ｌ以外の畳込手段１２０に接続される活性化関数適用手段１２１には、ニューラルネットワークの滑らかな表現を学習するため、非線形な活性化関数（式（１３）以外）を用いることとする。
最終段の畳込手段１２０_Ｌの後段に接続される活性化関数適用手段１２１_Ｌには、すべての出力を活性化させるため、正、負および零の値をとり得る活性化関数（例えば、式（１１）の双曲線正接関数、式（１２）のソフトサイン関数）を用いるか、活性化関数を適用しない関数（式（１３））を用いるか、あるいは、活性化関数適用手段１２１_Ｌそのものを省略するものとする。
畳み込みニューラルネットワーク手段１２は、畳み込みニューラルネットワークによる処理を実行した最終段の演算結果Ｊ_Ｌを、ウェーブレット再構成手段１３に出力する。 As shown in equation (13), when no activation function is applied to any component of tensor J, the activation function applying means 121_i itself may be omitted from the configuration.
Note that a nonlinear activation function (any function other than equation (13)) is used in the activation function applying means 121 connected to each convolution means 120 other than the final-stage convolution means 120_L, so that the neural network learns a smooth representation.
For the activation function applying means 121_L connected after the final-stage convolution means 120_L, in order to activate all outputs, either an activation function that can take positive, negative, and zero values (for example, the hyperbolic tangent function of equation (11) or the softsign function of equation (12)) is used, the function that applies no activation (equation (13)) is used, or the activation function applying means 121_L itself is omitted.
The convolutional neural network means 12 outputs the final-stage operation result J_L of the convolutional neural network processing to the wavelet reconstruction means 13.
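The candidate activation functions named for equations (9) to (13) can be sketched as follows. The equation images are not reproduced in this text, so the block uses the standard textbook definitions of ReLU, sigmoid, hyperbolic tangent, softsign, and identity, on the assumption that they match the patent's formulas. The nested-list tensor layout `J[r][s][t]` and the helper name `apply_activation` are illustrative choices, not taken from the source.

```python
import math

# Standard definitions, assumed to correspond to equations (9)-(13).
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def softsign(x):
    return x / (1.0 + abs(x))

def identity(x):
    # Equivalent to applying no activation function at all.
    return x

def apply_activation(J, phi):
    """Apply phi to every component of a tensor stored as nested lists J[r][s][t]."""
    return [[[phi(v) for v in channels] for channels in column] for column in J]
```

Of these, only tanh, softsign, and identity can produce negative outputs, which is the property the text requires of the final stage.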

図３に、本発明の実施形態に係る画像超解像装置１のカラー画像を対象としたウェーブレット再構成手段１３の構成を示す。
ウェーブレット再構成手段１３は、色成分ごとにウェーブレット再構成を行う第１ウェーブレット再構成手段１３_１と、第２ウェーブレット再構成手段１３_２と、第３ウェーブレット再構成手段１３_３と、を有し、ブロック切り出し手段１０で切り出されるブロックＢと、畳み込みニューラルネットワーク手段１２で演算されたブロックＢの３倍の標本数のデータＪ_Ｌとに基づいて、ウェーブレット再構成を行い、超解像ブロックＳを生成するものである。なお、以下では、超解像ブロックＳの座標（ｘ，ｙ）における色成分ｃの画素値をＳ（ｘ，ｙ，ｃ）と表す。ただし、入力画像Ｌがモノクロ画像の場合には、色成分ｃは、ｃ＝０のみとする。この場合、ウェーブレット再構成手段１３は図２に示すように１つの構成とすればよい。 FIG. 3 shows the configuration of the wavelet reconstruction means 13 for color images of the image super-resolution device 1 according to the embodiment of the present invention.
The wavelet reconstruction means 13 has a first wavelet reconstruction means 13_1, a second wavelet reconstruction means 13_2, and a third wavelet reconstruction means 13_3, each of which performs wavelet reconstruction for one color component. It performs wavelet reconstruction based on the block B cut out by the block cutout means 10 and the data J_L, which has three times as many samples as block B and is computed by the convolutional neural network means 12, and thereby generates the super-resolution block S. In the following, the pixel value of color component c at coordinates (x, y) of the super-resolution block S is written S(x, y, c). When the input image L is a monochrome image, however, the only color component is c = 0. In this case, the wavelet reconstruction means 13 may consist of a single unit, as shown in FIG. 2.

ウェーブレット再構成手段１３がウェーブレット再構成に用いる基底関数は任意であるが、例えば、ハール（Haar）基底を用いることができる。
例えば、入力画像Ｌがモノクロ画像で、基底関数がハール基底の場合、ウェーブレット再構成手段１３は、ブロック切り出し手段１０の出力であるブロックＢ（ｒ，ｓ，０）と、畳み込みニューラルネットワーク手段１２の出力であるＪ_Ｌ（ｒ，ｓ，０）、Ｊ_Ｌ（ｒ，ｓ，１）およびＪ_Ｌ（ｒ，ｓ，２）とに基づいて、以下の式（１４）により、超解像ブロックＳを生成する。 The wavelet reconstruction means 13 can use any basis function for wavelet reconstruction, but for example, a Haar basis can be used.
For example, when the input image L is a monochrome image and the basis function is the Haar basis, the wavelet reconstruction means 13 generates the super-resolution block S by equation (14) below, based on the block B(r, s, 0) output by the block cutout means 10 and on J_L(r, s, 0), J_L(r, s, 1), and J_L(r, s, 2) output by the convolutional neural network means 12.

また、例えば、入力画像Ｌがカラー画像で、基底関数がハール基底の場合、ウェーブレット再構成手段１３は、以下の式（１５）により、超解像ブロックＳを生成する。 Further, for example, when the input image L is a color image and the basis functions are Haar basis, the wavelet reconstruction means 13 generates the super-resolution block S by the following equation (15).

ウェーブレット再構成手段１３は、生成した超解像ブロックＳを、ブロック配置手段１４に出力する。 The wavelet reconstruction means 13 outputs the generated super-resolution block S to the block arrangement means 14 .
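As an illustration of the Haar reconstruction step for a monochrome block, the sketch below rebuilds a 2Q×2P block from four Q×P sub-bands. Equations (14) and (15) are not reproduced in this text, so the scaling is an assumption: the forward transform is taken to be the plain average and differences of each 2×2 pixel group, which makes the inverse a sum of signed terms. The row-major layout `X[s][r]` (row s, column r) and the function name are also illustrative.

```python
def haar_reconstruct(LL, HL, LH, HH):
    """Inverse 2-D Haar step on Q x P arrays (lists of rows); returns a 2Q x 2P array.

    Assumed convention: LL is the mean of each 2x2 group, HL the horizontal
    difference, LH the vertical difference, HH the diagonal difference.
    """
    Q, P = len(LL), len(LL[0])
    S = [[0.0] * (2 * P) for _ in range(2 * Q)]
    for s in range(Q):
        for r in range(P):
            ll, hl, lh, hh = LL[s][r], HL[s][r], LH[s][r], HH[s][r]
            S[2 * s][2 * r] = ll + hl + lh + hh          # top-left pixel
            S[2 * s][2 * r + 1] = ll - hl + lh - hh      # top-right pixel
            S[2 * s + 1][2 * r] = ll + hl - lh - hh      # bottom-left pixel
            S[2 * s + 1][2 * r + 1] = ll - hl - lh + hh  # bottom-right pixel
    return S
```

Here `LL` plays the role of block B, and `HL`, `LH`, `HH` the three channels of J_L. With all three high-frequency bands set to zero, the result is simply each input pixel repeated in a 2×2 group.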

ブロック配置手段１４は、ブロック走査手段１１で生成されるブロックＢの切り出し座標（ｐ，ｑ）に基づいて、当該ブロックＢに対応してウェーブレット再構成手段１３で生成される超解像ブロックＳを配置して、超解像画像を生成するものである。 Based on the cutout coordinates (p, q) of block B generated by the block scanning means 11, the block placement means 14 places the super-resolution block S generated by the wavelet reconstruction means 13 for that block B, thereby generating the super-resolution image.

なお、ブロック走査手段１１が生成する座標を、ブロックが重なり合わない切り出し座標（ｐ，ｑ）とする場合、ブロック配置手段１４は、切り出し座標（ｐ，ｑ）に応じて、超解像ブロックＳを配置することで、超解像画像を生成する。
また、ブロック走査手段１１が生成する座標を、ブロックが重なる切り出し座標（ｐ，ｑ）とする場合、ブロック配置手段１４は、切り出し座標（ｐ，ｑ）に応じて、超解像ブロックＳをブレンディングにより合成することで、超解像画像を生成する。
具体的には、ブロック配置手段１４は、ブロックが重なり合わない切り出し座標の場合（前記式（２）参照）、以下の式（１６）により、ブロック走査手段１１の走査に応じた座標（ｐ，ｑ）に対応して、超解像ブロックＳを超解像画像Ｈに配置する。 Note that when the coordinates generated by the block scanning means 11 are cutout coordinates (p, q) at which blocks do not overlap, the block placement means 14 generates the super-resolution image by placing each super-resolution block S according to the cutout coordinates (p, q).
When the coordinates generated by the block scanning means 11 are cutout coordinates (p, q) at which blocks overlap, the block placement means 14 generates the super-resolution image by compositing the super-resolution blocks S with blending according to the cutout coordinates (p, q).
Specifically, when the cutout coordinates are such that blocks do not overlap (see equation (2) above), the block placement means 14 places the super-resolution block S in the super-resolution image H by equation (16) below, at the position corresponding to the coordinates (p, q) produced by the scanning of the block scanning means 11.

なお、入力画像Ｌがモノクロ画像の場合、Ｃ＝１とし、出力画像Ｈの第３引数のｃはｃ＝０のみとする。
また、ブロック配置手段１４は、ブロックが重なり合う切り出し座標の場合（前記式（３）参照）、以下の式（１７）により、所定の重みＷ_ｐ，ｑ（ρ，σ，ｃ）を付加して、オーバーラップ部分のブレンディングを行い、超解像画像Ｈを合成する。 Note that when the input image L is a monochrome image, C=1, and the third argument c of the output image H is only c=0.
When the cutout coordinates are such that blocks overlap (see equation (3) above), the block placement means 14 applies a predetermined weight W_p,q(ρ, σ, c) according to equation (17) below, blends the overlapping portions, and composites the super-resolution image H.

ブロック配置手段１４は、式（１７）に示すように、１時点前までに足し込まれた結果であるＨ_ｏｌｄに、現時点で得られた超解像ブロックＳに空間的な重みＷ_ｐ，ｑを付加したものを足し込む。ここで、１時点前とは、ブロック走査手段１１が前記式（３）の演算で用いる時点ｕを、ｕ－１とした時点である。なお、走査開始前の出力画像Ｈには、初期値として、すべて“０”を設定する。
重みＷ_ｐ，ｑには、以下の式（１８）、式（１９）に示すように、水平方向の因子Ｗ_ｐ，ｑ ^{（Ｈｏｒ）}と垂直方向の因子Ｗ_ｐ，ｑ ^{（Ｖｅｒ）}の積を用いることができる。 As shown in equation (17), the block placement means 14 adds the currently obtained super-resolution block S, multiplied by the spatial weight W_p,q, to H_old, the result accumulated up to one time point earlier. Here, one time point earlier means the time point obtained by replacing the time point u, used by the block scanning means 11 in the calculation of equation (3), with u−1. Before scanning starts, the output image H is initialized to all zeros.
As shown in equations (18) and (19) below, the product of a horizontal factor W_p,q^(Hor) and a vertical factor W_p,q^(Ver) can be used as the weight W_p,q.

前記式（１８）の重みを用いることで、ブロック配置手段１４は、ブロックの中心部分が最大の重み付けとなり、ブロックの重なり合う部分が水平方向および垂直方向のそれぞれについて線形に減衰する重み付けとなるように、ブロックをブレンディングする。これによって、ブロック配置手段１４は、ブロック間の境界を目立たなくすることができる。
ブロック配置手段１４は、ブロック走査手段１１が入力画像Ｌの走査を終えた時点で、入力画像Ｌの４倍（水平２倍、垂直２倍）の解像度を有する出力画像（超解像画像）Ｈを生成することができる。 By using the weights of equation (18) above, the block placement means 14 blends the blocks so that the weight is largest at the center of each block and decays linearly in the horizontal and vertical directions over the overlapping portions. This makes the boundaries between blocks inconspicuous.
When the block scanning means 11 has finished scanning the input image L, the block placement means 14 has generated an output image (super-resolution image) H with four times the resolution of the input image L (twice horizontally and twice vertically).
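The weighted accumulation described for overlapping blocks can be sketched as below. Equations (17) to (19) are not reproduced in this text, so the exact weight formula is unknown; the triangular weight here is an assumed stand-in that peaks at the block center and decays linearly, chosen so that two blocks offset by half a block length have weights summing to one. The function names and the half-block stride are hypothetical.

```python
def linear_weight(n, index):
    """Triangular 1-D weight for a block of even length n, largest at the center."""
    half = n // 2
    if index < half:
        return (index + 0.5) / half
    return (n - index - 0.5) / half

def blend_weight(width, height, rho, sigma):
    """Separable weight: horizontal factor multiplied by vertical factor."""
    return linear_weight(width, rho) * linear_weight(height, sigma)

def accumulate(H, S, x0, y0):
    """Add the weighted block S into H at offset (x0, y0). H must start as all zeros."""
    height, width = len(S), len(S[0])
    for sigma in range(height):
        for rho in range(width):
            w = blend_weight(width, height, rho, sigma)
            H[y0 + sigma][x0 + rho] += w * S[sigma][rho]
```

Because each call adds onto the previous contents of `H`, the order of calls mirrors the accumulation of the current block onto H_old described above.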

以上説明したように画像超解像装置１を構成することで、画像超解像装置１は、予め学習したパラメータを用いた畳み込みニューラルネットワークにより、高解像度の画像（超解像画像）を生成することができる。
このとき、入力画像Ｌが原画像を縮小して生成したものであっても、画像超解像装置１は、原画像を参照することなく、入力画像Ｌに対するウェーブレット再構成可能な空間高周波スペクトルを推定し、超解像画像Ｈを生成することができる。
なお、画像超解像装置１は、コンピュータを、前記した各手段として機能させるためのプログラム（画像超解像プログラム）により動作させることができる。 By configuring the image super-resolution device 1 as described above, the device can generate a high-resolution image (super-resolution image) with a convolutional neural network that uses parameters learned in advance.
In this case, even if the input image L was generated by reducing an original image, the image super-resolution device 1 can estimate, without referring to the original image, a spatial high-frequency spectrum from which wavelet reconstruction is possible for the input image L, and can generate the super-resolution image H.
Note that the image super-resolution apparatus 1 can be operated by a program (image super-resolution program) for causing a computer to function as each means described above.

＜画像超解像装置の動作＞
図４を参照（構成については、適宜図２参照）して、画像超解像装置１の動作について説明する。なお、畳み込みニューラルネットワーク手段１２の畳込手段１２０の結合重み係数は、予めパラメータ学習装置２（図７）によって学習されたパラメータが設定されているものとする。 <Operation of image super-resolution device>
The operation of the image super-resolution apparatus 1 will be described with reference to FIG. 4 (see also FIG. 2 for the configuration). It is assumed that the connection weighting coefficients of the convolution means 120 of the convolution neural network means 12 are preset with parameters learned by the parameter learning device 2 (FIG. 7).

ステップＳ１において、ブロック走査手段１１は、入力画像Ｌにおいて、ラスタ走査の順序で、Ｐ×Ｑ画素のブロックの切り出し位置となる座標（ｐ，ｑ）を生成する。
なお、ブロックの切り出し位置は、ブロックが重ならない位置としてもよいし、ブロックが重複する位置としてもよく、予め定めたいずれか一方の切り出し位置とする。 In step S1, the block scanning means 11 generates coordinates (p, q) as a cut-out position of a block of P×Q pixels in the input image L in the order of raster scanning.
The block cutout positions may be positions at which blocks do not overlap or positions at which blocks overlap; whichever of the two is determined in advance is used.

ステップＳ２において、ブロック切り出し手段１０は、入力画像Ｌから、ステップＳ１で生成された座標（ｐ，ｑ）を切り出し位置として、入力画像ＬからＰ×Ｑ画素のブロックを切り出す。ブロック切り出し手段１０が切り出すブロックは、図１（ａ）に示したブロックＢのＬＬ画像（ＬＬ_１）に相当する。 In step S2, the block clipping means 10 clips a block of P×Q pixels from the input image L using the coordinates (p, q) generated in step S1 as the clipping position. The block cut out by the block cutout means 10 corresponds to the LL image (LL ₁ ) of the block B shown in FIG. 1(a).

ステップＳ３において、畳み込みニューラルネットワーク手段１２は、ステップＳ２で切り出したブロックを入力し、畳込手段１２０および活性化関数適用手段１２１で構成された畳み込みニューラルネットワーク（ＣＮＮ）による演算を実行することで、ブロックの標本数の３倍の標本数のデータを出力する。
この畳み込みニューラルネットワーク手段１２が出力するデータは、図１（ａ）に示したＨＬ画像（ＨＬ_１＾）、ＬＨ画像（ＬＨ_１＾）およびＨＨ画像（ＨＨ_１＾）に相当する。 In step S3, the convolutional neural network means 12 receives the block cut out in step S2 and executes the operations of the convolutional neural network (CNN) composed of the convolution means 120 and the activation function applying means 121, thereby outputting data with three times as many samples as the block.
The data output from the convolutional neural network means 12 correspond to the HL image (HL ₁ ̂), LH image (LH ₁ ̂) and HH image (HH ₁ ̂) shown in FIG. 1(a).

ステップＳ４において、ウェーブレット再構成手段１３は、ステップＳ２で切り出したブロック（ＬＬ画像）と、ステップＳ３で生成したデータ（ＨＬ画像、ＬＨ画像およびＨＨ画像）とをウェーブレット再構成し、超解像ブロックを生成する。 In step S4, the wavelet reconstruction means 13 performs wavelet reconstruction on the block cut out in step S2 (LL image) and the data generated in step S3 (HL, LH, and HH images) to generate a super-resolution block.

ステップＳ５において、ブロック配置手段１４は、ステップＳ４で生成された超解像ブロックを、ステップＳ１で生成された座標（ｐ，ｑ）に対して、出力画像Ｈ上の座標（２ｐ，２ｑ）の位置に配置する。なお、ステップＳ１で、ブロックの切り出し位置をブロックが重ならない位置とした場合、ブロック配置手段１４は、出力画像Ｈ上の座標（２ｐ，２ｑ）の位置にそのまま超解像ブロックを配置する。一方、ステップＳ１で、ブロックの切り出し位置をブロックが重なる位置とした場合、ブロック配置手段１４は、出力画像Ｈ上の座標（２ｐ，２ｑ）の位置において、すでに配置済みの超解像ブロックと重なる部分のブレンディングを行う。 In step S5, the block placement means 14 places the super-resolution block generated in step S4 at coordinates (2p, 2q) on the output image H, corresponding to the coordinates (p, q) generated in step S1. If the block cutout positions in step S1 are positions at which blocks do not overlap, the block placement means 14 places the super-resolution block as it is at coordinates (2p, 2q) on the output image H. If, on the other hand, the block cutout positions in step S1 are positions at which blocks overlap, the block placement means 14 blends the portion that overlaps already placed super-resolution blocks at coordinates (2p, 2q) on the output image H.

ステップＳ６において、ブロック走査手段１１は、入力画像Ｌのすべてのブロックを走査したか否かを判定する。
ここで、入力画像Ｌのすべてのブロックを走査していない場合（ステップＳ６でＮｏ）、画像超解像装置１は、ステップＳ１に戻って、動作を継続する。 In step S6, the block scanning means 11 determines whether or not all blocks of the input image L have been scanned.
If not all blocks of the input image L have been scanned (No in step S6), the image super-resolution device 1 returns to step S1 and continues operation.

一方、入力画像Ｌのすべてのブロックを走査した場合（ステップＳ６でＹｅｓ）、ステップＳ７において、画像超解像装置１は、超解像ブロックを配置した出力画像（超解像画像）Ｈを出力する。
以上の動作により、画像超解像装置１は、畳み込みニューラルネットワークによって、高解像度の画像（超解像画像）を生成することができる。 On the other hand, if all blocks of the input image L have been scanned (Yes in step S6), the image super-resolution device 1 outputs, in step S7, the output image (super-resolution image) H in which the super-resolution blocks have been placed.
By the operation described above, the image super-resolution device 1 can generate a high-resolution image (super-resolution image) using a convolutional neural network.
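Steps S1 to S7 can be sketched as a single loop for the simplest case: a monochrome image, non-overlapping blocks, and image dimensions that are multiples of P and Q. The trained CNN is replaced by a hypothetical stub, `zero_bands`, that predicts all-zero high-frequency bands, and the Haar scaling (LL taken as the mean of each 2×2 group) is an assumption because the patent's equations are not reproduced in this text.

```python
def super_resolve(L, P, Q, predict_high_bands):
    """Steps S1-S7 for a monochrome image L (list of rows), non-overlapping blocks."""
    height, width = len(L), len(L[0])
    H = [[0.0] * (2 * width) for _ in range(2 * height)]
    for q in range(0, height - Q + 1, Q):              # S1: raster scan of cutout coordinates
        for p in range(0, width - P + 1, P):
            B = [row[p:p + P] for row in L[q:q + Q]]   # S2: cut out block (treated as LL)
            HL, LH, HH = predict_high_bands(B)         # S3: CNN estimate (stub in this sketch)
            for s in range(Q):                         # S4 + S5: reconstruct, place at (2p, 2q)
                for r in range(P):
                    ll, hl, lh, hh = B[s][r], HL[s][r], LH[s][r], HH[s][r]
                    y, x = 2 * (q + s), 2 * (p + r)
                    H[y][x] = ll + hl + lh + hh
                    H[y][x + 1] = ll - hl + lh - hh
                    H[y + 1][x] = ll + hl - lh - hh
                    H[y + 1][x + 1] = ll - hl - lh + hh
    return H                                           # S6/S7: all blocks scanned, output H

def zero_bands(B):
    """Hypothetical stand-in for the trained CNN: all-zero HL, LH, and HH bands."""
    zeros = [[0.0] * len(B[0]) for _ in B]
    return zeros, zeros, zeros
```

With `zero_bands` the output is the input enlarged by pixel repetition; any sharpening beyond that would come entirely from the high-frequency bands that a trained network predicts.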

（畳み込みニューラルネットワークの具体例）
ここで、画像超解像装置１が用いる畳み込みニューラルネットワークの一例について説明する。
図５は、画像超解像装置１の具体例を示すブロック構成図であって、畳み込みニューラルネットワークＮ_１として、５層ＣＮＮの例を示している。図５で、Ｃｏｎｖ．（５，５，１６）は、畳込手段１２０を示し、５×５のカーネルを１６種類備えていることを示している。また、ＲｅＬＵは、正規化線形関数を用いた活性化関数適用手段１２１を示している。
最終段の畳込手段であるＣｏｎｖ．（５，５，３）は、５×５のカーネルを３種類備えていることを示している。また、ここでは、最終段に、活性化関数適用手段１２１を用いない例を示している。なお、Ｃｏｎｖ．が使用するカーネルの結合重み係数は、パラメータ学習装置２（図７）からパラメータＰａとして与えられる。 (Concrete example of convolutional neural network)
An example of the convolutional neural network used by the image super-resolution device 1 will now be described.
FIG. 5 is a block diagram showing a specific example of the image super-resolution device 1, with a 5-layer CNN as the convolutional neural network N_1. In FIG. 5, Conv. (5, 5, 16) denotes a convolution means 120 that has 16 kinds of 5×5 kernels. ReLU denotes an activation function applying means 121 that uses the rectified linear function.
Conv. (5, 5, 3), the final-stage convolution means, has three kinds of 5×5 kernels. This example uses no activation function applying means 121 in the final stage. The connection weight coefficients of the kernels used by each Conv. are given as parameter Pa by the parameter learning device 2 (FIG. 7).

最終段のＣｏｎｖ．（５，５，３）のカーネルを３種類とすることで、畳み込みニューラルネットワークＮ_１は、ＨＬ画像（ＨＬ_１＾）、ＬＨ画像（ＬＨ_１＾）およびＨＨ画像（ＨＨ_１＾）の３種類の画像を出力する。
これによって、画像超解像装置１は、ブロックＢをＬＬ画像（ＬＬ_１）とし、畳み込みニューラルネットワークＮ_１の出力であるＨＬ画像（ＨＬ_１＾）、ＬＨ画像（ＬＨ_１＾）およびＨＨ画像（ＨＨ_１＾）とを、ウェーブレット再構成することで、超解像ブロックＳ（ＬＬ_０＾）を生成することができる。 Because the final-stage Conv. (5, 5, 3) has three kinds of kernels, the convolutional neural network N_1 outputs three images: the HL image (HL_1^), the LH image (LH_1^), and the HH image (HH_1^).
As a result, the image super-resolution device 1 can generate the super-resolution block S (LL_0^) by taking block B as the LL image (LL_1) and performing wavelet reconstruction on it together with the HL image (HL_1^), LH image (LH_1^), and HH image (HH_1^) output by the convolutional neural network N_1.
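To make the tensor sizes in the 5-layer example concrete, the sketch below traces the output size and weight count of each layer. It rests on several assumptions not confirmed by this text: that the five layers are four Conv. (5, 5, 16) followed by one Conv. (5, 5, 3), that each kernel spans all input channels (equation (4) is not reproduced here), and that bias terms are ignored. The helper name `trace_cnn` is hypothetical.

```python
def trace_cnn(P, Q, K0, layers):
    """layers: list of (kernel_w, kernel_h, num_kernels).

    Returns per-layer output tensor sizes and weight counts (biases ignored).
    """
    sizes, weights = [], []
    k_in = K0
    for kw, kh, k_out in layers:
        sizes.append((P, Q, k_out))              # every layer keeps the P x Q spatial size
        weights.append(kw * kh * k_in * k_out)   # assumed: one kw x kh x k_in kernel per output channel
        k_in = k_out
    return sizes, weights

# Assumed layout of network N_1 for a monochrome 8 x 8 block (K_0 = 1).
layers_n1 = [(5, 5, 16)] * 4 + [(5, 5, 3)]
sizes, weights = trace_cnn(8, 8, 1, layers_n1)
```

Under these assumptions the last entry of `sizes` is (8, 8, 3), which matches the statement that the network outputs three times as many samples as the input block.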

図６は、画像超解像装置１の他の具体例を示すブロック構成図であって、畳み込みニューラルネットワークＮ_２として、８層ＣＮＮの例を示している。
なお、ここでは、図５の畳み込みニューラルネットワークＮ_１と層数が異なる以外に、各層間に適宜加算器Ａを備え、直前の層の出力に、その層よりも前の層の出力を加算する構成としている。
このように、畳み込みニューラルネットワークＮ_２は、ＲｅｓＮｅｔ（Residual Network）の構成としてもよい。これによって、少ない層数でも層の深度を深めることで、より推定精度の高い畳み込みニューラルネットワークを構成することができる。 FIG. 6 is a block diagram showing another specific example of the image super-resolution device 1, with an 8-layer CNN as the convolutional neural network N_2.
Here, besides differing from the convolutional neural network N_1 of FIG. 5 in the number of layers, the network has adders A placed between layers as appropriate, so that the output of an earlier layer is added to the output of the immediately preceding layer.
In this way, the convolutional neural network N_2 may have a ResNet (Residual Network) configuration. This makes it possible to build a convolutional neural network with higher estimation accuracy by deepening the network even with a small number of layers.

＜パラメータ学習装置の構成＞
次に、図７を参照して、パラメータ学習装置２の構成について説明する。なお、ここでは、パラメータ学習装置２に入力される画像Ｄの解像度を水平Ｄ_ｘ画素、垂直Ｄ_ｙ画素とする。
図７に示すように、パラメータ学習装置２は、ブロック切り出し手段２０と、ブロック走査手段２１と、ウェーブレット分解手段２２と、学習用畳み込みニューラルネットワーク手段２３と、誤差演算手段２４と、パラメータ出力手段２５と、を備える。 <Configuration of parameter learning device>
Next, the configuration of the parameter learning device 2 will be described with reference to FIG. Here, the resolution of the image D input to the parameter learning device 2 is assumed to be horizontal D _x pixels and vertical D _y pixels.
As shown in FIG. 7, the parameter learning device 2 includes a block cutout means 20, a block scanning means 21, a wavelet decomposition means 22, a learning convolutional neural network means 23, an error calculation means 24, and a parameter output means 25.

ブロック切り出し手段２０は、入力画像（画像Ｄ）の部分画像であるブロックを切り出すものである。以下、画像Ｄの画像座標（ｘ，ｙ）における第ｃの色成分の画素値をＤ（ｘ，ｙ，ｃ）と記す。ここで、画像Ｄをモノクロ画像とした場合、ｃ＝０、画像ＤをＣ原色のカラー画像とした場合、ｃは０以上Ｃ未満（Ｃは２以上の整数、例えば、ＲＧＢ画像の場合Ｃ＝３）である。 The block cutout means 20 cuts out a block which is a partial image of the input image (image D). Hereinafter, the pixel value of the c-th color component at the image coordinates (x, y) of the image D is denoted as D(x, y, c). Here, when the image D is a monochrome image, c = 0, and when the image D is a C primary color image, c is 0 or more and less than C (C is an integer of 2 or more, for example, in the case of an RGB image, C = 3).

ブロック切り出し手段２０は、水平２Ｐ画素および垂直２Ｑ画素（２Ｐ×２Ｑ画素）の矩形領域のブロックを画像Ｄから切り出す。ここで、ＰおよびＱはともに自然数とし、かつ、Ｐ×Ｑは２以上とする。なお、ＰおよびＱは、画像超解像装置１のブロック切り出し手段１０（図２）が切り出すブロックの水平画素数（Ｐ）および垂直画素数（Ｑ）と同じとする。例えば、Ｐ＝８およびＱ＝８である。 The block cutout means 20 cuts out from the image D a rectangular block of 2P pixels in the horizontal direction and 2Q pixels in the vertical direction (2P×2Q pixels). Here, both P and Q are natural numbers, and P×Q is 2 or more. Note that P and Q are the same as the number of horizontal pixels (P) and the number of vertical pixels (Q) of the block cut out by the block cutout means 10 (FIG. 2) of the image super-resolution device 1 . For example, P=8 and Q=8.

ここで、ブロック切り出し手段２０は、後記するブロック走査手段２１が指定する切り出し座標（ｐ，ｑ）を基準に切り出しを行う。例えば、ブロック走査手段２１から、切り出し座標（ｐ，ｑ）を指定された場合、ブロック切り出し手段２０は、画像座標（ｐ，ｑ）と画像座標（ｐ＋２Ｐ－１，ｑ＋２Ｑ－１）とを対角の２点とする矩形内（境界を含む）の画像Ｄの画素値列を部分画像（ブロック）として切り出す。
ブロック切り出し手段２０は、切り出したブロックをウェーブレット分解手段２２に出力する。 Here, the block cutout means 20 cuts out blocks with reference to the cutout coordinates (p, q) specified by the block scanning means 21, described later. For example, when the block scanning means 21 specifies cutout coordinates (p, q), the block cutout means 20 cuts out, as a partial image (block), the pixel values of image D inside the rectangle (boundary included) whose diagonal corners are the image coordinates (p, q) and (p+2P−1, q+2Q−1).
The block cutting means 20 outputs the cut blocks to the wavelet decomposition means 22 .

なお、ブロック切り出し手段２０は、ブロックの切り出しとともに、色成分ｃごとの画素値の正規化（係数α_ｃ，オフセットβ_ｃ）を施しても構わない。
具体的には、ブロック切り出し手段２０は、以下の式（２０）により正規化を行いブロックＥの画素値（ｘ，ｙ，ｃ）とする。 Note that the block extraction unit 20 may normalize the pixel values (coefficient α _c , offset β _c ) for each color component c along with the block extraction.
Specifically, the block cutout means 20 performs normalization by equation (20) below to obtain the pixel value E(x, y, c) of block E.

例えば、画像Ｄが、輝度・色差表現によるカラー画像（Ｃ＝３）であって、ｃ＝０（輝度）については、画素値が１６～２３５の範囲、ｃ＝１およびｃ＝２（色差）については、画素値が１６～２４０の範囲である場合、α_０＝１／２１９、β_０＝－１６／２１９、α_１＝α_２＝１／２２４、β_１＝β_２＝－１６／２２４とする。 For example, when image D is a color image in luminance/color-difference representation (C = 3), with pixel values in the range 16 to 235 for c = 0 (luminance) and 16 to 240 for c = 1 and c = 2 (color difference), the values are set to α_0 = 1/219, β_0 = −16/219, α_1 = α_2 = 1/224, and β_1 = β_2 = −16/224.
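The coefficients above can be checked with a short sketch. Equation (20) is not reproduced in this text; the block assumes the affine form E = α_c · D + β_c, which is the reading implied by the terms "coefficient" and "offset" and by the fact that these values map each nominal range onto 0 to 1. The names `ALPHA`, `BETA`, and `normalize` are illustrative.

```python
# Per-component coefficient and offset from the luminance/color-difference example.
ALPHA = [1 / 219, 1 / 224, 1 / 224]
BETA = [-16 / 219, -16 / 224, -16 / 224]

def normalize(value, c):
    """Map pixel value D(x, y, c) to E(x, y, c) = alpha_c * D + beta_c (assumed form)."""
    return ALPHA[c] * value + BETA[c]
```

With these values, luminance 16 maps to 0 and 235 maps to 1, while color difference 16 maps to 0 and 240 maps to 1, so every component fed to the network lies in a common range.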

ブロック走査手段２１は、ブロック切り出し手段２０がブロックを切り出す基準となる切り出し座標（ｐ，ｑ）を逐次生成するものである。
ブロック走査手段２１は、画像Ｄ内を所定の画素間隔でラスタスキャンするように走査しても構わないし、乱数により座標（ｐ，ｑ）を生成することとしても構わない。
ブロック走査手段２１が乱数により座標（ｐ，ｑ）を生成する場合、例えば、ｐは０以上（Ｄ_ｘ－２Ｐ）以下の一様乱数、また、ｑは０以上（Ｄ_ｙ－２Ｑ）以下の一様乱数とする。なお、この一様乱数は、それを近似する疑似乱数としても構わない。
ブロック走査手段２１は、生成した切り出し座標（ｐ，ｑ）を、ブロック切り出し手段２０に出力する。 The block scanning means 21 sequentially generates the cutout coordinates (p, q) that serve as the reference from which the block cutout means 20 cuts out blocks.
The block scanning means 21 may raster-scan image D at a predetermined pixel interval, or may generate the coordinates (p, q) from random numbers.
When the block scanning means 21 generates the coordinates (p, q) from random numbers, p is, for example, a uniform random number from 0 to (D_x − 2P) inclusive, and q is a uniform random number from 0 to (D_y − 2Q) inclusive. Pseudo-random numbers that approximate uniform random numbers may be used instead.
The block scanning means 21 outputs the generated clipping coordinates (p, q) to the block clipping means 20 .
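The random variant of the block scan can be sketched with Python's standard pseudo-random generator. The function name `random_cutout` and the optional `rng` argument are illustrative; the bounds follow the ranges stated above, which guarantee that a 2P×2Q block starting at (p, q) stays inside the image.

```python
import random

def random_cutout(Dx, Dy, P, Q, rng=random):
    """Draw (p, q) so that a 2P x 2Q block starting there lies inside a Dx x Dy image."""
    p = rng.randint(0, Dx - 2 * P)  # randint is inclusive at both ends
    q = rng.randint(0, Dy - 2 * Q)
    return p, q
```

Passing a seeded `random.Random` instance as `rng` makes the sequence of training blocks reproducible.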

図８に、本発明の実施形態に係るパラメータ学習装置２のカラー画像を対象としたウェーブレット分解手段２２の構成を示す。
ウェーブレット分解手段２２は、色成分ごとにウェーブレット分解を行う第１ウェーブレット分解手段２２_１と、第２ウェーブレット分解手段２２_２と、第３ウェーブレット分解手段２２_３と、を有し、ブロック切り出し手段２０で切り出したブロックを入力して、ウェーブレット分解を行うものである。なお、入力画像Ｄがモノクロ画像の場合には、ウェーブレット分解手段２２は図７に示すように１つの構成とすればよい。 FIG. 8 shows the configuration of the wavelet decomposition means 22 for color images of the parameter learning device 2 according to the embodiment of the present invention.
The wavelet decomposition means 22 has a first wavelet decomposition means 22_1, a second wavelet decomposition means 22_2, and a third wavelet decomposition means 22_3, each of which performs wavelet decomposition for one color component. It receives the block cut out by the block cutout means 20 and performs wavelet decomposition on it. When the input image D is a monochrome image, the wavelet decomposition means 22 may consist of a single unit, as shown in FIG. 7.

ウェーブレット分解手段２２は、入力したブロックに対して、２次元ウェーブレット分解を適用することで、水平、垂直ともに低域成分であるＬＬ画像、水平が高域成分、垂直が低域成分であるＨＬ画像、水平が低域成分、垂直が高域成分であるＬＨ画像、および、水平、垂直ともに高域成分であるＨＨ画像を生成する。ＬＬ画像、ＨＬ画像、ＬＨ画像およびＨＨ画像は、いずれもＰ×Ｑ画素の解像度を有する。 By applying two-dimensional wavelet decomposition to the input block, the wavelet decomposition means 22 generates an LL image that is both horizontal and vertical low-frequency components, and an HL image that is horizontal high-frequency components and vertical low-frequency components. , an LH image with horizontal low-frequency components and vertical high-frequency components, and an HH image with both horizontal and vertical high-frequency components. The LL, HL, LH and HH images all have a resolution of P×Q pixels.

２次元ウェーブレット分解に用いる基底関数は、画像超解像装置１のウェーブレット再構成手段１３（図２）が用いた基底関数と同じ（例えば、ハール基底）であることが好ましい。
例えば、基底関数としてハール基底を用いる場合、ウェーブレット分解手段２２は、以下の式（２１）により、ブロック切り出し手段２０で切り出したブロックＥから、ＬＬ画像（ＬＬ（ｒ，ｓ，ｔ））、ＨＬ画像（ＨＬ（ｒ，ｓ，ｔ））、ＬＨ画像（ＬＨ（ｒ，ｓ，ｔ））およびＨＨ画像（ＨＨ（ｒ，ｓ，ｔ））を生成する。 The basis functions used for two-dimensional wavelet decomposition are preferably the same (for example, Haar basis) as the basis functions used by the wavelet reconstruction means 13 (FIG. 2) of the image super-resolution apparatus 1 .
For example, when the Haar basis is used as the basis function, the wavelet decomposition means 22 generates, by equation (21) below, the LL image (LL(r, s, t)), HL image (HL(r, s, t)), LH image (LH(r, s, t)), and HH image (HH(r, s, t)) from the block E cut out by the block cutout means 20.

ただし、入力画像Ｄがモノクロ画像の場合には、Ｃ＝１とする。
ウェーブレット分解手段２２は、生成したＬＬ画像を、学習用畳み込みニューラルネットワーク手段２３に出力し、ＨＬ画像、ＬＨ画像およびＨＨ画像を、誤差演算手段２４に出力する。 However, when the input image D is a monochrome image, C=1.
The wavelet decomposition means 22 outputs the generated LL image to the learning convolutional neural network means 23 and outputs the HL, LH and HH images to the error calculation means 24 .
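A single-channel version of the Haar decomposition can be sketched as follows. Equation (21) is not reproduced in this text, so the scaling is an assumption: LL is taken as the mean of each 2×2 pixel group and the other three bands as signed differences divided by four. The row-major layout `E[row][column]` and the function name are illustrative.

```python
def haar_decompose(E):
    """One 2-D Haar step on a 2Q x 2P block (list of rows).

    Returns LL, HL, LH, HH, each of size Q x P, under the assumed mean/difference scaling.
    """
    Q, P = len(E) // 2, len(E[0]) // 2
    LL = [[0.0] * P for _ in range(Q)]
    HL = [[0.0] * P for _ in range(Q)]
    LH = [[0.0] * P for _ in range(Q)]
    HH = [[0.0] * P for _ in range(Q)]
    for s in range(Q):
        for r in range(P):
            a, b = E[2 * s][2 * r], E[2 * s][2 * r + 1]          # top-left, top-right
            c, d = E[2 * s + 1][2 * r], E[2 * s + 1][2 * r + 1]  # bottom-left, bottom-right
            LL[s][r] = (a + b + c + d) / 4   # low horizontal, low vertical
            HL[s][r] = (a - b + c - d) / 4   # high horizontal, low vertical
            LH[s][r] = (a + b - c - d) / 4   # low horizontal, high vertical
            HH[s][r] = (a - b - c + d) / 4   # high horizontal, high vertical
    return LL, HL, LH, HH
```

In training, `LL` would be the network input and `HL`, `LH`, `HH` the targets, so the scaling used here has to match the one used at reconstruction time.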

学習用畳み込みニューラルネットワーク手段２３は、ウェーブレット分解手段２２で生成されたＬＬ画像を入力し、出力がウェーブレット分解手段２２で生成されたＨＬ画像、ＬＨ画像およびＨＨ画像となるように、畳み込みニューラルネットワークのパラメータ（カーネルの結合重み係数）を学習するものである。
図７に示すように、学習用畳み込みニューラルネットワーク手段２３は、Ｌ個の畳込手段２３０（２３０_１，２３０_２，…，２３０_Ｌ）と、Ｌ個の活性化関数適用手段２３１（２３１_１，２３１_２，…，２３１_Ｌ）と、を備える。畳込手段２３０および活性化関数適用手段２３１は、画像超解像装置１の畳み込みニューラルネットワーク手段１２（図２）の畳込手段１２０および活性化関数適用手段１２１と同じ接続構成とする。 The learning convolutional neural network means 23 receives the LL image generated by the wavelet decomposition means 22 and learns the parameters of the convolutional neural network (the connection weight coefficients of the kernels) so that its output becomes the HL, LH, and HH images generated by the wavelet decomposition means 22.
As shown in FIG. 7, the learning convolutional neural network means 23 includes L convolution means 230 (230 ₁ , 230 ₂ , . . . , 230 _L ) and L activation function application means 231 (231 ₁ , 231 ₂ , . . . , 231 _L ). The convolution means 230 and activation function application means 231 have the same connection configuration as the convolution means 120 and activation function application means 121 of the convolution neural network means 12 ( FIG. 2 ) of the image super-resolution apparatus 1 .

畳込手段２３０は、逐次学習される結合重み係数（パラメータ）を用いて畳み込み演算を行うものである。さらに、畳込手段２３０は、畳み込みニューラルネットワークの後段から入力される誤差に基づいて誤差逆伝播法により結合重み係数を更新し、誤差を前段に伝播するものでもある。
畳込手段２３０_ｉ（ｉは１以上Ｌ以下の整数）は、サイズＰ×Ｑ×Ｋ_ｉ－１の３階テンソルＩ_ｉ－１（ｒ，ｓ，ｔ）の入力に対して、畳み込み演算を行い、サイズＰ×Ｑ×Ｋ_ｉの３階テンソルＪ_ｉ（ｒ，ｓ，ｔ）として出力する（前記式（４）参照）。
なお、畳込手段２３０_ｉが用いるカーネルのサイズおよび種類は、畳込手段１２０_ｉ（図２）と同じとする。 The convolution means 230 performs a convolution operation using connection weight coefficients (parameters) that are successively learned. Further, the convolution means 230 updates the coupling weight coefficients by error backpropagation based on the error input from the latter stage of the convolutional neural network, and propagates the error to the previous stage.
The convolution means 230 _i (i is an integer of 1 or more and L or less) performs a convolution operation on the input of the third order tensor I _i−1 (r, s, t) of size P×Q×K _i−1 . and output as a third-order tensor J _i (r, s, t) of size P×Q×K _i (see equation (4) above).
It is assumed that the size and type of kernel used by the convolution means 230 _i are the same as those used by the convolution means 120 _i (FIG. 2).

ここで、初段の畳込手段２３０_１への入力は、サイズＰ×Ｑ×Ｋ_０の３階テンソルＩ_０（ｒ，ｓ，ｔ）であるが、Ｋ_０は入力画像Ｄがモノクロ画像の場合にはＫ_０＝１、Ｃチャンネルのカラー画像の場合にはＫ_０＝Ｃと定義する（Ｃは原色の数、例えば、ＲＧＢ画像等の典型的なカラー画像においてはＣ＝３）。
また、畳込手段２３０_１への入力であるＩ_０（ｒ，ｓ，ｔ）には、以下の式（２２）に示すように、ウェーブレット分解手段２２から入力されるＬＬ画像（ＬＬ（ｒ，ｓ，ｔ））を設定する。 Here, the input to the first-stage convolution means 230_1 is a third-order tensor I_0(r, s, t) of size P×Q×K_0, where K_0 is defined as K_0 = 1 when the input image D is a monochrome image and as K_0 = C when it is a color image with C channels (C is the number of primary colors; for example, C = 3 for a typical color image such as an RGB image).
The LL image (LL(r, s, t)) input from the wavelet decomposition means 22 is set as I_0(r, s, t), the input to the convolution means 230_1, as shown in equation (22) below.

活性化関数適用手段２３１は、畳込手段２３０の出力に対して、活性化関数を用いた演算を行うものである。さらに、活性化関数適用手段２３１は、畳み込みニューラルネットワークの後段から入力される誤差を前段に伝播するものでもある。
活性化関数適用手段２３１_ｉ（ｉは１以上Ｌ以下の整数）は、畳込手段２３０_ｉから入力されるサイズＰ×Ｑ×Ｋ_ｉの３階テンソルＪ_ｉ（ｒ，ｓ，ｔ）の各成分に対して、活性化関数φを適用し、その適用結果を、サイズＰ×Ｑ×Ｋ_ｉの３階テンソルＩ_ｉ（ｒ，ｓ，ｔ）として出力する。なお、活性化関数適用手段２３１_ｉが用いる活性化関数は、活性化関数適用手段１２１_ｉと同じとする。 The activation function applying means 231 performs calculation using the activation function on the output of the convolution means 230 . Furthermore, the activation function applying means 231 also propagates the error inputted from the latter stage of the convolutional neural network to the former stage.
The activation function applying means 231_i (i is an integer from 1 to L) applies the activation function φ to each component of the third-order tensor J_i(r, s, t) of size P×Q×K_i input from the convolution means 230_i, and outputs the result as a third-order tensor I_i(r, s, t) of size P×Q×K_i. The activation function used by the activation function applying means 231_i is the same as that used by the activation function applying means 121_i.

学習用畳み込みニューラルネットワーク手段２３は、畳み込みニューラルネットワーク手段１２（図２）と同様に、畳込手段２３０_１から活性化関数適用手段２３１_Ｌへとテンソルを順伝播することで、サイズＰ×Ｑ×３Ｃの３階テンソルＪ_Ｌ（ｒ，ｓ，ｔ）を算出する。なお、畳込手段２３０_１から畳込手段２３０_Ｌまでのそれぞれの畳込手段２３０の結合重み係数（パラメータ）の初期値は、予め無作為的または作為的に設定しておく。例えば、結合重み係数の初期値は、一様乱数またはこれを近似する疑似乱数により生成し、設定することができる。
学習用畳み込みニューラルネットワーク手段２３は、算出した３階テンソルＪ_Ｌを誤差演算手段２４に出力する。 Like the convolutional neural network means 12 (FIG. 2), the learning convolutional neural network means 23 computes a third-order tensor J_L(r, s, t) of size P×Q×3C by propagating tensors forward from the convolution means 230_1 to the activation function applying means 231_L. The initial values of the connection weight coefficients (parameters) of the convolution means 230_1 through 230_L are set in advance, either randomly or deliberately. For example, they can be generated from uniform random numbers or from pseudo-random numbers that approximate them.
The learning convolutional neural network means 23 outputs the calculated third order tensor J _L to the error calculation means 24 .

また、学習用畳み込みニューラルネットワーク手段２３は、逐次、誤差演算手段２４から誤差を入力されるたびに、繰り返し、誤差逆伝播法により結合重み係数を更新する。この繰り返しの回数は、予め定めた回数（例えば、１００万回）であってもよいし、学習用畳み込みニューラルネットワーク手段２３が畳込手段２３０の結合重み係数の変化の度合いを監視し、その変化の度合いが予め定めた閾値を下回るまでであってもよい。あるいは、繰り返しの回数は、予め定めた回数を超え、かつ、結合重み係数の変化の度合いが閾値を下回るまでとしてもよい。
学習用畳み込みニューラルネットワーク手段２３は、誤差逆伝播法による結合重み係数の更新を完了（学習完了）した後、それぞれの畳込手段２３０の結合重み係数をパラメータ出力手段２５に出力する。 Each time an error is input from the error calculation means 24, the learning convolutional neural network means 23 updates the connection weight coefficients by error backpropagation, and repeats this. The number of repetitions may be a predetermined number (for example, one million), or the learning convolutional neural network means 23 may monitor the degree of change in the connection weight coefficients of the convolution means 230 and repeat until that degree of change falls below a predetermined threshold. Alternatively, the repetition may continue until the predetermined number has been exceeded and the degree of change in the connection weight coefficients has fallen below the threshold.
The learning convolutional neural network means 23 outputs the connection weight coefficients of the respective convolution means 230 to the parameter output means 25 after completing the update of the connection weight coefficients by the error backpropagation method (learning completion).
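The three termination rules described for the update loop can be written as one small predicate. How the "degree of change" of the connection weight coefficients is measured is not specified in the text, so `weight_change` is simply assumed to be a non-negative number supplied by the caller; the function name and the mode labels are hypothetical.

```python
def should_stop(iteration, weight_change, max_iterations, threshold, mode):
    """Return True when training should end under one of the three described rules."""
    if mode == "count":    # fixed number of repetitions
        return iteration >= max_iterations
    if mode == "change":   # weight change has fallen below the threshold
        return weight_change < threshold
    if mode == "both":     # count exceeded AND change below the threshold
        return iteration > max_iterations and weight_change < threshold
    raise ValueError("unknown mode: " + mode)
```

The "both" rule is the most conservative of the three, since it keeps training past the nominal count until the weights have also settled.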

誤差演算手段２４は、学習用畳み込みニューラルネットワーク手段２３で演算された３階テンソルＪ_Ｌと、ウェーブレット分解手段２２で生成されたＨＬ画像、ＬＨ画像およびＨＨ画像との誤差を演算するものである。
誤差演算手段２４は、以下の式（２３）に示すように、３階テンソルＪ_Ｌ（ｒ，ｓ，ｔ）と、ＨＬ画像（ＨＬ（ｒ，ｓ，ｔ））、ＬＨ画像（ＬＨ（ｒ，ｓ，ｔ））およびＨＨ画像（ＨＨ（ｒ，ｓ，ｔ））とから、サイズＰ×Ｑ×３Ｃの３階テンソル値である誤差テンソルΔを演算し、学習用畳み込みニューラルネットワーク手段２３に出力する。 The error computing means 24 computes the errors between the 3rd order tensor J _L computed by the learning convolutional neural network means 23 and the HL, LH and HH images generated by the wavelet decomposition means 22 .
As shown in equation (23) below, the error calculation means 24 computes an error tensor Δ, a third-order tensor of size P×Q×3C, from the third-order tensor J_L(r, s, t) and the HL image (HL(r, s, t)), LH image (LH(r, s, t)), and HH image (HH(r, s, t)), and outputs it to the learning convolutional neural network means 23.
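For the monochrome case (C = 1), the error tensor can be sketched as a component-wise difference. Equation (23) is not reproduced in this text, so two details are assumptions: the sign (prediction minus target) and the channel order (t = 0, 1, 2 taken as HL, LH, HH, following the order in which the text lists them). The layout `J[s][r][t]` and the function name are illustrative.

```python
def error_tensor(J, HL, LH, HH):
    """Delta[s][r][t] = J[s][r][t] - target_t[s][r], with t = 0, 1, 2 as HL, LH, HH (assumed)."""
    targets = (HL, LH, HH)
    return [[[J[s][r][t] - targets[t][s][r] for t in range(3)]
             for r in range(len(J[0]))]
            for s in range(len(J))]
```

An all-zero result means the network output already matches the wavelet sub-bands of the training block exactly.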

パラメータ出力手段２５は、学習用畳み込みニューラルネットワーク手段２３の学習完了後出力されるそれぞれの畳込手段２３０における結合重み係数を、出力パラメータとして出力するものである。
このパラメータ出力手段２５が出力するパラメータは、画像超解像装置１（図２）の畳み込みニューラルネットワーク手段１２を構成する畳込手段１２０（１２０_１，１２０_２，…，１２０_Ｌ）に設定されることで、画像超解像装置１を最適な状態で動作させることができる。 The parameter output means 25 outputs, as output parameters, the connection weighting coefficients of the respective convolution means 230 output after the learning convolution neural network means 23 completes learning.
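The parameters output by the parameter output means 25 are set in the convolution means 120 (120_1, 120_2, ..., 120_L) that make up the convolutional neural network means 12 of the image super-resolution device 1 (FIG. 2), which allows the image super-resolution device 1 to operate in an optimal state.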

By configuring the parameter learning device 2 as described above, the parameter learning device 2 can learn the parameters of the convolutional neural network with which the image super-resolution device 1 increases the resolution of an image.
The parameter learning device 2 can be operated by a program (parameter learning program) that causes a computer to function as each of the means described above.

<Operation of parameter learning device>
The operation of the parameter learning device 2 will be described with reference to FIG. 9 (see FIG. 7 as needed for the configuration). It is assumed that the connection weight coefficients of the convolution means 120 of the convolutional neural network means 12 have been set in advance to the parameters learned by the parameter learning device 2 (FIG. 7).

In step S10, the block scanning means 21 generates, by raster scanning or at random, coordinates (p, q) in the input image D that serve as the cut-out position of a block of 2P×2Q pixels.
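The two ways of generating cut-out coordinates in step S10 can be sketched as follows. The function names, the convention that p is the horizontal and q the vertical coordinate of the block's top-left pixel, and the optional step sizes are assumptions made for illustration. For a 2P×2Q block, `block_w` would be 2P and `block_h` would be 2Q.

```python
import random


def block_origins(width, height, block_w, block_h, step_x=None, step_y=None):
    """Raster-scan top-left coordinates (p, q) of blocks that fit in the image.

    The step defaults to the block size (non-overlapping blocks).
    """
    step_x = step_x or block_w
    step_y = step_y or block_h
    return [(p, q)
            for q in range(0, height - block_h + 1, step_y)
            for p in range(0, width - block_w + 1, step_x)]


def random_origin(width, height, block_w, block_h, rng=random):
    """One uniformly random top-left coordinate (p, q) of a block that fits."""
    return (rng.randint(0, width - block_w), rng.randint(0, height - block_h))
```

Passing a seeded `random.Random` instance as `rng` makes the random scan reproducible.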

In step S11, the block extraction means 20 cuts out a block of 2P×2Q pixels from the input image D, using the coordinates (p, q) generated in step S10 as the cut-out position. The block cut out by the block extraction means 20 corresponds to the LL image (LL_0′) of block E shown in FIG. 1(b).
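Cutting out a block at a given position, as in step S11, amounts to slicing rows and columns. The sketch below assumes a single-channel image stored as a list of rows and treats (p, q) as the column and row of the block's top-left pixel; the name `cut_block` is hypothetical.

```python
def cut_block(image, p, q, block_w, block_h):
    """Return the block_h x block_w block whose top-left pixel is (p, q).

    image is a list of rows; image[y][x] is the pixel at column x, row y.
    """
    if p < 0 or q < 0 or q + block_h > len(image) or p + block_w > len(image[0]):
        raise ValueError("block does not fit inside the image")
    return [row[p:p + block_w] for row in image[q:q + block_h]]
```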

In step S12, the wavelet decomposition means 22 applies two-dimensional wavelet decomposition to the block cut out in step S11. From the block, the wavelet decomposition means 22 thereby generates an LL image containing the low-frequency components in both the horizontal and vertical directions, an HL image containing the horizontal high-frequency and vertical low-frequency components, an LH image containing the horizontal low-frequency and vertical high-frequency components, and an HH image containing the high-frequency components in both directions. The images after wavelet decomposition correspond to the LL image (LL_1′), HL image (HL_1′), LH image (LH_1′), and HH image (HH_1′) shown in FIG. 1(b).
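To make the four sub-bands of step S12 concrete, here is a one-level two-dimensional decomposition using the Haar wavelet. The choice of the Haar wavelet and of the averaging normalization (division by 4) are assumptions for illustration; this passage does not name the wavelet. The function name `haar_decompose` is hypothetical.

```python
def haar_decompose(block):
    """One level of 2-D Haar decomposition of a block given as a list of rows.

    The block must have an even number of rows and columns. Returns
    (ll, hl, lh, hh), each with half as many rows and columns as the block.
    """
    rows, cols = len(block), len(block[0])
    if rows % 2 or cols % 2:
        raise ValueError("block sides must be even")
    ll, hl, lh, hh = [], [], [], []
    for y in range(0, rows, 2):
        ll_row, hl_row, lh_row, hh_row = [], [], [], []
        for x in range(0, cols, 2):
            a, b = block[y][x], block[y][x + 1]          # top-left, top-right
            c, d = block[y + 1][x], block[y + 1][x + 1]  # bottom-left, bottom-right
            ll_row.append((a + b + c + d) / 4)  # low horizontal, low vertical
            hl_row.append((a - b + c - d) / 4)  # high horizontal, low vertical
            lh_row.append((a + b - c - d) / 4)  # low horizontal, high vertical
            hh_row.append((a - b - c + d) / 4)  # high horizontal, high vertical
        ll.append(ll_row)
        hl.append(hl_row)
        lh.append(lh_row)
        hh.append(hh_row)
    return ll, hl, lh, hh
```

A 2P×2Q block yields four P×Q images, so the LL image has one quarter of the samples of the block and the three high-frequency images together have three times as many samples as the LL image.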

In step S13, the learning convolutional neural network means 23 receives the LL image generated in step S12 and executes the computation of the convolutional neural network (CNN) composed of the convolution means 230 and the activation function application means 231, thereby outputting data with three times the number of samples of the block. The output of the learning convolutional neural network means 23 corresponds to the HL image (HL_1^), LH image (LH_1^), and HH image (HH_1^) shown in FIG. 1(b).
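The forward computation of step S13 alternates convolution and activation. The sketch below is a plain-Python illustration under several assumptions that this passage does not confirm: zero padding that preserves the spatial size, ReLU as the activation function, and no activation after the last layer (so that negative high-frequency values remain representable). The names `conv2d`, `relu`, and `forward` are hypothetical.

```python
def conv2d(x, weights, bias):
    """Size-preserving 2-D convolution (cross-correlation) with zero padding.

    x:       H x W x C_in nested lists.
    weights: C_out x K x K x C_in nested lists, K odd.
    bias:    C_out values.
    Returns H x W x C_out nested lists.
    """
    height, width, c_in = len(x), len(x[0]), len(x[0][0])
    k = len(weights[0])
    half = k // 2
    out = []
    for r in range(height):
        out_row = []
        for s in range(width):
            pixel = []
            for w_k, b_k in zip(weights, bias):
                acc = b_k
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = r + dy - half, s + dx - half
                        # Positions outside the image contribute zero.
                        if 0 <= yy < height and 0 <= xx < width:
                            for c in range(c_in):
                                acc += w_k[dy][dx][c] * x[yy][xx][c]
                pixel.append(acc)
            out_row.append(pixel)
        out.append(out_row)
    return out


def relu(x):
    """Element-wise max(0, v) over an H x W x C nested list."""
    return [[[max(0.0, v) for v in pixel] for pixel in row] for row in x]


def forward(ll, layers):
    """Apply (weights, bias) layers in order, with ReLU between layers only."""
    x = ll
    for i, (weights, bias) in enumerate(layers):
        x = conv2d(x, weights, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x
```

If the last layer has 3C output channels for a C-channel LL image, the output has three times as many samples as the input, matching the combined size of the HL, LH, and HH estimates.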

In step S14, the error calculation means 24 calculates the error between the HL image (HL_1′), LH image (LH_1′), and HH image (HH_1′) generated by wavelet decomposition in step S12 and the HL image (HL_1^), LH image (LH_1^), and HH image (HH_1^) generated by the CNN computation in step S13.

In step S15, the learning convolutional neural network means 23 updates the connection weight coefficients of the convolutional neural network (CNN) by the error backpropagation method, based on the error calculated in step S14.
In step S16, the learning convolutional neural network means 23 determines whether learning is complete, based on a predetermined number of repetitions or the like.
If learning is not complete (No in step S16), the parameter learning device 2 returns to step S10 and continues the operation.

On the other hand, if learning is complete (Yes in step S16), in step S17 the parameter output means 25 outputs, as output parameters, the connection weight coefficients of the convolution means 230 of the learning convolutional neural network means 23.
Through the above operation, the parameter learning device 2 can learn the parameters of the convolutional neural network used by the image super-resolution device 1.
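The control flow of steps S10 to S17 can be summarized as a loop. The sketch below shows only the ordering of the steps; every callable passed to `train` is a hypothetical placeholder for the corresponding means, and nothing here prescribes how the network or the backpropagation update is implemented.

```python
def train(next_block, decompose, forward, error, update, learning_done, weights):
    """Run the S10-S17 loop and return the learned weights.

    next_block():                 S10-S11, returns one cut-out block.
    decompose(block):             S12, returns (ll, hl, lh, hh).
    forward(ll, weights):         S13, returns the estimated high-frequency data.
    error(estimate, hl, lh, hh):  S14, returns the error.
    update(weights, ll, delta):   S15, returns the updated weights.
    learning_done(iteration, weights): S16, returns True when learning is complete.
    """
    iteration = 0
    while True:
        block = next_block()
        ll, hl, lh, hh = decompose(block)
        estimate = forward(ll, weights)
        delta = error(estimate, hl, lh, hh)
        weights = update(weights, ll, delta)
        iteration += 1
        if learning_done(iteration, weights):
            return weights  # S17: output the parameters
```

Keeping each step behind its own callable mirrors the division into means in the device description and lets the stopping rule of step S16 be swapped without touching the rest of the loop.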

Note that parameter learning in the parameter learning device 2 may be performed before the image super-resolution device 1 is manufactured, with the learned parameters then reflected in the image super-resolution device 1.
Alternatively, after the image super-resolution device 1 is manufactured, parameter learning may be performed in the parameter learning device 2 at an appropriate time and the parameters of the image super-resolution device 1 reset.
Further, for example, when the parameter learning device 2 learns from the same input image as the image super-resolution device 1, the parameter learning device 2 may be operated at an appropriate time during the operation of the image super-resolution device 1 (for example, each time an input image is input), and the learned parameters may be set in the image super-resolution device 1.

REFERENCE SIGNS LIST
1 image super-resolution device
10 block extraction means
11 block scanning means
12 convolutional neural network means
120 convolution means
121 activation function application means
13 wavelet reconstruction means
14 block placement means
2 parameter learning device
20 block extraction means
21 block scanning means
22 wavelet decomposition means
23 learning convolutional neural network means
230 convolution means
231 activation function application means
24 error calculation means
25 parameter output means

Claims

1. An image super-resolution device that increases the resolution of an input image using a convolutional neural network that estimates high-frequency components of an image from low-frequency components obtained by wavelet decomposition of the image, the image super-resolution device comprising:
block extraction means for cutting out a block of a predetermined size from the input image;
convolutional neural network means for estimating, using the convolutional neural network, the high-frequency components corresponding to the block, with the block taken as the low-frequency components;
wavelet reconstruction means for generating a super-resolution block, in which the block is super-resolved, by wavelet reconstruction of the block and the high-frequency components; and
block placement means for generating a super-resolution image for the input image by arranging the super-resolution blocks according to the positions from which the blocks were cut out.

2. The image super-resolution device according to claim 1, wherein the convolutional neural network estimates the high-frequency components for one or more color channels from the images of those channels.

3. The image super-resolution device according to claim 2, wherein
the block extraction means cuts out the blocks for the channels from the images for the channels and inputs them to the convolutional neural network means,
the wavelet reconstruction means generates super-resolution blocks for the channels from the blocks for the channels and the high-frequency components for the channels estimated by the convolutional neural network means, and
the block placement means arranges the super-resolution blocks for each channel to generate super-resolution images corresponding to the number of channels.

4. The image super-resolution device according to any one of claims 1 to 3, wherein
the block extraction means cuts out the blocks from the input image so that their regions overlap, and
the block placement means generates the super-resolution image by synthesizing the overlapping regions of the super-resolution blocks.

5. A parameter learning device for learning parameters of the convolutional neural network used in the image super-resolution device according to any one of claims 1 to 4, the parameter learning device comprising:
block extraction means for sequentially cutting out, from an input image, blocks having twice the horizontal and vertical resolution of the input image of the convolutional neural network;
wavelet decomposition means for generating low-frequency components and high-frequency components by wavelet decomposition of the block;
learning convolutional neural network means for estimating high-frequency components by inputting the low-frequency components and propagating them forward through the convolutional neural network; and
error calculation means for calculating an error between the high-frequency components generated by the wavelet decomposition means and the high-frequency components estimated by the learning convolutional neural network means,
wherein the learning convolutional neural network means learns the connection weight coefficients of the convolutional neural network as the parameters by propagating the error backward through the convolutional neural network by the error backpropagation method.

6. The parameter learning device according to claim 5, wherein the convolutional neural network estimates the high-frequency components for one or more color channels from the images of those channels.

7. The parameter learning device according to claim 6, wherein
the block extraction means cuts out blocks for the channels from the images for the channels, and
the wavelet decomposition means generates low-frequency components and high-frequency components for the channels from the blocks for the channels.

8. An image super-resolution program for causing a computer to function as the image super-resolution device according to any one of claims 1 to 4.

9. A parameter learning program for causing a computer to function as the parameter learning device according to any one of claims 5 to 7.