JP7475842B2 - Image decoding device, control method, and program

Info

Publication number: JP7475842B2
Application number: JP2019213302A
Authority: JP
Inventors: 大輔坂本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-11-26
Filing date: 2019-11-26
Publication date: 2024-04-30
Anticipated expiration: 2039-11-26
Also published as: JP2021087054A

Description

The present invention relates to an image decoding device, a control method, and a program for decoding encoded data.

In recent years, imaging devices such as digital cameras and digital camcorders that incorporate image sensors such as CCD or CMOS sensors have been in use. In such an image sensor, a color filter array (CFA) provided on the sensor surface causes each pixel to capture a single color component. With a CFA, image data in a Bayer array (hereinafter sometimes referred to as RAW data) is obtained, in which the color components (R (red), G0 (green), B (blue), G1 (green)) are arranged in a periodic pattern such as that shown in FIG. 1.

Human vision is relatively more sensitive to luminance components. For this reason, in a typical Bayer array, as shown in FIG. 1, the green component, which carries more luminance information, is allocated twice as many pixels as the red component and twice as many as the blue component.

Each pixel of the RAW data therefore has information for only one color component. In the image processing of the imaging device, demosaic processing is performed on the RAW data so that each pixel has red, green, and blue color components. In general, the imaging device records image data obtained by encoding the RGB signal produced by demosaicing the RAW data, or a YUV signal converted from that RGB signal.

Image data in which demosaic processing has given each pixel three color components (RGB or YUV) requires three times the data volume of the original RAW data. Methods have therefore been proposed that reduce the data volume by directly encoding and recording the RAW data itself, without demosaic processing. For example, Patent Document 1 discloses a method in which RAW data is separated into four color planes, R, G0, B, and G1, and then encoded.

Patent Document 1: JP 2003-125209 A

When data is encoded per frequency band (for example, per subband) using a frequency transform such as a wavelet transform, the degree of quantization may differ from one frequency band to another. If data encoded in this way is processed collectively without regard to frequency band, the original data cannot be restored accurately.

In view of the above, an object of the present invention is to provide an image decoding device, a control method, and a program that can restore encoded data with higher accuracy.

To achieve the above object, an image decoding device of the present invention decodes encoded data obtained by quantizing and encoding, for each subband, a plurality of subband data obtained by performing a frequency transform on image data. The image decoding device comprises: decoding means for decoding the encoded data; inverse quantization means for inversely quantizing the data decoded by the decoding means to obtain a plurality of subband data; and inference means for performing inference on the plurality of subband data obtained by the inverse quantization means to obtain a plurality of subband data in which data degraded by quantization has been restored, the inference means performing, for each of the plurality of subbands, inference on the subband data corresponding to that subband using inference parameters corresponding to that subband. The inference means performs a first inference on the plurality of subband data using first inference parameters, which are inference parameters learned so as to correspond to the respective subbands, and performs a second inference on the subband data restored by the first inference using learned second inference parameters.

According to the present invention, encoded data can be restored with higher accuracy.

FIG. 1 is an explanatory diagram of a Bayer array.
FIG. 2 is a block diagram illustrating the configuration of an image encoding device according to a first embodiment of the present invention.
FIG. 3 is an explanatory diagram of plane conversion in the first embodiment of the present invention.
FIG. 4 is an explanatory diagram of the reversible 5-3 DWT in the first embodiment of the present invention.
FIG. 5 is an explanatory diagram of subband decomposition in the first embodiment of the present invention.
FIG. 6 is a block diagram illustrating the configuration of an image decoding device according to the first embodiment of the present invention.
FIG. 7 is an explanatory diagram of the reversible 5-3 inverse DWT in the first embodiment of the present invention.
FIG. 8 is a block diagram illustrating the configuration of a frequency degradation restoration unit according to the first embodiment of the present invention.
FIG. 9 is a diagram illustrating the configuration of a neuron according to the first embodiment of the present invention.
FIG. 10 is a diagram illustrating the configuration of a neural network according to the first embodiment of the present invention.
FIG. 11 is a diagram illustrating another configuration of the neural network according to the first embodiment of the present invention.
FIG. 12 is an explanatory diagram of the learning process of the neural network according to the first embodiment of the present invention.
FIG. 13 is a block diagram illustrating the configuration of an image decoding device according to a second embodiment of the present invention.
FIG. 14 is an explanatory diagram of the loss of DWT coefficient information due to quantization.
FIG. 15 is an explanatory diagram of the learning process of a neural network according to the second embodiment of the present invention.
FIG. 16 is a flowchart showing restoration processing according to the compression ratio in the second embodiment of the present invention.
FIG. 17 is a block diagram illustrating another configuration of the image decoding device according to the second embodiment of the present invention.
FIG. 18 is a block diagram illustrating the configuration of an image decoding device according to a third embodiment of the present invention.
FIG. 19 is an explanatory diagram of a 4×4 DCT quantization matrix and learning/inference groups in a modification of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. Each embodiment described below is merely one example of a configuration that can realize the present invention, and can be modified or changed as appropriate depending on the configuration of the device to which the present invention is applied and on various conditions. Not all combinations of the elements included in the embodiments are essential to realizing the present invention, and some elements can be omitted as appropriate. The scope of the present invention is therefore not limited to the configurations described in the embodiments. Configurations that combine multiple configurations described in the embodiments can also be adopted, as long as they do not contradict one another.

First Embodiment
Before the image decoding device 60 according to the first embodiment of the present invention is described, the generation of the encoded data to be decoded by the image decoding device 60 is described with reference to FIGS. 2 to 6.

FIG. 2 is a block diagram illustrating the configuration of an image encoding device 20 according to an embodiment of the present invention. As shown in FIG. 2, the image encoding device 20 includes a plane conversion unit 200, a frequency transform unit 201, a quantization unit 202, a quantization parameter setting unit 203, and an entropy encoding unit 204.

In this embodiment, JPEG2000 is used as an example of the encoding method, and the reversible 5-3 DWT is used as an example of the frequency transform method. However, the present invention is not limited to these methods, and any encoding method and frequency transform method may be adopted. The quantization parameters of this embodiment are set for each subband (frequency band) produced by the DWT.

The plane conversion unit 200 performs color separation on image data (RAW data) input from an imaging means that includes an image sensor. That is, as shown in FIG. 3, the plane conversion unit 200 separates the RAW data, which contains the color elements of the Bayer array, into independent plane data for each color plane (R, G0, G1, B) and outputs the plane data to the frequency transform unit 201.
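As a rough illustration of this color separation, the following Python sketch splits a small Bayer frame into four planes. The 2×2 tile layout (R and G0 on the first row, G1 and B on the second) and the name `split_bayer_planes` are assumptions made for the example; the actual layout is the one shown in FIG. 1.

```python
def split_bayer_planes(raw):
    """Split a Bayer RAW frame (list of rows) into four color planes.

    Assumed 2x2 tile layout (not confirmed by the text):
        R  G0
        G1 B
    """
    if len(raw) % 2 or any(len(row) % 2 for row in raw):
        raise ValueError("height and width must be even")
    return {
        "R":  [row[0::2] for row in raw[0::2]],   # even rows, even columns
        "G0": [row[1::2] for row in raw[0::2]],   # even rows, odd columns
        "G1": [row[0::2] for row in raw[1::2]],   # odd rows, even columns
        "B":  [row[1::2] for row in raw[1::2]],   # odd rows, odd columns
    }


raw = [
    [10, 20, 11, 21],
    [30, 40, 31, 41],
    [12, 22, 13, 23],
    [32, 42, 33, 43],
]
planes = split_bayer_planes(raw)
print(planes["R"])   # [[10, 11], [12, 13]]
```

Each plane has half the width and half the height of the RAW frame, so the four planes together hold exactly as many samples as the input.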

The frequency transform unit 201 performs a wavelet transform on each set of plane data input from the plane conversion unit 200, generates transform coefficients (subband coefficients) for each subband, and outputs them to the quantization unit 202. A transform coefficient is a value indicating the correlation between the target data (that is, the plane data) and the wavelet. As described above, the frequency transform unit 201 uses the reversible 5-3 DWT as the wavelet transform.

The reversible 5-3 DWT (hereinafter simply referred to as the "DWT") is described with reference to FIGS. 4 and 5. FIG. 4 is an explanatory diagram of the transform coefficients (DWT coefficients) generated by applying the DWT to the pixel data (plane data) output from the plane conversion unit 200. FIG. 5 is an explanatory diagram of subband decomposition.

When the frequency transform unit 201 performs the DWT on pixel data a, b, c, d, and e, the high-frequency DWT coefficients b' and d' are generated. More specifically, the frequency transform unit 201 applies equation (1) below to pixel data a, b, and c to generate DWT coefficient b', and applies equation (2) below to pixel data c, d, and e to generate DWT coefficient d'. Equations (1) and (2) have the same form and differ only in the pixel data they use.
$b' = b - (a + c) / 2$ ... Equation (1)
$d' = d - (c + e) / 2$ ... Equation (2)

Next, the frequency transform unit 201 performs the DWT on pixel data c and the high-frequency DWT coefficients b' and d' to generate the low-frequency DWT coefficient c''. More specifically, DWT coefficient c'' is generated by applying equation (3) below to pixel data c and DWT coefficients b' and d'.
$c'' = c + (b' + d' + 2) / 4$ ... Equation (3)

Instead of equation (3), the frequency transform unit 201 may apply equation (4) below to pixel data a, b, c, d, and e to generate DWT coefficient c''.
$c'' = (-a + 2b + 6c + 2d - e) / 8$ ... Equation (4)
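The lifting steps of equations (1) to (3) can be sketched in Python as follows. This is a minimal sketch rather than the patent's implementation: the use of floor division to keep the transform integer-valued, the mirroring of samples at the boundaries, and the name `dwt53_forward` are all assumptions of the example.

```python
def dwt53_forward(x):
    """One level of a 1-D 5-3 DWT by lifting, following equations (1) to (3).

    Odd-indexed samples become high-frequency coefficients and even-indexed
    samples become low-frequency coefficients.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch expects an even number of samples")

    def even(k):
        # Even-indexed sample x[2k]; mirrored at the right edge (assumed rule).
        return x[2 * min(k, n // 2 - 1)]

    # Equations (1)/(2): odd sample minus the mean of its two even neighbors.
    high = [x[2 * k + 1] - (even(k) + even(k + 1)) // 2 for k in range(n // 2)]
    # Equation (3): even sample plus a rounded quarter of the neighboring
    # high-frequency coefficients (left edge mirrored: high[-1] -> high[0]).
    low = [x[2 * k] + (high[max(k - 1, 0)] + high[k] + 2) // 4
           for k in range(n // 2)]
    return low, high


low, high = dwt53_forward([10, 12, 14, 20, 30, 28, 26, 24])
print(low, high)   # [10, 14, 30, 26] [0, -2, 0, -2]
```

Computing the high-frequency coefficients first and reusing them for the low-frequency ones is what makes each step exactly invertible in integer arithmetic.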

By applying the one-dimensional DWT described above to the plane data in both the vertical and horizontal directions, the frequency transform unit 201 obtains the signals of four subband images at decomposition level 1, as shown in FIG. 5(a). That is, the DWT decomposes one plane image into four subband images 1LL, 1HL, 1LH, and 1HH. In FIG. 5, "H" denotes a high-frequency component and "L" denotes a low-frequency component. For example, "1HH" at the lower right of FIG. 5(a) denotes the decomposition level 1 subband that is high-frequency (H) in both the horizontal and vertical directions.

As shown in FIG. 5(a), each decomposition level 1 subband has half as many coefficients in the horizontal direction, and half as many in the vertical direction, as the input pixel data. Likewise, as shown in FIG. 5(b), each decomposition level 2 subband, obtained by applying the DWT again to subband 1LL, has half as many coefficients in each direction as a decomposition level 1 subband.
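The decomposition into four level 1 subbands can be sketched by running a one-dimensional transform over the rows and then over the columns. In this hedged example the compact `dwt53_forward` helper, its boundary handling, and the reading of the subband names (first letter horizontal, second letter vertical) are assumptions rather than details taken from the text.

```python
def dwt53_forward(x):
    """Compact 1-D 5-3 lifting step (floor division and mirroring assumed)."""
    n = len(x)
    even = lambda k: x[2 * min(k, n // 2 - 1)]
    high = [x[2 * k + 1] - (even(k) + even(k + 1)) // 2 for k in range(n // 2)]
    low = [x[2 * k] + (high[max(k - 1, 0)] + high[k] + 2) // 4
           for k in range(n // 2)]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def decompose_level1(plane):
    """Split one plane into 1LL, 1HL, 1LH, and 1HH (each half size per axis)."""
    rows = [dwt53_forward(r) for r in plane]           # horizontal pass
    h_low = [lo for lo, _ in rows]
    h_high = [hi for _, hi in rows]

    def vertical(half):                                # vertical pass
        cols = [dwt53_forward(c) for c in transpose(half)]
        return (transpose([lo for lo, _ in cols]),
                transpose([hi for _, hi in cols]))

    ll, lh = vertical(h_low)
    hl, hh = vertical(h_high)
    return {"1LL": ll, "1HL": hl, "1LH": lh, "1HH": hh}


# A plane with vertical stripes only varies horizontally.
bands = decompose_level1([[0, 8, 0, 8]] * 4)
print(bands["1HL"], bands["1LH"])   # [[8, 8], [8, 8]] [[0, 0], [0, 0]]
```

Applying `decompose_level1` again to the returned `1LL` data would give the level 2 subbands, each again half the size in both directions.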

The quantization unit 202 quantizes the DWT coefficients (transform coefficients) of each subband input from the frequency transform unit 201 according to the quantization parameters set by the quantization parameter setting unit 203, described later, and outputs the quantized coefficients to the entropy encoding unit 204.

The quantization parameter setting unit 203 determines the quantization parameters used to quantize the DWT coefficients of each subband according to a compression ratio set by, for example, a user instruction, and sets them in the quantization unit 202. In general, image quality at a given code amount can be improved by setting the parameters so that quantization is stronger for subbands of higher frequency or lower decomposition level, because quantization has less visual impact in such subbands. In this example, when the frequency transform is performed up to decomposition level 2, the quantization parameter setting unit 203 preferably sets the parameters so that the quantization strength satisfies 2HH > 2HL ≈ 2LH > 1HH > 1HL ≈ 1LH > 1LL.

The entropy encoding unit 204 encodes the DWT coefficients quantized by the quantization unit 202, together with the quantization parameters, to generate encoded data, and outputs the encoded data as an encoded stream. This encoding uses entropy coding such as EBCOT (Embedded Block Coding with Optimized Truncation).

Next, the configuration and decoding method of the image decoding device 60 according to the first embodiment of the present invention are described. The image decoding device 60 decodes the encoded data (subband-encoded image data) generated by the image encoding device 20 as described below, and restores the original image data (RAW data in the Bayer array).

FIG. 6 is a block diagram illustrating the configuration of the image decoding device 60 according to the first embodiment of the present invention. As shown in FIG. 6, the image decoding device 60 includes, as functional blocks, an entropy decoding unit 600, an inverse quantization unit 601, a frequency degradation restoration unit 602, an inverse frequency transform unit 603, a Bayer conversion unit 604, and a frequency parameter setting unit 605.

The processing of this embodiment described below, which is executed by the above functional blocks, is realized by one or more control processors of the image decoding device 60 loading a program stored in non-volatile memory such as ROM into volatile memory such as RAM and executing it. This processing includes learning and inference using the learning model (neural network) of this embodiment, described later.

The entropy decoding unit 600 decodes the DWT coefficients and quantization parameters that were encoded by an entropy coding technique such as EBCOT, and outputs the decoded DWT coefficients to the inverse quantization unit 601.

The inverse quantization unit 601 inversely quantizes the decoded DWT coefficients input from the entropy decoding unit 600 using the quantization parameters, and outputs the inversely quantized DWT coefficients to the frequency degradation restoration unit 602.
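To illustrate why inversely quantized coefficients differ from the originals, the following Python sketch runs a few coefficients through a quantizer and back. The dead-zone uniform quantizer, the step sizes in `STEP`, and the function names are assumptions chosen for the example; the text does not specify the quantizer or any numeric parameters.

```python
# Hypothetical per-subband step sizes; the text gives no numeric values.
STEP = {"1LL": 1, "1HL": 4, "1LH": 4, "1HH": 8}


def quantize(coeffs, step):
    """Dead-zone uniform quantizer (an assumed quantizer): sign(c) * floor(|c| / step)."""
    return [(abs(c) // step) * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(indices, step):
    """Inverse quantization: scale each index back by the step size."""
    return [q * step for q in indices]


coeffs = [13, -5, 3, -9]
for name, step in STEP.items():
    print(name, dequantize(quantize(coeffs, step), step))
# 1LL [13, -5, 3, -9]
# 1HL [12, -4, 0, -8]
# 1LH [12, -4, 0, -8]
# 1HH [8, 0, 0, -8]
```

The larger the step, the more the recovered values deviate from the originals, and the deviation differs per subband. That per-subband difference is the degradation that the frequency degradation restoration unit 602 is meant to compensate.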

The frequency parameter setting unit 605 holds neural network parameters (weights and biases) learned for each subband and each compression ratio, selects the parameters that match the compression ratio of the encoded stream being decoded, and sets them in the frequency degradation restoration unit 602. Hereinafter, "neural network" may be abbreviated as "NN".

The frequency degradation restoration unit 602 (inference means) applies an NN configured with the inference parameters set by the frequency parameter setting unit 605 to the inversely quantized DWT coefficients input from the inverse quantization unit 601, and restores by inference the DWT coefficients degraded by quantization. The configuration of the frequency degradation restoration unit 602 and the parameter learning process are described in detail later.

The inverse frequency transform unit 603 applies an inverse frequency transform (inverse DWT) to the restored DWT coefficients input from the frequency degradation restoration unit 602 to reconstruct independent plane data for each color plane (R, G0, G1, B). The reconstructed plane data is output to the Bayer conversion unit 604.

The reversible 5-3 inverse DWT performed by the inverse frequency transform unit 603 is described with reference to FIG. 7. In FIG. 7, DWT coefficients a', c', and e' are high-frequency transform coefficients, and DWT coefficients b'' and d'' are low-frequency transform coefficients. When the inverse frequency transform unit 603 performs the inverse DWT on DWT coefficients a', b'', c', d'', and e', pixel data b, c, and d are generated. More specifically, the inverse frequency transform unit 603 applies equation (5) below to DWT coefficients a', b'', and c' to generate pixel data b, and applies equation (6) below to DWT coefficients c', d'', and e' to generate pixel data d. Equations (5) and (6) have the same form and differ only in the DWT coefficients they use. Pixel data b and d in the second row of FIG. 7 are the even-numbered pixel data in each plane, counting the pixel at the DWT start position as the 0th.
$b = b'' - (a' + c' + 2) / 4$ ... Equation (5)
$d = d'' - (c' + e' + 2) / 4$ ... Equation (6)

Next, the inverse frequency transform unit 603 performs the inverse DWT on DWT coefficient c' and pixel data b and d to generate pixel data c. More specifically, pixel data c is generated by applying equation (7) below to DWT coefficient c' and pixel data b and d. Pixel data c in the third row of FIG. 7 is the odd-numbered pixel data in each plane, counting the pixel at the DWT start position as the 0th.
$c = c' + (b + d) / 2$ ... Equation (7)
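Equations (5) to (7) undo the forward lifting steps in reverse order. A minimal Python sketch is given below; floor division, boundary mirroring, and the name `dwt53_inverse` are assumptions of the example, and the input coefficients are simply sample values that such a forward transform would produce for the sequence 10, 12, 14, 20, 30, 28, 26, 24.

```python
def dwt53_inverse(low, high):
    """One level of a 1-D 5-3 inverse DWT by lifting, following equations (5) to (7)."""
    n2 = len(low)
    if n2 == 0 or n2 != len(high):
        raise ValueError("low and high must be non-empty and of equal length")
    # Equations (5)/(6): even samples from the low-frequency coefficients
    # (left edge mirrored: high[-1] -> high[0], an assumed boundary rule).
    even = [low[k] - (high[max(k - 1, 0)] + high[k] + 2) // 4 for k in range(n2)]
    # Equation (7): odd samples from the high-frequency coefficients and the
    # recovered even neighbors (right edge mirrored).
    odd = [high[k] + (even[k] + even[min(k + 1, n2 - 1)]) // 2 for k in range(n2)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


pixels = dwt53_inverse([10, 14, 30, 26], [0, -2, 0, -2])
print(pixels)   # [10, 12, 14, 20, 30, 28, 26, 24]
```

Because each step subtracts or adds exactly the integer quantity that the forward step added or subtracted, unquantized coefficients are recovered without error.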

The inverse frequency transform unit 603 reconstructs the pixel data of each plane by repeatedly performing the inverse DWT described above in the horizontal and vertical directions.

The Bayer conversion unit 604 recombines the independent plane data for each color plane (R, G0, G1, B) reconstructed by the inverse frequency transform unit 603 into a RAW image in the Bayer array, and outputs the corresponding RAW data.
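The recombination step can be sketched as the mirror image of the color separation. As with any such example, the 2×2 tile layout (R and G0 on the first row, G1 and B on the second) and the name `merge_bayer_planes` are assumptions; the actual layout is the one shown in FIG. 1.

```python
def merge_bayer_planes(planes):
    """Interleave four color planes back into a Bayer RAW frame.

    Assumed 2x2 tile layout (not confirmed by the text):
        R  G0
        G1 B
    """
    h = len(planes["R"])
    w = len(planes["R"][0])
    raw = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            raw[2 * y][2 * x] = planes["R"][y][x]
            raw[2 * y][2 * x + 1] = planes["G0"][y][x]
            raw[2 * y + 1][2 * x] = planes["G1"][y][x]
            raw[2 * y + 1][2 * x + 1] = planes["B"][y][x]
    return raw


planes = {
    "R":  [[10, 11], [12, 13]],
    "G0": [[20, 21], [22, 23]],
    "G1": [[30, 31], [32, 33]],
    "B":  [[40, 41], [42, 43]],
}
raw = merge_bayer_planes(planes)
print(raw[0], raw[1])   # [10, 20, 11, 21] [30, 40, 31, 41]
```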

Next, the restoration by inference of degraded DWT coefficients using the learning model (neural network) in the frequency degradation restoration unit 602 is described in detail with reference to FIGS. 8 to 12. Restoration of degraded DWT coefficients at decomposition level 1, as shown in FIG. 5(a), is used as the example.

As shown in FIG. 8, the frequency degradation restoration unit 602 has a 1LL restoration unit 800, a 1HL restoration unit 801, a 1LH restoration unit 802, and a 1HH restoration unit 803. That is, the frequency degradation restoration unit 602 has four restoration units 800, 801, 802, and 803 corresponding to the four subbands 1LL, 1HL, 1LH, and 1HH, respectively.

As illustrated, the 1LL restoration unit 800 performs inference by applying an NN configured with the corresponding parameters to the DWT coefficients (subband data) of the 1LL subband input from the inverse quantization unit 601, and outputs the 1LL subband with the degradation due to quantization restored. Similarly, each of the other restoration units 801, 802, and 803 applies an NN configured with the corresponding parameters to the DWT coefficients of its subband from the inverse quantization unit 601, and outputs the DWT coefficients of that subband with the degradation due to quantization restored.

The NN structures in the 1LL restoration unit 800, the 1HL restoration unit 801, the 1LH restoration unit 802, and the 1HH restoration unit 803 may be the same or may differ from one another. The NN parameters (weights and biases) in these four restoration units, on the other hand, differ from one another.
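The arrangement of one parameter set per subband, selected according to the compression ratio, can be sketched as a simple lookup followed by per-subband inference. Everything concrete in this Python sketch is hypothetical: the table layout, the compression ratio labels "low" and "high", the parameter values, and `toy_nn`, which merely stands in for a trained NN.

```python
SUBBANDS = ("1LL", "1HL", "1LH", "1HH")


def select_parameters(table, compression_ratio):
    """Pick one parameter set per subband for the given compression ratio."""
    return {sb: table[(sb, compression_ratio)] for sb in SUBBANDS}


def toy_nn(coeffs, params):
    """Stand-in for a trained NN: a single scale and offset per subband."""
    return [params["w"] * c + params["b"] for c in coeffs]


def restore_all(subband_data, params, apply_nn=toy_nn):
    """Run each subband through the NN configured with that subband's parameters."""
    return {sb: apply_nn(subband_data[sb], params[sb]) for sb in SUBBANDS}


# Hypothetical parameter table keyed by (subband, compression ratio label).
table = {}
for i, sb in enumerate(SUBBANDS):
    table[(sb, "low")] = {"w": 1.0, "b": 0.0}
    table[(sb, "high")] = {"w": 1.0, "b": 0.5 * i}

data = {sb: [8.0, -8.0] for sb in SUBBANDS}
restored = restore_all(data, select_parameters(table, "high"))
print(restored["1LL"], restored["1HH"])   # [8.0, -8.0] [9.5, -6.5]
```

Keeping the network structure and the parameters separate, as here, allows the same code path to serve every subband while still applying different learned values to each.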

FIG. 9 is a diagram illustrating the configuration of a neuron 900, the unit of computation of the neural network in the restoration units 800 to 803. The NN of this embodiment includes a plurality of neurons 900. A neuron 900 performs a computation on a plurality of input values x_1 to x_N using weights w_1 to w_N, a bias b, and an activation function, and outputs an output value y. The details are as follows.

The neuron 900 calculates a value x' from the weights w_1 to w_N and the bias b as shown in equation (8) below. The weights w_1 to w_N and the bias b are values determined by the learning process described later and, as noted above, are parameters that can take different values in each of the restoration units 800 to 803.
$x' = \sum_{i=1}^{N} w_i x_i + b$ ... Equation (8)

Next, the neuron 900 inputs the calculated value x' to the activation function to calculate the output y. The activation function is a nonlinear function such as the sigmoid function or the ReLU (Rectified Linear Unit) function.

When the value x' is given to the sigmoid function, the output value y is obtained by equation (9) below.
$y = \frac{1}{1 + e^{-x'}}$ ... Equation (9)

When the value x' is given to the ReLU function, the output value y is obtained by equation (10) below.
$y = \max(0, x')$ ... Equation (10)
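The computation performed by a single neuron, a weighted sum plus bias followed by an activation function, can be sketched as follows. The function name `neuron` and the `activation` argument are illustrative choices; the sigmoid and ReLU definitions are the standard ones.

```python
import math


def neuron(inputs, weights, bias, activation="relu"):
    """Weighted sum of the inputs plus bias, passed through an activation."""
    x_prime = sum(w * x for w, x in zip(weights, inputs)) + bias
    if activation == "sigmoid":
        return 1.0 / (1.0 + math.exp(-x_prime))
    if activation == "relu":
        return max(0.0, x_prime)
    raise ValueError("unknown activation: " + activation)


# x' = 0.5 * 1.0 + (-1.0) * 2.0 + 0.5 = -1.0
print(neuron([1.0, 2.0], [0.5, -1.0], 0.5, "relu"))      # 0.0
print(neuron([1.0, 2.0], [0.5, -1.0], 0.5, "sigmoid"))   # about 0.269
```

With a negative weighted sum, ReLU outputs exactly zero while the sigmoid still returns a small positive value. This difference is one reason the choice of activation affects how signed DWT coefficients pass through the network.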

FIG. 10 is a diagram illustrating the configuration of the NN in the restoration units 800 to 803. As shown in FIG. 10, the NN of this embodiment has a four-layer structure consisting of an input layer 1000, a first intermediate layer 1001, a second intermediate layer 1002, and an output layer 1003.

Two consecutive layers are connected by one or more neurons 900. The output values of the preceding layer are input to a neuron 900, and the output value produced by the computation described above is output to the following layer.

The number of data items in_0 to in_N input to the input layer 1000 equals the number of data items out_0 to out_N output from the output layer 1003. In contrast, the number of data items mid_00 to mid_0p in the first intermediate layer 1001 and the number of data items mid_11 to mid_1q in the second intermediate layer 1002 need not equal the number in the input layer 1000 and the output layer 1003. The number of neurons 900 connecting two layers may therefore be any number equal to or greater than one.

The data in_0 to in_N input to the input layer 1000 are the DWT coefficients of one subband, and the data out_0 to out_N output from the output layer 1003 are the DWT coefficients of that subband with the degradation restored. That is, the DWT coefficients of the 1LL subband are input to the NN of the 1LL restoration unit 800, which outputs the DWT coefficients of the 1LL subband with the degradation restored by inference. Similarly, the DWT coefficients of the 1HL, 1LH, and 1HH subbands are input to the other restoration units 801, 802, and 803, respectively, which output the DWT coefficients of the 1HL, 1LH, and 1HH subbands with the degradation restored.
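A forward pass through such a four-layer network can be sketched in plain Python. The layer sizes, the hand-picked weights, the use of ReLU in the intermediate layers, and the linear output layer (chosen here because DWT coefficients can be negative) are all assumptions of the example; trained parameters would come from the learning process.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, biases):
    """One layer: each row of `weights` feeds one neuron of the next layer."""
    return [sum(w * x for w, x in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def restore(coeffs, params):
    """Input layer -> first intermediate -> second intermediate -> output layer."""
    (w1, b1), (w2, b2), (w3, b3) = params
    mid0 = relu(dense(coeffs, w1, b1))
    mid1 = relu(dense(mid0, w2, b2))
    return dense(mid1, w3, b3)   # linear output layer (assumed)


# Hypothetical parameters: 2 inputs, 3 and 2 intermediate values, 2 outputs.
params = (
    ([[1, 0], [0, 1], [1, 1]], [0, 0, 0]),
    ([[1, 0, 0], [0, 1, 0]], [0, 0]),
    ([[1, 0], [0, 1]], [0.5, -0.5]),
)
print(restore([4.0, 2.0], params))   # [4.5, 1.5]
```

The output has the same number of values as the input, while the two intermediate layers have different sizes, which matches the constraint on layer sizes stated above.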

Figure 11 illustrates another configuration of the NN in the restoration units 800 to 803. As shown in Figure 11, this alternative NN has a four-layer structure consisting of an input layer 1100, a first intermediate layer 1101, a second intermediate layer 1102, and an output layer 1103.

The NN in Figure 11 includes a skip connection that directly connects non-adjacent layers (the input layer 1100 and the second intermediate layer 1102). The dashed arrows between the input layer 1100 and the first intermediate layer 1101 mark where the skip occurs. As illustrated, the data in_0 and in_1 of the input layer 1100 bypass the first intermediate layer 1101 and are output directly to the second intermediate layer 1102. The restoration units 800 to 803 may thus be NNs that include skip connections.
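A skip connection of this kind can be sketched as below. The text does not say how the skipped values are combined with the first intermediate layer's output, so concatenation is assumed here; `dense` and `forward_with_skip` are hypothetical names, and the ReLU activation is likewise an assumption.

```python
def dense(inputs, weight_rows, biases):
    """Weighted sum plus bias per neuron, followed by ReLU (assumed activation)."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weight_rows, biases)]

def forward_with_skip(inputs, p1, p2, p3, skip_indices=(0, 1)):
    """Four-layer forward pass in which in_0 and in_1 bypass the first intermediate layer."""
    mid0 = dense(inputs, *p1)                    # input -> first intermediate layer
    skipped = [inputs[i] for i in skip_indices]  # values routed around that layer
    mid1 = dense(mid0 + skipped, *p2)            # second intermediate layer sees both
    return dense(mid1, *p3)                      # second intermediate -> output layer

# Tiny hand-picked parameters so the data flow is easy to follow.
p1 = ([[1, 0, 0], [0, 1, 0]], [0, 0])
p2 = ([[1, 1, 1, 1]], [0])
p3 = ([[1], [2], [0.5]], [0, 0, 0])
out = forward_with_skip([1.0, 2.0, 3.0], p1, p2, p3)
```

With these parameters the second intermediate layer receives `[1.0, 2.0]` from the first intermediate layer plus the skipped `[1.0, 2.0]`, so the inputs influence it through two paths.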

At least one of the restoration units 800 to 803 may have an NN with the structure shown in Figure 10, while at least one other of them may have an NN with the structure shown in Figure 11.

Figure 12 is an explanatory diagram of the learning process of the neural network according to the first embodiment of the present invention. In this embodiment, as described below, the restoration parameters (weights w_1 to w_N and bias b) are learned for each subband. Figure 12 describes the learning process for the 1HH subband, but the same process can be applied to the other subbands (the 1LL, 1HL, and 1LH subbands).

In outline, in the learning process of this embodiment, the input data are the DWT coefficients of each subband, the output data are restored DWT coefficients in which the degradation caused by quantization has been restored, and the training data are the undegraded DWT coefficients of the original image, which contain no quantization degradation.

The DWT coefficients 1200 of the 1HH subband, quantized by the quantization unit 202 of the image encoding device 20, are input to the 1HH restoration unit 803 of the frequency degradation restoration unit 602. The frequency parameter setting unit 605 sets the frequency parameters (weights and bias) for the 1HH subband in the 1HH restoration unit 803. The initial values of these frequency parameters may be chosen arbitrarily; for example, values determined by random numbers are used.

The 1HH restoration unit 803 uses an NN to which the set frequency parameters are applied to restore the degradation caused by quantization in the DWT coefficients 1200 of the 1HH subband, and outputs restored DWT coefficients 1201 of the 1HH subband. The restored DWT coefficients 1201 are input to the parameter update unit 1203.

In addition, undegraded DWT coefficients 1202, which correspond to the original image of the 1HH subband, are input to the parameter update unit 1203 as training data. The undegraded DWT coefficients 1202 are the DWT coefficients output by the frequency transform unit 201 of the image encoding device 20, and have not been degraded by quantization.

The parameter update unit 1203 computes an index representing the result of comparing the input restored DWT coefficients 1201 with the undegraded DWT coefficients 1202, and updates the frequency parameters for the 1HH subband according to an update method such as backpropagation. The index may be, for example, the peak signal-to-noise ratio (PSNR) or the sum of absolute differences (SAD). When PSNR is used, the parameter update unit 1203 updates the frequency parameters so that the index increases. When SAD is used, it updates the frequency parameters so that the index decreases.
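The two comparison indices named above can be computed as in the following sketch. The function names are hypothetical, and the default peak value of 1023 is an assumption (a 10-bit coefficient range); the patent does not specify the peak used for PSNR.

```python
import math

def sad(restored, reference):
    """Sum of absolute differences: smaller means closer to the reference."""
    return sum(abs(r - t) for r, t in zip(restored, reference))

def psnr(restored, reference, peak=1023):
    """Peak signal-to-noise ratio in dB: larger means closer to the reference."""
    mse = sum((r - t) ** 2 for r, t in zip(restored, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical data: no noise at all
    return 10.0 * math.log10(peak ** 2 / mse)

undegraded = [100, 600, 1023]
rough = [0, 511, 1022]      # e.g. coefficients after coarse quantization
better = [90, 590, 1023]    # e.g. coefficients after a good restoration
```

Here `sad(better, undegraded)` is lower and `psnr(better, undegraded)` is higher than the corresponding values for `rough`, which is why the update direction differs between the two indices.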

The same processing is performed for the restoration units 800 to 802, which correspond to the other subbands.

By performing the above learning process on a large amount of image data (pairs of restored DWT coefficients 1201 and undegraded DWT coefficients 1202), the frequency parameters that the frequency parameter setting unit 605 sets in the frequency degradation restoration unit 602 are determined.

In general, the degree of quantization differs according to the frequency and decomposition level of the subband. For example, to keep visible degradation low, subbands of high frequency or low decomposition level are quantized relatively strongly, whereas subbands of low frequency or high decomposition level are quantized relatively weakly. With the configuration of this embodiment, the learning process is performed separately for each of the subbands, which differ in degree of quantization, so the parameters of the frequency degradation restoration unit 602 are tuned for each subband and the image can be restored with higher accuracy.

The above learning process may be performed for each compression ratio selectable by the user (in a digital camera, each recording image quality). Learning and setting the frequency parameters for each compression ratio enables a more appropriate restoration that reflects the degree of degradation at that compression ratio.

＜Second Embodiment＞
A second embodiment of the present invention is described below. In each of the embodiments described below, elements whose operation and function are equivalent to those of the first embodiment are denoted by the reference numerals used in the description above, and their descriptions are omitted as appropriate.

Figure 13 is a block diagram illustrating the configuration of an image decoding device 130 according to the second embodiment of the present invention. As shown in Figure 13, the image decoding device 130 includes, as functional blocks, a pixel degradation restoration unit 1300 and a pixel parameter setting unit 1301 in addition to the entropy decoding unit 600 through the frequency parameter setting unit 605. As in the first embodiment, the processing of this embodiment performed by these functional blocks is realized by one or more control processors of the image decoding device 130 loading a program from non-volatile memory such as ROM into volatile memory such as RAM and executing it.

The pixel parameter setting unit 1301 selects, from the pixel parameters (weights and biases) learned for each compression ratio, those matching the compression ratio of the encoded stream to be decoded, and sets them in the pixel degradation restoration unit 1300.

The pixel degradation restoration unit 1300 (inference means) applies an NN having the inference parameters set by the pixel parameter setting unit 1301 to the Bayer-array RAW image output by the Bayer conversion unit 604, and restores by inference the pixel data degraded by quantization.

The reason a two-stage restoration process in the frequency domain and the pixel domain is preferable is explained with reference to Figure 14. Figure 14 illustrates how values change before and after quantization when the maximum DWT coefficient value is 1023 (= 2^10 - 1) and relatively strong quantization is performed with a unit (quantization step) of 511 (= 2^9 - 1). Figure 14(a) shows the DWT coefficients before quantization, and Figure 14(b) shows the DWT coefficients after quantization.

As illustrated, a DWT coefficient that is less than 511 before quantization becomes 0 after quantization (that is, the coefficient is lost). Even when the coefficient before quantization is 511 or more, so that the quantized coefficient is not 0, an error of between 0 and 510 relative to the coefficient before quantization arises. When the compression ratio is relatively high, relatively strong quantization is applied (for example, to the 1HH subband, which has a low decomposition level and high frequency), and restoration in the frequency domain can become difficult.
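The numbers in this example can be checked with a short sketch. It assumes non-negative coefficients and quantization by truncation (integer division by the step), which reproduces the behaviour described for Figure 14; the actual rounding rule of the quantization unit 202 is not stated in this passage.

```python
STEP = 511  # quantization step from the Figure 14 example

def quantize(coeff, step=STEP):
    """Map a non-negative coefficient to its quantization index (truncation assumed)."""
    return coeff // step

def dequantize(index, step=STEP):
    """Reconstruct the coefficient value represented by a quantization index."""
    return index * step

def round_trip_error(coeff, step=STEP):
    """Difference between the original coefficient and its reconstruction."""
    return coeff - dequantize(quantize(coeff, step), step)

# Largest error over the full 10-bit range 0..1023.
max_error = max(round_trip_error(c) for c in range(1024))
```

Every coefficient below 511 reconstructs to 0, and the worst case (for example 510 or 1021) loses 510, so the information a frequency-domain network must recover can be most of the coefficient's value.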

In this embodiment, therefore, image quality degradation that could not be fully restored in the frequency domain is inferred and compensated for in the pixel domain, which is constructed with reference to the other subbands as well. This achieves a more accurate restoration, closer to the original image.

Figure 15 is an explanatory diagram of the learning process of the neural network according to the second embodiment of the present invention.

The DWT coefficients 1500 of the 1LL, 1HL, 1LH, and 1HH subbands, quantized by the quantization unit 202 of the image encoding device 20, are input to the frequency degradation restoration unit 602 (inference means). As in the first embodiment, the frequency parameter setting unit 605 sets in the frequency degradation restoration unit 602 the frequency parameters (weights and biases) updated by learning for each subband.

The frequency degradation restoration unit 602 uses the NN to which these frequency parameters are applied to restore the DWT coefficients 1500 degraded by quantization, and outputs the result to the inverse frequency transform unit 603.

The Bayer conversion unit 604 recombines the independent plane data for each color plane (R, G0, G1, B), reconstructed by the inverse frequency transform unit 603, into a Bayer-array RAW image, and outputs the corresponding RAW data. The output RAW data is input to the pixel degradation restoration unit 1300.
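Recombining four color planes into a Bayer mosaic can be sketched as follows. The 2x2 layout used here (R and G0 on even rows, G1 and B on odd rows) is an assumption; the actual arrangement is the one shown in Figure 1, which is not reproduced in this text. `planes_to_bayer` is a hypothetical name.

```python
def planes_to_bayer(r, g0, g1, b):
    """Interleave four equally sized planes into one Bayer-array image.

    Assumed 2x2 cell layout:
        R  G0
        G1 B
    """
    height, width = len(r), len(r[0])
    raw = [[0] * (2 * width) for _ in range(2 * height)]
    for y in range(height):
        for x in range(width):
            raw[2 * y][2 * x] = r[y][x]
            raw[2 * y][2 * x + 1] = g0[y][x]
            raw[2 * y + 1][2 * x] = g1[y][x]
            raw[2 * y + 1][2 * x + 1] = b[y][x]
    return raw

# Two cells side by side: each plane is 1 row by 2 columns.
raw = planes_to_bayer([[1, 5]], [[2, 6]], [[3, 7]], [[4, 8]])
```

The resulting image has twice the height and width of each plane, so neighbouring pixels of different colors sit next to each other when the pixel-domain network processes it.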

The pixel degradation restoration unit 1300 uses an NN to which the pixel parameters set by the pixel parameter setting unit 1301 are applied to restore the pixel data in the RAW image degraded by quantization, and outputs a restored RAW image 1501. The restored RAW image 1501 is input to the pixel parameter update unit 1502.

In addition, an original RAW image 1503 is input to the pixel parameter update unit 1502 as training data. The original RAW image 1503 is the RAW image data input to the image encoding device 20, and has not been degraded by quantization.

The pixel parameter update unit 1502 computes an index representing the result of comparing the input restored RAW image 1501 with the original RAW image 1503, and updates the pixel parameters according to an update method such as backpropagation. The index may be, for example, the peak signal-to-noise ratio (PSNR) or the sum of absolute differences (SAD). When PSNR is used, the pixel parameter update unit 1502 updates the pixel parameters so that the index increases. When SAD is used, it updates the pixel parameters so that the index decreases.

By performing the above learning process on a large amount of image data (pairs of restored RAW images 1501 and original RAW images 1503), the pixel parameters that the pixel parameter setting unit 1301 sets in the pixel degradation restoration unit 1300 are determined.

The configuration of this embodiment provides the same technical effects as the first embodiment. In addition, image quality degradation that can be difficult to restore in the frequency domain is compensated for by the restoration process in the pixel domain, so the image can be restored with higher accuracy.

The above learning process may be performed for each compression ratio selectable by the user (in a digital camera, each recording image quality). Learning and setting the pixel parameters for each compression ratio enables a more appropriate restoration that reflects the degree of degradation at that compression ratio.

In this embodiment, a configuration may be adopted in which the processing (learning and inference) of one or more of the restoration units 800 to 803 in the frequency degradation restoration unit 602 is skipped. As described above, subbands with a low decomposition level and high frequency are quantized relatively strongly (at a relatively high compression ratio), so it can be difficult to perform learning and inference properly in the frequency domain. In this configuration, the restoration of subbands (frequency bands) for which frequency-domain learning and inference are difficult is covered by the restoration process in the pixel domain, so the image can be restored with higher accuracy. Moreover, because frequency-domain learning and inference are skipped for the given subbands, both the time spent on learning and inference and the number of frequency parameters to be learned and stored are reduced.

In the above configuration, the processing (learning and inference) for a given subband may further be skipped only when a compression ratio exceeding a predetermined compression ratio α (threshold) is applied. Figure 16 is a flowchart showing the processing of the frequency degradation restoration unit 602 in the case where the processing by its 1HH restoration unit 803 is skipped when the compression ratio to be applied exceeds the predetermined compression ratio α.

In step S1600, the frequency degradation restoration unit 602 determines whether the compression ratio set by the user is smaller than the predetermined compression ratio α.

If the set compression ratio is smaller than α (S1600: YES), the process proceeds to step S1601. In step S1601, the frequency degradation restoration unit 602 performs the restoration process for every subband, including restoration of the 1HH subband DWT coefficients using the NN (1HH restoration unit 803), and outputs the restored DWT coefficients of each subband.

If the set compression ratio is equal to or greater than α (S1600: NO), the process proceeds to step S1602. In step S1602, the frequency degradation restoration unit 602 skips the restoration of the 1HH subband DWT coefficients using the NN (1HH restoration unit 803) and outputs the 1HH subband DWT coefficients unprocessed. For the other subbands, the frequency degradation restoration unit 602 (restoration units 800 to 802) performs the restoration process as in the first embodiment and outputs the restored DWT coefficients.
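Steps S1600 to S1602 can be sketched as a single function. The dictionary-based interface, the name `restore_subbands`, and the stand-in restorers are hypothetical; in the device each restorer would be the NN of the corresponding restoration unit.

```python
def restore_subbands(coeffs, restorers, compression_ratio, alpha):
    """Restore each subband, bypassing the 1HH network at high compression ratios.

    coeffs:    dict mapping subband name -> list of DWT coefficients
    restorers: dict mapping subband name -> callable that restores a coefficient list
    """
    skip_1hh = compression_ratio >= alpha  # S1600: NO branch when ratio is not below alpha
    restored = {}
    for name, data in coeffs.items():
        if name == "1HH" and skip_1hh:
            restored[name] = list(data)             # S1602: pass 1HH through unprocessed
        else:
            restored[name] = restorers[name](data)  # S1601 (and the other subbands in S1602)
    return restored

# Stand-in restorers that just add 1 to every coefficient, to make the flow visible.
names = ("1LL", "1HL", "1LH", "1HH")
restorers = {n: (lambda d: [v + 1 for v in d]) for n in names}
coeffs = {n: [10, 20] for n in names}
low = restore_subbands(coeffs, restorers, compression_ratio=2, alpha=4)
high = restore_subbands(coeffs, restorers, compression_ratio=8, alpha=4)
```

At the lower ratio all four subbands are restored; at the higher ratio only 1HH is left untouched, to be compensated later in the pixel domain.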

In the embodiment described above, pixel degradation restoration is performed on the Bayer RAW image data, but it may instead be performed on each color plane after the inverse frequency transform. Figure 17 is a block diagram illustrating another configuration of the image decoding device 130, in which pixel degradation restoration is performed after the inverse frequency transform. As shown in Figure 17, in this configuration the pixel degradation restoration unit 1300 is placed after the inverse frequency transform unit 603 rather than after the Bayer conversion unit 604. The learning and inference processes of this configuration are obtained by applying the RAW image processing of the embodiment described above, unchanged, to each color plane (R, G0, G1, B).

＜Third Embodiment＞
Figure 18 is a block diagram illustrating the configuration of an image decoding device 180 according to the third embodiment of the present invention. As shown in Figure 18, the image decoding device 180 includes an entropy decoding unit 600, an inverse quantization unit 601, an inverse frequency transform unit 603, and a Bayer conversion unit 604 as functional blocks. In addition, the image decoding device 180 includes a pixel/frequency degradation restoration unit 1802 and a pixel/frequency parameter setting unit 1805 as functional blocks. The pixel/frequency degradation restoration unit 1802 (inference means) integrates the frequency degradation restoration unit 602 and the pixel degradation restoration unit 1300 of the embodiments described above. The pixel/frequency parameter setting unit 1805 integrates the frequency parameter setting unit 605 and the pixel parameter setting unit 1301. As in the embodiments described above, the processing of this embodiment performed by these functional blocks is realized by one or more control processors of the image decoding device 180 loading a program from non-volatile memory such as ROM into volatile memory such as RAM and executing it.

The learning of the frequency parameters in the frequency domain and the learning of the pixel parameters in the pixel domain are performed in the same way as in the embodiments described above, so a detailed description is omitted.

The pixel/frequency parameter setting unit 1805 selects, from the frequency parameters (weights and biases) learned for each subband and each compression ratio, those matching the compression ratio of the encoded stream to be decoded, and sets them in the pixel/frequency degradation restoration unit 1802.

The pixel/frequency degradation restoration unit 1802 applies an NN having the inference parameters set by the pixel/frequency parameter setting unit 1805 to the inversely quantized DWT coefficients input from the inverse quantization unit 601, and restores by inference the DWT coefficients degraded by quantization.

After the DWT coefficients have been restored, the pixel/frequency parameter setting unit 1805 selects, from the pixel parameters (weights and biases) learned for each compression ratio, those matching the compression ratio of the encoded stream to be decoded, and sets them in the pixel/frequency degradation restoration unit 1802.

The inverse frequency transform unit 603 applies an inverse frequency transform (inverse DWT) to the restored DWT coefficients input from the pixel/frequency degradation restoration unit 1802 and reconstructs independent plane data for each color plane (R, G0, G1, B). The reconstructed plane data is output to the Bayer conversion unit 604.

The Bayer conversion unit 604 recombines the independent plane data for each color plane (R, G0, G1, B), reconstructed by the inverse frequency transform unit 603, into a Bayer-array RAW image, and outputs the corresponding RAW data. The output RAW data is input to the pixel/frequency degradation restoration unit 1802.

The pixel/frequency degradation restoration unit 1802 applies an NN having the pixel parameters set by the pixel/frequency parameter setting unit 1805 to the Bayer-array RAW image output by the Bayer conversion unit 604, and restores by inference the pixel data degraded by quantization.
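The sequence of this embodiment, in which one network is reused with different parameter sets, can be sketched as follows. `SharedNetwork`, its element-wise affine "inference", and the stand-in transform functions are hypothetical placeholders for illustration only; they are not the patent's NN, inverse DWT, or Bayer conversion.

```python
class SharedNetwork:
    """Stand-in for the single NN of unit 1802; parameters are swapped between uses."""

    def __init__(self):
        self.params = None

    def set_params(self, params):
        self.params = params  # role of the pixel/frequency parameter setting unit 1805

    def infer(self, data):
        scale, offset = self.params  # toy model: one weight and one bias
        return [scale * v + offset for v in data]

def decode(subbands, net, freq_params, pixel_params, inverse_transform, to_bayer):
    restored = {}
    for name, coeffs in subbands.items():
        net.set_params(freq_params[name])  # first inference parameters, per subband
        restored[name] = net.infer(coeffs)
    raw = to_bayer(inverse_transform(restored))
    net.set_params(pixel_params)           # switch to the second inference parameters
    return net.infer(raw)

# Trivial stand-ins so the data flow can be traced by hand.
flatten = lambda sb: [v for name in sorted(sb) for v in sb[name]]
identity = lambda image: image
result = decode({"1LL": [1.0], "1HH": [2.0]}, SharedNetwork(),
                {"1LL": (1.0, 0.5), "1HH": (2.0, 0.0)}, (1.0, 1.0),
                flatten, identity)
```

Because the same object performs both inferences, only the stored parameter sets grow with the number of subbands, which is the source of the circuit-scale saving this embodiment claims.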

The configuration of this embodiment provides the same technical effects as the first and second embodiments. In addition, this configuration uses the pixel/frequency degradation restoration unit 1802, which integrates the frequency degradation restoration unit 602 and the pixel degradation restoration unit 1300, and the pixel/frequency parameter setting unit 1805, which integrates the frequency parameter setting unit 605 and the pixel parameter setting unit 1301. As a result, the circuit scale of the NN used for the restoration process can be reduced compared with the second embodiment.

＜Modifications＞
Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist.

The NNs of the frequency degradation restoration unit 602, the pixel degradation restoration unit 1300, and the pixel/frequency degradation restoration unit 1802 may each have any network configuration capable of performing the processing described above. For example, each of these restoration units (inference means) may have a CNN (Convolutional Neural Network) or a DBN (Deep Belief Network). Although the embodiments above illustrate a four-layer NN, an NN with any number of layers capable of performing the processing described above may be adopted.

The frequency degradation restoration unit 602 of the first and second embodiments has an NN for each subband (the 1LL, 1HL, 1LH, and 1HH restoration units 800, 801, 802, and 803). However, the frequency degradation restoration unit 602 may instead have only one NN (restoration unit) and perform learning and inference while switching the parameters (weights and biases) for each subband. Because the frequency degradation restoration unit 602 then has only one NN, the circuit scale can be reduced. Conversely, in the configuration in which the frequency degradation restoration unit 602 has an NN for each subband, the restoration processes for multiple subbands can run in parallel, so the processing time is shorter than in this modification.

In the embodiments described above, the wavelet transform is used as the frequency transform for the RAW image, but other frequency transforms may be used. For example, the 4×4 DCT (discrete cosine transform) used in the H.264 standard may be adopted. Figure 19 is an explanatory diagram of the quantization matrix and the learning/inference groups for the 4×4 DCT.

In the H.264 standard, the image to be encoded is divided into macroblocks of 16×16 pixels and then encoded by performing the DCT in 4×4 units. To quantize high-frequency components, which have little visual impact, relatively strongly while quantizing low-frequency components, which have a large visual impact, relatively weakly, the H.264 standard employs a quantization matrix such as that shown in Figure 19. The values shown in Figure 19 are the initial values of the quantization matrix for intra prediction. These values, that is, the entries of the quantization matrix, are multiplied by the quantization step derived from the quantization parameter, and they become larger as the frequency increases. Frequency components with the same matrix value are quantized with the same strength. Therefore, by treating components with the same value as one group, or by grouping components by frequency band, and performing learning and inference for each group, the same technical effect as the per-subband learning and inference of the embodiments above can be achieved. That is, in this modification, the learning and inference processes are performed for each of the frequency band groups (a) to (g) enclosed by dotted lines in Figure 19.
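Grouping coefficient positions by their quantization matrix value can be sketched as below. Figure 19 is not reproduced in this text, so the matrix here is an assumption: it uses the values commonly given for the H.264 default intra 4×4 scaling list, arranged in raster order. The mapping of the labels (a) to (g) to ascending values is likewise assumed, and `group_positions` is a hypothetical name.

```python
from collections import defaultdict

# Assumed quantization matrix (rows = vertical frequency, columns = horizontal frequency).
QMATRIX = [
    [6, 13, 20, 28],
    [13, 20, 28, 32],
    [20, 28, 32, 37],
    [28, 32, 37, 42],
]

def group_positions(matrix):
    """Collect the (row, col) positions that share the same matrix value."""
    groups = defaultdict(list)
    for row, values in enumerate(matrix):
        for col, value in enumerate(values):
            groups[value].append((row, col))
    return dict(sorted(groups.items()))  # ascending value = increasing quantization strength

groups = group_positions(QMATRIX)
labels = dict(zip("abcdefg", groups.values()))  # assumed label order
```

With this matrix the 16 positions fall into seven groups, one per distinct value, so one parameter set per group would play the role that one parameter set per subband plays in the DWT-based embodiments.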

The present invention can also be realized by supplying a program that implements one or more functions of the embodiments described above to a system or device via a network or storage medium, and having one or more processors of a computer in that system or device read and execute the program. The present invention can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

60 Image decoding device
130 Image decoding device
180 Image decoding device
600 Entropy decoding unit
601 Inverse quantization unit
602 Frequency degradation restoration unit (inference means)
603 Inverse frequency transform unit
604 Bayer conversion unit
605 Frequency parameter setting unit
1203 Parameter update unit
1300 Pixel degradation restoration unit (inference means)
1301 Pixel parameter setting unit
1502 Pixel parameter update unit
1802 Pixel/frequency degradation restoration unit (inference means)
1805 Pixel/frequency parameter setting unit

Claims

1. An image decoding device that decodes encoded data obtained by quantizing and encoding a plurality of subband data obtained by performing a frequency transform on image data, the image decoding device comprising:
a decoding means for decoding the encoded data;
an inverse quantization means for inverse quantizing the data decoded by the decoding means to obtain a plurality of subband data; and
an inference means for performing inference on the plurality of subband data acquired by the inverse quantization means to acquire a plurality of subband data in which data degraded by quantization has been restored, the inference means performing inference on the subband data corresponding to each of the plurality of subbands by using an inference parameter corresponding to that subband,
wherein the inference means:
performs a first inference on the plurality of subband data using first inference parameters, which are inference parameters trained to correspond to the plurality of subbands, respectively; and
performs a second inference on the subband data restored by the first inference, using learned second inference parameters.

2. The image decoding device according to claim 1, wherein the first inference parameters are weights and biases learned by a neural network so as to correspond to the plurality of subbands, respectively, and the inference means comprises a neural network in which the weights and biases are set.

3. The image decoding device according to claim 1, wherein the inference means comprises a first inference means for executing the first inference and a second inference means for executing the second inference.

4. The image decoding device according to claim 1, wherein the inference means performs the first inference using the first inference parameters, then switches from the first inference parameters to the second inference parameters and performs the second inference.

5. The image decoding device according to claim 1, wherein the second inference is an inference using the second inference parameters learned using pixel data of image data before it is encoded.

6. The image decoding device according to claim 1, wherein the second inference is an inference on RAW image data obtained after the first inference, using the second inference parameters learned using RAW image data before it is encoded.

7. The image decoding device according to any one of claims 1 to 4, wherein the second inference is an inference on image data for each color plane obtained after the first inference, using the second inference parameters learned using image data for each color plane before it is encoded.

8. The image decoding device according to claim 1, wherein in the first inference, inferences corresponding to one or more subbands are skipped.

9. The image decoding device according to claim 8, wherein the inferences to be skipped are inferences corresponding to subbands having a compression ratio exceeding a predetermined value.

10. The image decoding device according to claim 1, wherein the inference parameters are learned for each compression ratio.

11. A control method for an image decoding device that decodes encoded data obtained by quantizing and encoding a plurality of subband data obtained by performing a frequency transform on image data, the method comprising:
a decoding step of decoding the encoded data;
an inverse quantization step of inverse quantizing the data decoded in the decoding step to obtain a plurality of subband data; and
an inference step of performing inference on the plurality of subband data acquired in the inverse quantization step to acquire a plurality of subband data in which data degraded by quantization has been restored, the inference step performing inference on the subband data corresponding to each of the plurality of subbands by using an inference parameter corresponding to that subband,
wherein the inference step includes:
performing a first inference on the plurality of subband data using first inference parameters, which are inference parameters trained to correspond to the plurality of subbands, respectively; and
performing a second inference on the subband data restored by the first inference, using learned second inference parameters.

12. A program for causing a computer to function as each of the means of the image decoding device according to any one of claims 1 to 10.
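As a reading aid, the decoding flow recited in claim 1 and in the corresponding control method claim, together with the skipping of claims 8 and 9, can be sketched in Python as follows. This is a minimal sketch and not the claimed implementation: the trained neural network is replaced by a hypothetical one-weight, one-bias affine function, entropy decoding is omitted, a single second parameter set shared by all subbands is assumed, and every function name, subband name, and number is made up for illustration.

```python
def dequantize(quantized, step):
    """Inverse quantization: scale the quantized values by the quantization step."""
    return [q * step for q in quantized]


def affine_restore(data, params):
    """Hypothetical stand-in for a trained network: one weight and one bias."""
    weight, bias = params
    return [weight * x + bias for x in data]


def subbands_to_skip(compression_ratios, threshold):
    """Select subbands whose compression ratio exceeds the threshold (claims 8 and 9)."""
    return frozenset(
        name for name, ratio in compression_ratios.items() if ratio > threshold
    )


def decode_subbands(quantized_subbands, steps, first_params, second_params,
                    skip=frozenset()):
    """Inverse quantize, then run the first and second inference."""
    restored = {}
    for name, quantized in quantized_subbands.items():
        data = dequantize(quantized, steps[name])
        if name not in skip:
            # First inference: parameters chosen per subband.
            data = affine_restore(data, first_params[name])
        restored[name] = data
    # Second inference: assumed here to share one parameter set across subbands.
    return {
        name: affine_restore(data, second_params)
        for name, data in restored.items()
    }


# Made-up example with two subbands.
quantized = {"LL": [3, -1], "HH": [1, 0]}
steps = {"LL": 2, "HH": 8}
first_params = {"LL": (1.0, 0.5), "HH": (1.0, 0.25)}
second_params = (1.0, 0.0)
ratios = {"LL": 2.0, "HH": 16.0}

skip = subbands_to_skip(ratios, threshold=10.0)
out = decode_subbands(quantized, steps, first_params, second_params, skip)
```

In this example the "HH" subband exceeds the assumed threshold, so its first inference is skipped and only the second inference is applied to it. Selecting `first_params` from a table keyed by compression ratio would correspond to claim 10.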