JP2009267492A

JP2009267492A - Video image hierarchy encoding apparatus, video image hierarchy encoding method, video image hierarchy encoding program, video image hierarchy decoding apparatus, video image hierarchy decoding method, and video image hierarchy decoding program

Info

Publication number: JP2009267492A
Application number: JP2008111268A
Authority: JP
Inventors: Kazuhiro Shimauchi; 和博嶋内
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2008-04-22
Filing date: 2008-04-22
Publication date: 2009-11-12
Anticipated expiration: 2028-04-22
Also published as: JP4780139B2

Abstract

PROBLEM TO BE SOLVED: To generate a predicted video signal that is highly correlated with the original video signal by appropriately increasing the resolution of a basic decoded video signal obtained by decoding basic encoded data, and thereby to achieve more efficient video hierarchical encoding.

SOLUTION: A video hierarchical encoding device 100 includes: a high-pass filtering unit 208 that, as a stage preceding the inter-layer prediction processing performed when spatially interpolating from the base layer to the enhancement layer, separates a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal; and a high-frequency extraction filtering unit 210 that extracts, from the high-frequency separated signal, the high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a video hierarchical encoding device, a video hierarchical encoding method, a video hierarchical encoding program, a video hierarchical decoding device, a video hierarchical decoding method, and a video hierarchical decoding program that generate or use multiplexed data that can be restored to a plurality of video signals having different spatial resolutions.

Many encoding and decoding schemes have been proposed for video signals that provide scalability in spatial resolution, temporal resolution, and SNR (Signal to Noise Ratio), and these schemes have been put to practical use in various fields. Scalability of spatial resolution (spatial scalability) in particular has a wide range of applications, including the encoding of still images, and various techniques have been disclosed.

For example, Patent Document 1 describes a basic configuration of spatial scalability that generates multiplexed data that can be restored to a plurality of video signals having different spatial resolutions and image formats. In that configuration, the spatially interpolated base layer is combined, through the selection of appropriate weights, with the motion-compensated temporal prediction of the enhancement layer to generate the predicted video signal used for encoding the enhancement layer. This configuration improves bandwidth efficiency and copes appropriately with transmission errors.

Meanwhile, in the field of image enlargement, Non-Patent Document 1 discloses an inter-layer estimation processing technique that, when an image is enlarged, estimates and adds a high-frequency component appropriate to the resolution after enlargement. Non-Patent Document 1 applies the idea of the Laplacian pyramid used in hierarchical coding to image enlargement. Exploiting the strong correlation between the Laplacian components of adjacent layers, it estimates the Laplacian component of the layer one step higher in spatial resolution using only the signal of the layer of interest.

Patent Document 1: JP-A-7-162870
Non-Patent Document 1: Yasumasa Takahashi (高橋靖正) and Akira Taguchi (田口亮), "An image enlargement method allowing arbitrary magnification with high-frequency component estimation," IEICE Transactions, Vol. J84-A, No. 9, pp. 1192-1201, September 2001

To realize spatial scalability of a video signal in a video hierarchical encoding device, the basic decoded video signal obtained by decoding the basic encoded data of the base layer is interpolated, as also shown in Patent Document 1, and the result is used as the predicted video signal in enhancement layer encoding. This exploits the fact that there is a certain degree of correlation between the original video signal input to enhancement layer encoding (hereinafter simply the "original video signal") and the basic video signal; that is, the basic video signal contains part of the frequency components of the original video signal.

Here, the higher the correlation between the basic decoded video signal and the original video signal used for enhancement layer encoding, the higher the encoding efficiency. To realize more efficient encoding, therefore, the predicted video signal must be generated not by simple spatial interpolation of the basic decoded video signal alone, but through inter-layer estimation processing that brings it closer to the original video signal.

For such spatial enlargement (interpolation), it is desirable to apply the inter-layer estimation processing technique of Non-Patent Document 1 described above. That technique, however, was designed for enlarging natural images, and applying it as-is to spatial interpolation causes various problems. This is because the basic decoded video signal obtained by decoding the basic encoded data is degraded relative to the basic video signal from which it derives. For example, when the quantization step width in base-layer encoding is large, the quantization error of the basic decoded video signal becomes large and its correlation with the original video signal falls.
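The dependence of the inter-layer prediction on the base-layer quantization step can be illustrated with a small one-dimensional sketch. Everything in it is an assumption made for illustration: pair averaging stands in for spatial decimation, uniform quantization stands in for base-layer encoding followed by decoding, and sample repetition stands in for simple spatial interpolation. None of these choices is taken from the patent text.

```python
def decimate(signal):
    """Halve the resolution by averaging neighbouring sample pairs (assumed decimator)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def quantize(signal, step):
    """Stand-in for base-layer encode + decode: uniform quantization with the given step."""
    return [round(s / step) * step for s in signal]


def interpolate(signal):
    """Double the resolution by sample repetition (the simplest possible interpolation)."""
    out = []
    for s in signal:
        out.extend([s, s])
    return out


def prediction_error(original, step):
    """Prediction error = original signal minus the interpolated basic decoded signal."""
    base = decimate(original)              # basic video signal
    base_decoded = quantize(base, step)    # basic decoded video signal
    predicted = interpolate(base_decoded)  # predicted video signal
    return [o - p for o, p in zip(original, predicted)]


def energy(signal):
    """Sum of squared samples, used here as a rough measure of residual size."""
    return sum(v * v for v in signal)


original = [10, 12, 20, 22, 30, 32, 40, 42]
fine = prediction_error(original, step=1)
coarse = prediction_error(original, step=16)
print(energy(fine), energy(coarse))
```

In this toy setting the coarser quantization step leaves a larger residual for the enhancement layer to encode, which is the degradation the passage describes.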

Consequently, when an inter-layer estimation processing technique that relies on the inter-layer correlation of natural images is simply applied to spatial interpolation, the expected encoding efficiency may not be obtained. To raise encoding efficiency, the influence of coding degradation and the characteristics of the inter-layer estimation processing must be taken into account when generating the inter-layer predicted video signal. These considerations must be applied not only in the video hierarchical encoding device but also in the video hierarchical decoding device, which restores the original video signal by the same procedure as the encoding device.

In view of these problems, an object of the present invention is to provide a video hierarchical encoding device, a video hierarchical encoding method, and a video hierarchical encoding program that generate a predicted video signal highly correlated with the original video signal by appropriately increasing the resolution of the basic decoded video signal obtained by decoding the basic encoded data, and that thereby realize more efficient video hierarchical encoding.

A further object of the present invention is to provide a video hierarchical decoding device, a video hierarchical decoding method, and a video hierarchical decoding program that correspond to such a video hierarchical encoding device, that generate a predicted video signal highly correlated with the original video signal by appropriately increasing the resolution of the multiplexed data from the video hierarchical encoding device, and that thereby realize more efficient video hierarchical decoding.

To solve the above problems, a representative configuration of the present invention is a video hierarchical encoding device that generates multiplexed data that can be restored to a plurality of video signals having different spatial resolutions. The device comprises: a spatial decimation unit that reduces the spatial resolution of an original video signal to generate a basic video signal; a base layer encoding unit that encodes the basic video signal to generate basic encoded data; a base layer decoding unit that decodes the basic encoded data to generate a basic decoded video signal; a spatial interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a predicted video signal; an enhancement layer encoding unit that encodes a prediction error signal derived from the original video signal and the predicted video signal to generate extended encoded data; and a multiplexing unit that multiplexes the basic encoded data and the extended encoded data to generate the multiplexed data.

The spatial interpolation unit comprises: a high-pass filtering unit that separates a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal; a high-frequency extraction filtering unit that extracts, from the high-frequency separated signal, the high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal; a high-frequency spatial interpolation unit that expands the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency-component high-resolution signal; a difference generation unit that generates a difference signal, which is the difference between the high-frequency separated signal and the high-frequency equivalent signal; a difference spatial interpolation unit that expands the spatial resolution of the difference signal to generate a difference high-resolution signal; a basic interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal; and a signal synthesis unit that combines the high-frequency-component high-resolution signal, the difference high-resolution signal, and the basic high-resolution signal to generate the predicted video signal.
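The data flow among the units of the spatial interpolation unit can be sketched in one dimension as follows. This is only a sketch of the signal routing under stated assumptions, not the filters of the invention: the 3-tap moving average behind `high_pass`, the magnitude threshold in `extract_high_freq`, the linear `upsample2`, and the `gain` used as a placeholder for inter-layer estimation are all hypothetical stand-ins chosen to keep the example short.

```python
def upsample2(x):
    """Double the resolution by linear interpolation (last sample is repeated)."""
    out = []
    for i, v in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else v
        out.extend([v, (v + nxt) / 2])
    return out


def high_pass(x):
    """High-pass filtering unit (assumed form): signal minus a 3-tap moving average."""
    n = len(x)
    low = [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3 for i in range(n)]
    return [v - l for v, l in zip(x, low)]


def extract_high_freq(h, threshold):
    """Hypothetical stand-in for the high-frequency extraction filtering unit:
    keep only samples whose magnitude reaches the threshold."""
    return [v if abs(v) >= threshold else 0.0 for v in h]


def spatial_interpolation(base_decoded, threshold=1.0, gain=1.0):
    """Route the basic decoded signal through the three branches and combine them."""
    separated = high_pass(base_decoded)                    # high-frequency separated signal
    equivalent = extract_high_freq(separated, threshold)   # high-frequency equivalent signal
    difference = [s - e for s, e in zip(separated, equivalent)]  # difference signal

    # Placeholder for inter-layer estimation: plain upsampling scaled by `gain`.
    hf_high_res = [gain * v for v in upsample2(equivalent)]
    diff_high_res = upsample2(difference)     # difference high-resolution signal
    base_high_res = upsample2(base_decoded)   # basic high-resolution signal

    # Signal synthesis unit: sum of the three high-resolution signals.
    return [a + b + c for a, b, c in zip(hf_high_res, diff_high_res, base_high_res)]


predicted = spatial_interpolation([1.0, 5.0, 2.0, 8.0])
print(len(predicted))
```

The point of the split is that the extracted component and the leftover difference signal travel through separate enlargement paths, so each can be given processing suited to it before the three results are combined.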

According to the present invention, inter-layer estimation processing that involves estimating the original video signal is added to the simple spatial interpolation (spatial enlargement) used in inter-layer prediction for video hierarchical encoding. This makes the inter-layer prediction error signal smaller, so that high-quality video hierarchical encoding can be realized efficiently. In addition, the spatial interpolation takes into account the coding degradation of the basic decoded video signal (low-resolution signal) obtained by decoding the basic encoded data, as well as the characteristics of the inter-layer estimation processing. The basic decoded video signal is first filtered so that only its high-frequency component passes, and is then separated into the high-frequency component corresponding to the original video signal to be estimated and the remaining signal; the original video signal (high-resolution signal) is estimated by processing suited to each. The predicted video signal can therefore be generated more appropriately, and efficient video hierarchical encoding can be realized.

The high-frequency extraction filtering unit may be an ε-filter, a nonlinear digital filter that removes noise from a video signal having abrupt changes, and the multiplexing unit may also multiplex the ε-filter parameters, namely the coefficient sequence a_k and the threshold ε.

With this ε-filter configuration, only the high-frequency component suited to inter-layer estimation processing is passed to that processing, and the remaining difference signal is enlarged by processing that is not specialized for high-frequency components. Furthermore, including the parameters a_k and ε in the final multiplexed data allows the video hierarchical decoding device to perform appropriate video hierarchical decoding using the same ε-filter.
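As a rough illustration of how an ε-filter behaves, the sketch below uses one commonly cited formulation, y(n) = x(n) + Σ a_k · F(x(n+k) − x(n)), where F(d) = d when |d| ≤ ε and 0 otherwise. This formulation, the edge replication, and the function name `epsilon_filter` are assumptions; the passage above names only the parameters a_k and ε and does not define the filter equation.

```python
def epsilon_filter(x, coeffs, eps):
    """One-dimensional epsilon-filter (assumed formulation).

    x      : input samples
    coeffs : coefficient sequence a_k for k = -N..N (odd length, usually summing to 1)
    eps    : threshold; neighbour differences larger than eps are ignored
    """
    half = len(coeffs) // 2
    n = len(x)
    out = []
    for i in range(n):
        acc = x[i]
        for k, a in enumerate(coeffs):
            j = min(max(i + k - half, 0), n - 1)  # replicate samples at the edges
            d = x[j] - x[i]
            if abs(d) <= eps:                     # F(d) = d inside the threshold, else 0
                acc += a * d
        out.append(acc)
    return out


# Small ripple on both sides of a large step.
signal = [0, 1, 0, 1, 100, 101, 100, 101]
smoothed = epsilon_filter(signal, [1 / 3, 1 / 3, 1 / 3], eps=2)

# Difference between filter input and output, analogous to the difference signal.
difference = [s - y for s, y in zip(signal, smoothed)]
print(smoothed)
```

Neighbours that differ from the centre sample by more than ε contribute nothing, so the small ripple is averaged while the large step is left in place. With ε set to 0 the filter returns its input unchanged.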

Another representative configuration of the present invention is a video hierarchical encoding method that generates multiplexed data that can be restored to a plurality of video signals having different spatial resolutions. The method comprises: reducing the spatial resolution of an original video signal to generate a basic video signal; encoding the basic video signal to generate basic encoded data; decoding the basic encoded data to generate a basic decoded video signal; separating a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal; extracting, from the high-frequency separated signal, the high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal; expanding the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency-component high-resolution signal; generating a difference signal, which is the difference between the high-frequency separated signal and the high-frequency equivalent signal; expanding the spatial resolution of the difference signal to generate a difference high-resolution signal; expanding the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal; combining the high-frequency-component high-resolution signal, the difference high-resolution signal, and the basic high-resolution signal to generate a predicted video signal; encoding a prediction error signal derived from the original video signal and the predicted video signal to generate extended encoded data; and multiplexing the basic encoded data and the extended encoded data to generate the multiplexed data.

Another representative configuration of the present invention is a video hierarchical encoding program that causes a computer to function as a video hierarchical encoding device that generates multiplexed data that can be restored to a plurality of video signals having different spatial resolutions. The program causes the computer to function as: a spatial decimation unit that reduces the spatial resolution of an original video signal to generate a basic video signal; a base layer encoding unit that encodes the basic video signal to generate basic encoded data; a base layer decoding unit that decodes the basic encoded data to generate a basic decoded video signal; a spatial interpolation unit; an enhancement layer encoding unit that encodes a prediction error signal derived from the original video signal and a predicted video signal to generate extended encoded data; and a multiplexing unit that multiplexes the basic encoded data and the extended encoded data to generate the multiplexed data.

The spatial interpolation unit comprises: a high-pass filtering unit that separates a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal; a high-frequency extraction filtering unit that extracts, from the high-frequency separated signal, the high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal; a high-frequency spatial interpolation unit that expands the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency-component high-resolution signal; a difference generation unit that generates a difference signal, which is the difference between the high-frequency separated signal and the high-frequency equivalent signal; a difference spatial interpolation unit that expands the spatial resolution of the difference signal to generate a difference high-resolution signal; a basic interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal; and a signal synthesis unit that combines the high-frequency-component high-resolution signal, the difference high-resolution signal, and the basic high-resolution signal to generate the predicted video signal.

The components corresponding to the technical idea of the video hierarchical encoding device described above, and their descriptions, also apply to the video hierarchical encoding method and the video hierarchical encoding program.

Another representative configuration of the present invention is a video hierarchical decoding device capable of restoring a plurality of video signals having different spatial resolutions from multiplexed data to which spatial scalability has been applied. The device comprises: an extract unit that separates, from the multiplexed data, a plurality of pieces of encoded data having different spatial resolutions, including at least basic encoded data and extended encoded data; a base layer decoding unit that decodes the basic encoded data to generate a basic decoded video signal; a spatial interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a predicted video signal; and an enhancement layer decoding unit that restores the original video signal using the predicted video signal and a prediction error signal obtained by decoding the extended encoded data.

The spatial interpolation unit comprises: a high-pass filtering unit that separates a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal; a high-frequency extraction filtering unit that extracts, from the high-frequency separated signal, the high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal; a high-frequency spatial interpolation unit that expands the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency-component high-resolution signal; a difference generation unit that generates a difference signal, which is the difference between the high-frequency separated signal and the high-frequency equivalent signal; a difference spatial interpolation unit that expands the spatial resolution of the difference signal to generate a difference high-resolution signal; a basic interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal; and a signal synthesis unit that combines the high-frequency-component high-resolution signal, the difference high-resolution signal, and the basic high-resolution signal to generate the predicted video signal.
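The relationship between the encoding side and the decoding side can be sketched as follows: both derive the same predicted video signal from the basic decoded video signal, so the decoder only needs the prediction error to restore the original. The function `predict` is a hypothetical stand-in for the spatial interpolation unit (plain sample repetition), and the prediction error is passed through unencoded, which is an idealization made for this example.

```python
def predict(base_decoded):
    """Hypothetical stand-in for the spatial interpolation unit: repeat each sample twice."""
    return [v for v in base_decoded for _ in range(2)]


def encode_enhancement(original, base_decoded):
    """Encoder side: prediction error = original minus predicted video signal."""
    return [o - p for o, p in zip(original, predict(base_decoded))]


def decode_enhancement(prediction_error, base_decoded):
    """Decoder side: restored signal = prediction error plus the same prediction."""
    return [e + p for e, p in zip(prediction_error, predict(base_decoded))]


original = [10, 12, 20, 22]
base_decoded = [11, 21]

error = encode_enhancement(original, base_decoded)
restored = decode_enhancement(error, base_decoded)
print(error, restored)
```

The round trip succeeds only because the same `predict` runs on both sides; any filter parameter that changes the prediction therefore has to be known to the decoder as well.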

According to the present invention, inter-layer estimation processing that involves estimating the original video signal is added to the simple spatial interpolation (spatial enlargement) used in inter-layer prediction for video hierarchical decoding. This makes the inter-layer prediction error signal smaller, so that high-quality video hierarchical decoding can be realized efficiently. In addition, the spatial interpolation takes into account the coding degradation of the basic decoded video signal (low-resolution signal) obtained by decoding the basic encoded data, as well as the characteristics of the inter-layer estimation processing. The basic decoded video signal is first filtered so that only its high-frequency component passes, and is then separated into the high-frequency component corresponding to the original video signal to be estimated and the remaining signal; the original video signal (high-resolution signal) is estimated by processing suited to each. The original video signal can therefore be restored more appropriately, and efficient video hierarchical decoding can be realized.

The high-frequency extraction filtering unit may be an ε-filter, a nonlinear digital filter that removes noise from a video signal having abrupt changes, and the multiplexed data may also contain the ε-filter parameters, namely the coefficient sequence a_k and the threshold ε.

With this ε-filter configuration, only the high-frequency component suited to inter-layer estimation processing is passed to that processing, and the remaining difference signal is enlarged by processing that is not specialized for high-frequency components. Furthermore, by obtaining the parameters a_k and ε from the multiplexed data, the decoding device can execute the same processing as the ε-filter processing in the video hierarchical encoding device, which enables appropriate video hierarchical decoding.

Another representative configuration of the present invention is a video hierarchical decoding method capable of restoring a plurality of video signals having different spatial resolutions from multiplexed data to which spatial scalability has been applied. The method comprises: separating, from the multiplexed data, a plurality of pieces of encoded data having different spatial resolutions, including at least basic encoded data and extended encoded data; decoding the basic encoded data to generate a basic decoded video signal; separating a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal; extracting, from the high-frequency separated signal, the high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal; expanding the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency-component high-resolution signal; generating a difference signal, which is the difference between the high-frequency separated signal and the high-frequency equivalent signal; expanding the spatial resolution of the difference signal to generate a difference high-resolution signal; expanding the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal; combining the high-frequency-component high-resolution signal, the difference high-resolution signal, and the basic high-resolution signal to generate a predicted video signal; and restoring the original video signal using the predicted video signal and a prediction error signal obtained by decoding the extended encoded data.

Another representative configuration of the present invention is a video hierarchical decoding program that causes a computer to function as a video hierarchical decoding device capable of restoring a plurality of video signals having different spatial resolutions from multiplexed data to which spatial scalability has been applied. The program causes the computer to function as: an extract unit that separates, from the multiplexed data, a plurality of pieces of encoded data having different spatial resolutions, including at least basic encoded data and extended encoded data; a base layer decoding unit that decodes the basic encoded data to generate a basic decoded video signal; a spatial interpolation unit; and an enhancement layer decoding unit that restores the original video signal using a predicted video signal and a prediction error signal obtained by decoding the extended encoded data.

The spatial interpolation unit comprises: a high-pass filtering unit that separates a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal; a high-frequency extraction filtering unit that extracts, from the high-frequency separated signal, the high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal; a high-frequency spatial interpolation unit that expands the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency-component high-resolution signal; a difference generation unit that generates a difference signal, which is the difference between the high-frequency separated signal and the high-frequency equivalent signal; a difference spatial interpolation unit that expands the spatial resolution of the difference signal to generate a difference high-resolution signal; a basic interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal; and a signal synthesis unit that combines the high-frequency-component high-resolution signal, the difference high-resolution signal, and the basic high-resolution signal to generate the predicted video signal.

The components corresponding to the technical idea of the video hierarchical decoding device described above, and their descriptions, also apply to the video hierarchical decoding method and the video hierarchical decoding program.

As described above, according to the present invention, a predicted video signal highly correlated with the original video signal is generated by appropriately increasing the resolution of the video signal obtained by decoding the basic encoded data, so that more efficient video hierarchical encoding can be realized. Correspondingly, on the decoding side, a predicted video signal highly correlated with the original video signal is generated by appropriately increasing the resolution of the multiplexed data from the video hierarchical encoding device, so that more efficient video hierarchical decoding can also be realized.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The dimensions, materials, and other specific numerical values shown in the embodiments are merely examples for facilitating understanding of the invention and do not limit the present invention unless otherwise specified. In this specification and the drawings, elements having substantially the same function and configuration are denoted by the same reference numerals so that redundant description is omitted, and elements not directly related to the present invention are not illustrated.

In this embodiment, an inter-layer estimation process is introduced into video hierarchical encoding in order to raise the prediction efficiency between layers. With this inter-layer estimation process, when the spatial resolution of the low-resolution basic decoded video signal is expanded, the result can be brought as close as possible to the original video signal, and the inter-layer prediction error can be reduced by improving the prediction efficiency in the enhancement layer.

This inter-layer estimation process has the effect of increasing the correlation with the original video signal. However, if it is simply applied to spatial interpolation when the quantization step width used in base layer encoding is large, the quantization error of the decoded video signal is large, noise due to coding degradation increases, and the correlation with the original video signal falls. Therefore, in this embodiment, the influence of coding degradation and the characteristics of the inter-layer estimation process are taken into account when generating the inter-layer predicted video signal.

This embodiment adopts an inter-layer estimation process that is mainly good at estimating edges (contours). Accordingly, before the inter-layer estimation process, filtering that preserves edges and removes only small-amplitude signals is applied. This filtering separates the coding-degraded signal into edges (high-frequency components corresponding to the original video signal), which are the target of estimation in spatial interpolation, and the remaining portion that contains noise. By applying the inter-layer estimation process only to the edges, a high-resolution signal closer to the original video signal can be derived from the coding-degraded low-resolution signal. Here, in order to suppress signal degradation that arises as a side effect of the edge-preserving filtering, high-pass filtering is applied in the preceding stage, and only the high-frequency components that pass through it are subjected to the edge-preserving filtering.

In addition, because the small-amplitude difference signal separated by the edge-preserving filtering described above also contains texture and the like, it is recombined with the edge portion after the edge estimation processing (inter-layer estimation processing). Specific configurations of the video hierarchical encoding device and the video hierarchical decoding device that realize the inter-layer estimation process described above are shown below in separate embodiments. In this embodiment, two-layer hierarchical encoding with a base layer and an enhancement layer is used as an example to facilitate understanding, but the invention is not limited to this case and can of course be applied to three, four, or more layers. The resolution of the base layer can also be selected arbitrarily.

(First embodiment: video hierarchical encoding device 100)
FIG. 1 is a functional block diagram showing a schematic configuration of the video hierarchical encoding device 100. The video hierarchical encoding device 100 receives the original video signal as input, and the bit stream to which spatial resolution scalability has been applied is delivered to the video hierarchical decoding device through a communication line or a recording medium such as a Blu-ray Disc or DVD. The video hierarchical encoding device 100 includes a spatial decimation unit 110, a base layer encoding unit 112, a base layer decoding unit 114, a spatial interpolation unit 116, an enhancement layer encoding unit 118, and a multiplexing unit 120.

The spatial decimation unit 110 reduces the spatial resolution of the original video signal input to the video hierarchical encoding device 100 to a predetermined resolution defined by the base layer, thereby generating a basic video signal, and passes it to the base layer encoding unit 112. Various existing methods can be used for the resolution reduction (spatial decimation), but it is desirable to use a method that matches the Laplacian-pyramid inter-layer estimation process in the spatial interpolation unit 116 described later. A reduction ratio can also be set in order to handle changes in the resolution of the original video signal or the basic video signal. For example, if the resolution of the base layer is RB and the resolution of the enhancement layer (here, the resolution of the original video signal) is RE, the reduction ratio for spatial decimation from the resolution of the original video signal to the resolution of the base layer should be set to RB/RE.

The base layer encoding unit 112 encodes the basic video signal reduced by the spatial decimation unit 110 to generate basic encoded data (a bit stream), and passes it to the base layer decoding unit 114 and the multiplexing unit 120. As the encoding method in the base layer encoding unit 112, for example, a closed-loop encoder such as one for MPEG-2 or H.264 is used. The encoder may also include functions such as temporal scalability and SNR scalability. Not only a closed-loop encoder but also an open-loop encoder may be used, in which case the encoder is assumed to include a decoding (reconstruction) function.

The base layer decoding unit 114 decodes the basic encoded data encoded by the base layer encoding unit 112 to generate a basic decoded video signal.

The spatial interpolation unit 116 spatially expands the resolution of the basic decoded video signal decoded by the base layer decoding unit 114 to generate a predicted video signal. This predicted video signal is an estimate of the original video signal; in this embodiment, a predicted video signal highly correlated with the original video signal is generated by appropriately applying resolution-enhancement processing to the basic decoded video signal. The detailed configuration of the spatial interpolation unit 116, which is characteristic of this embodiment, is described later. The spatial interpolation unit 116 sends the generated predicted video signal to the enhancement layer encoding unit 118 and passes to the multiplexing unit 120 the encoded parameters obtained by encoding the parameters used for the spatial interpolation.

The enhancement layer encoding unit 118 uses the original video signal and the predicted video signal from the spatial interpolation unit 116 to derive a prediction error signal by exploiting the correlation between spatial resolutions and the temporal correlation, and encodes the prediction error signal to generate extended encoded data (a bit stream). The detailed configuration of the enhancement layer encoding unit 118 is also described later. The enhancement layer encoding unit 118 sends the generated extended encoded data to the multiplexing unit 120.

The multiplexing unit 120 receives the basic encoded data from the base layer encoding unit 112, the encoded parameters from the spatial interpolation unit 116, and the extended encoded data from the enhancement layer encoding unit 118, multiplexes them into a single set of multiplexed data (a bit stream), and outputs it to the communication line 130 or the medium (storage medium) 132.

FIG. 2 is a flowchart showing the specific processing of the video hierarchical encoding method in which the video hierarchical encoding device 100 described above applies spatial scalability to the original video signal.

First, the spatial decimation unit 110 reduces the spatial resolution of the original video signal to generate a basic video signal (S150), and the base layer encoding unit 112 encodes the basic video signal to generate a bit stream as basic encoded data (S152). The generated bit stream is passed to the multiplexing unit 120 and to the base layer decoding unit 114.

The base layer decoding unit 114 decodes the basic encoded data to generate a basic decoded video signal (S154), and the spatial interpolation unit 116 expands the spatial resolution of the basic decoded video signal through the inter-layer estimation process for high-frequency components to generate a predicted video signal (S156). The spatial interpolation unit 116 also encodes the parameters used for the spatial interpolation and sends them to the multiplexing unit 120.

The enhancement layer encoding unit 118 then derives a prediction error signal using the original video signal and the predicted video signal generated by the spatial interpolation unit 116 (S158), and encodes the prediction error signal to generate a bit stream as extended encoded data (S160). The extended encoded data generated in this way is multiplexed together with the basic encoded data and the encoded parameters (S162) and output to the communication line 130 or the medium 132.
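The flow of steps S150 to S158 can be sketched on a one-dimensional signal as follows. This is a minimal illustration only: the function names are hypothetical, pair averaging stands in for the spatial decimation, uniform quantization stands in for the base layer codec, and sample repetition stands in for the spatial interpolation; none of these stand-ins is the processing specified in this embodiment.

```python
def decimate(signal: list[float], factor: int = 2) -> list[float]:
    """Stand-in for spatial decimation (S150): average each group of samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def encode_decode(signal: list[float], qstep: float) -> list[float]:
    """Stand-in for base layer encoding and decoding (S152, S154): quantize."""
    return [round(v / qstep) * qstep for v in signal]


def interpolate(signal: list[float], factor: int = 2) -> list[float]:
    """Stand-in for spatial interpolation (S156): repeat each sample."""
    return [v for v in signal for _ in range(factor)]


def encode_layers(original: list[float], qstep: float, factor: int = 2):
    """Return the decoded base layer and the enhancement-layer residual."""
    base = decimate(original, factor)
    base_decoded = encode_decode(base, qstep)
    predicted = interpolate(base_decoded, factor)
    # Prediction error signal (S158): what the enhancement layer must encode.
    residual = [o - p for o, p in zip(original, predicted)]
    return base_decoded, residual


original = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
base_decoded, residual = encode_layers(original, qstep=4.0)
```

The closer the predicted signal is to the original, the smaller the residual that the enhancement layer has to encode, which is the motivation for improving the spatial interpolation stage.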

(Spatial interpolation unit 116, enhancement layer encoding unit 118)
Next, the spatial interpolation unit 116 and the enhancement layer encoding unit 118, which are characteristic of this embodiment, are described in detail.

FIG. 3 is a functional block diagram showing the configurations of the spatial interpolation unit 116 and the enhancement layer encoding unit 118. The spatial interpolation unit 116 includes a first high-pass filtering unit 208, a high-frequency extraction filtering unit 210, a high-frequency spatial interpolation unit 212, a difference generation unit 214, a differential spatial interpolation unit 216, a basic interpolation unit 218, a first signal synthesis unit 220, and a first entropy encoding unit 222.

The first high-pass filtering unit (high-pass filtering unit) 208 receives the basic decoded video signal decoded by the base layer decoding unit 114 as is and extracts a Laplacian component as the high-frequency component. When coding degradation is tolerable, generating the Laplacian component directly from the basic decoded video signal decoded by the base layer decoding unit 114, without any preceding filtering process for suppressing coding degradation, makes it possible to obtain a high-quality Laplacian component while using more of the frequency components contained in the decoded image. Here, for ease of understanding, a one-dimensional signal model (Formulas 1 and 2) is used as an example, where G_0(x) is the input signal and L_0(x) is the Laplacian component extracted from the input signal.

... (Formula 1)

... (Formula 2)
Here, ρ is a parameter for adjusting the band of the Gaussian filter. The first high-pass filtering unit 208 outputs the high-frequency separation signal, which is the Laplacian component extracted from the input signal, to the high-frequency extraction filtering unit 210.
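Formulas 1 and 2 are not reproduced in this text, so the following is only a sketch of one common way to obtain a Laplacian component from a one-dimensional signal: subtract a Gaussian-smoothed copy of the signal from the signal itself. The kernel form `exp(-i^2 / (4 * rho))`, the kernel radius, the normalization, and the border clamping are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def gaussian_kernel(rho: float, radius: int = 2) -> list[float]:
    """Normalized 1-D Gaussian weights W(i); rho adjusts the filter band.

    The exact form of W(i) is an assumption, not the source's Formula 2.
    """
    raw = [math.exp(-(i * i) / (4.0 * rho)) for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def laplacian_component(g0: list[float], rho: float = 1.0,
                        radius: int = 2) -> list[float]:
    """Return L0(x) = G0(x) - sum_i W(i) * G0(x + i), clamping at the borders."""
    w = gaussian_kernel(rho, radius)
    n = len(g0)
    out = []
    for x in range(n):
        low_pass = sum(w[i + radius] * g0[min(max(x + i, 0), n - 1)]
                       for i in range(-radius, radius + 1))
        out.append(g0[x] - low_pass)
    return out


flat = laplacian_component([5.0] * 8)
step = laplacian_component([0.0] * 4 + [10.0] * 4)
```

A constant signal yields a Laplacian component of zero everywhere, while a step edge yields a negative lobe just before the edge and a positive lobe just after it, which is the edge information that the later stages operate on.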

Here, as an example of the first high-pass filtering unit 208, the high-frequency component is extracted using a Gaussian function, but this can be replaced with another method. However, it is desirable that the filters, interpolation functions, and the like used here and those used in the spatial decimation unit 110 and in the second high-pass filtering unit 236 described later together satisfy a pyramid structure. For example, when a sinc function is used in the spatial decimation unit 110, using a sinc function in the first high-pass filtering unit 208 and the second high-pass filtering unit 236 as well establishes a pyramid structure based on the sinc function.

The high-frequency extraction filtering unit 210 is composed of, for example, an ε-filter, a nonlinear digital filter that effectively removes noise from a video signal containing abrupt changes. It extracts the high-frequency component corresponding to the original video signal from the high-frequency separation signal produced by the first high-pass filtering unit 208, thereby generating a high-frequency equivalent signal.

The inter-layer estimation process for high-frequency components performed by the high-frequency spatial interpolation unit 212, described later, is mainly good at estimating edges (contours) in the high-resolution signal. The ε-filter, as described above, is a filter that preserves edges and removes small amplitudes. Therefore, applying an ε-filter to the coding-degraded basic decoded video signal separates it into the edge portion, which is the target of estimation, and the non-edge components, which contain coding degradation and should not be emphasized, so that the inter-layer estimation process can be applied to the edge portion only. The one-dimensional ε-filter is given by Formulas 3 and 4 below and is applied in both the horizontal and vertical directions of the signal.

... (Formula 3)

... (Formula 4)
Here, y() is the output of the ε-filter, x() is the input signal to the ε-filter, a_k is the coefficient sequence that determines the filter weights, and ε is the threshold that determines an edge. a_k and ε are the parameters that determine the characteristics of the ε-filter and can be set arbitrarily.
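Formulas 3 and 4 are not reproduced in this text. As an illustration, the sketch below uses the widely known textbook form of the one-dimensional ε-filter, y(n) = x(n) + Σ_k a_k · f(x(n−k) − x(n)) with f(u) = u when |u| ≤ ε and f(u) = 0 otherwise; whether this matches the source formulas term by term is an assumption. The function name, the symmetric window, the border clamping, and the example coefficients (which sum to 1) are hypothetical.

```python
def epsilon_filter(x: list[float], a: list[float], eps: float) -> list[float]:
    """Textbook 1-D epsilon-filter (assumed form, see lead-in).

    Neighbors whose difference from the center sample exceeds eps are
    ignored, so large steps (edges) are kept while small ripples are smoothed.
    """
    radius = len(a) // 2
    n = len(x)
    y = []
    for i in range(n):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i - k, 0), n - 1)  # clamp indices at the borders
            diff = x[j] - x[i]
            if abs(diff) <= eps:
                acc += a[k + radius] * diff
        y.append(x[i] + acc)
    return y


noisy_step = [0.0, 1.0, 0.0, 1.0, 100.0, 101.0, 100.0, 101.0]
filtered = epsilon_filter(noisy_step, a=[1 / 3, 1 / 3, 1 / 3], eps=5.0)
```

In this example the ripple of amplitude 1 on each side of the step is reduced, while the jump of roughly 100 between samples 3 and 4 is left intact because it exceeds ε.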

The high-frequency extraction filtering unit 210 outputs the ε-filtered signal to the high-frequency spatial interpolation unit 212 and the difference generation unit 214, and passes the parameters a_k and ε to the first entropy encoding unit 222. In this way the parameters a_k and ε are also included in the final multiplexed data, which allows the video hierarchical decoding device to perform appropriate video hierarchical decoding using the same ε-filter.

The high-frequency spatial interpolation unit 212 expands the spatial resolution of the high-frequency equivalent signal through the inter-layer estimation process for high-frequency components to generate a high-frequency-component high-resolution signal. Specifically, the high-frequency spatial interpolation unit 212 is composed of a first interpolation unit 232, an amplitude limit constant multiplication unit 234, and a second high-pass filtering unit 236.

The first interpolation unit 232 receives the high-frequency equivalent signal output from the high-frequency extraction filtering unit 210 and interpolates it by the magnification that yields the desired resolution. Interpolation by the desired magnification is performed as follows. With the input Laplacian component denoted L_0(x), the signal interpolated by the magnification γ, (EXPAND)_γ L_0(x), is given by Formulas 5, 6, and 7.

... (Formula 5)

... (Formula 6)

... (Formula 7)
Here, int() is a function that returns the integer part. The first interpolation unit 232 outputs the interpolated signal to the amplitude limit constant multiplication unit 234. The magnification γ should be set, for example, to RE/RB for spatial interpolation from the base layer resolution to the enhancement layer resolution, where RB is the resolution of the base layer and RE is the resolution of the enhancement layer.
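Formulas 5 to 7 are not reproduced in this text, so the sketch below does not implement them. It swaps in plain linear interpolation purely to show how the magnification γ = RE/RB and an integer-part function interact when resampling a one-dimensional signal. The function name, the output-length rule, and the example resolutions are assumptions.

```python
def expand(signal: list[float], gamma: float) -> list[float]:
    """Resample `signal` by magnification gamma using linear interpolation.

    This is a substitute technique for illustration, not Formulas 5 to 7.
    """
    n_out = int(len(signal) * gamma)
    out = []
    for x in range(n_out):
        pos = x / gamma                      # position in the input signal
        i = int(pos)                         # integer part of the position
        frac = pos - i                       # fractional part used as weight
        j = min(i + 1, len(signal) - 1)      # clamp at the right border
        out.append((1.0 - frac) * signal[i] + frac * signal[j])
    return out


rb, re_ = 960, 1920          # hypothetical base and enhancement resolutions
gamma = re_ / rb             # magnification RE/RB
expanded = expand([0.0, 2.0, 4.0], gamma)
```

With γ = 2, every second output sample coincides with an input sample and the samples in between are averages of their neighbors.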

The amplitude limit constant multiplication unit 234 receives the signal from the first interpolation unit 232 and carries out the following step for estimating the unknown high-frequency component. This step is realized by applying amplitude limiting and constant multiplication to the signal input from the first interpolation unit 232. With the input signal denoted (EXPAND)_γ L_0(x), the signal L̄_γ(x) generated by the amplitude limit constant multiplication unit 234 is given by Formula 8.

... (Formula 8)
Here, the parameter T for amplitude limiting and the parameter α_γ for constant multiplication are determined experimentally through repeated trials. The parameter α_γ varies with the enlargement ratio, but in this embodiment the estimation accuracy depends not only on the enlargement ratio but also on the degree of quantization of the base layer, so α_γ is also determined according to the degree of quantization. For example, as shown in FIG. 4, it is desirable to prepare in advance a conversion formula or conversion table that determines the relationship between the parameter α_γ and the quantization step, so that the value of α_γ can be controlled as the quantization step varies. The amplitude limit constant multiplication unit 234 outputs the amplitude-limited and constant-multiplied signal to the second high-pass filtering unit 236, and the parameters α_γ and T used for the amplitude limiting and constant multiplication are passed to the first entropy encoding unit 222.
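Formula 8 is not reproduced in this text. One plausible reading of "amplitude limiting and constant multiplication" is to clip each sample to the range [−T, T] and then multiply by α_γ; the sketch below assumes that reading. The function name and the example values of T and α_γ are hypothetical and are not taken from the source or from FIG. 4.

```python
def limit_and_scale(expanded: list[float], t: float, alpha: float) -> list[float]:
    """Clip each sample to [-t, t], then multiply by alpha (assumed reading)."""
    return [alpha * max(-t, min(t, v)) for v in expanded]


# Illustrative values only; the source determines T and alpha experimentally.
limited = limit_and_scale([-10.0, -1.0, 0.0, 3.0, 20.0], t=4.0, alpha=2.0)
```

Clipping followed by scaling is a nonlinear operation, so the output contains frequency components that were not present in the interpolated input, which is what allows an unknown high-frequency component to be estimated at this stage.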

The second high-pass filtering unit 236 receives the signal from the amplitude limit constant multiplication unit 234 and carries out another step for estimating the unknown high-frequency component. This step removes the low-frequency band components from the input signal so that only the high-frequency component actually being sought is obtained, and it is realized by high-pass filtering the input signal. With the input signal denoted L̄_γ(x), the high-pass filtered signal, that is, the estimated unknown high-frequency component L̂_γ(x), is given by Formula 9.

... (Formula 9)
Here, W(i) is the one defined in Formula 2. A method other than Formula 9 can also be used to extract the high-frequency component, but one that satisfies the pyramid structure is desirable. The second high-pass filtering unit 236 outputs the estimated high-frequency component to the first signal synthesis unit 220.

The difference generation unit 214 generates a difference signal, which is the difference between the high-frequency separation signal output from the first high-pass filtering unit 208 and the high-frequency equivalent signal extracted by the high-frequency extraction filtering unit 210, and passes it to the differential spatial interpolation unit 216.

The differential spatial interpolation unit 216 expands the spatial resolution of the difference signal from the difference generation unit 214 to the same resolution as the original video signal to generate a differential high-resolution signal, and sends it to the first signal synthesis unit 220. This spatial expansion can be realized using Formulas 5 to 7 described above, but the filter coefficients, interpolation functions, and the like used for the expansion are not limited to Formulas 5 to 7.

The basic interpolation unit 218 receives the basic decoded video signal decoded by the base layer decoding unit 114 as is and interpolates it to the desired resolution to generate a basic high-resolution signal. The basic high-resolution signal is generated directly from the basic decoded video signal, without any preceding filtering process for suppressing coding degradation, in cases where the coding degradation of the basic decoded video signal is tolerable. In such cases, a high-quality high-resolution signal can be obtained while using more of the frequency components contained in the decoded image. Interpolation by the desired magnification is performed as follows. The signal interpolated by the magnification γ, (EXPAND)_γ G_0(x), is given by Formula 10.

... (Formula 10)
Here, W_γ(i) is the one defined in Formulas 6 and 7. The spatial expansion by the basic interpolation unit 218 is not limited to Formula 10; other expansion methods with different filter coefficients, interpolation functions, and the like can also be used. The magnification γ should be set, for example, to RE/RB for spatial interpolation from the base layer resolution to the enhancement layer resolution, where RB is the resolution of the base layer and RE is the resolution of the enhancement layer. The basic interpolation unit 218 outputs the expanded basic high-resolution signal to the first signal synthesis unit 220.

The first signal synthesis unit (signal synthesis unit) 220 combines the high-frequency-component high-resolution signal from the second high-pass filtering unit 236, the differential high-resolution signal from the differential spatial interpolation unit 216, and the basic high-resolution signal from the basic interpolation unit 218 to generate the predicted video signal.
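The data flow through units 208 to 220 can be sketched as follows, with each processing stage passed in as a function. This is a wiring illustration only: the function and parameter names are hypothetical, "synthesis" is assumed to be sample-wise addition, and the stand-in stages used in the example are deliberately trivial rather than the filters described in this embodiment.

```python
from typing import Callable

Signal = list[float]
Stage = Callable[[Signal], Signal]


def spatial_interpolation(decoded: Signal, high_pass: Stage, extract: Stage,
                          expand: Stage, estimate: Stage) -> Signal:
    """Wire the stages of the spatial interpolation unit 116 together."""
    separated = high_pass(decoded)                        # unit 208
    equivalent = extract(separated)                       # unit 210
    difference = [s - e for s, e in zip(separated, equivalent)]  # unit 214
    high_freq_hr = estimate(expand(equivalent))           # unit 212
    diff_hr = expand(difference)                          # unit 216
    base_hr = expand(decoded)                             # unit 218
    # Unit 220: synthesis, assumed here to be sample-wise addition.
    return [h + d + b for h, d, b in zip(high_freq_hr, diff_hr, base_hr)]


# Trivial stand-in stages, chosen only so the example is easy to follow.
predicted = spatial_interpolation(
    decoded=[2.0, 4.0],
    high_pass=lambda s: [0.5 * v for v in s],
    extract=lambda s: [v if abs(v) > 1.0 else 0.0 for v in s],
    expand=lambda s: [v for v in s for _ in range(2)],
    estimate=lambda s: list(s),
)
```

Because the difference signal is expanded and added back, whatever the extraction stage removes from the high-frequency separation signal still reaches the predicted signal; only the extracted edge portion passes through the estimation stage.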

The first entropy encoding unit 222 entropy-encodes the parameters a_k and ε from the high-frequency extraction filtering unit 210 and the parameters α_γ and T from the amplitude limit constant multiplication unit 234 to generate a bit stream as encoded parameters, and outputs it to the multiplexing unit 120.

In this way, the first signal synthesis unit 220 of the spatial interpolation unit 116 generates the predicted video signal, and the enhancement layer encoding unit 118 uses the original video signal and this predicted video signal to derive a prediction error signal by exploiting the correlation between spatial resolutions and the temporal correlation, and encodes the prediction error signal to generate extended encoded data (a bit stream). The enhancement layer encoding unit 118 includes a first frame memory 250, a second frame memory 252, a motion estimation unit 254, a motion compensation unit 256, an intra prediction unit 258, a prediction signal selection unit 260, a prediction error signal generation unit 262, an orthogonal transform quantization unit 264, a second entropy encoding unit 266, an inverse quantization inverse orthogonal transform unit 268, a second signal synthesis unit 270, and a deblocking filter unit 272.

The first frame memory 250 stores at least one GOP (Group Of Pictures) of the original video signal. It outputs the corresponding frame signals to the prediction error signal generation unit 262 and the motion estimation unit 254 so that the spatial interpolation unit 116 and the enhancement layer encoding unit 118 remain synchronized.

The second frame memory 252 stores at least one frame of the signal from the deblocking filter unit 272, and outputs the frame signal needed for motion estimation to the motion estimation unit 254 and the frame signal needed for motion compensation to the motion compensation unit 256.

The motion estimation unit 254 receives the frame signals from the first frame memory 250 and the second frame memory 252 and performs motion estimation of the kind typified by H.264. The motion information obtained by the motion estimation is output to the motion compensation unit 256 and the second entropy encoding unit 266.

The motion compensation unit 256 receives the frame signal from the second frame memory 252 and the motion information from the motion estimation unit 254, and performs motion compensation of the kind typified by H.264. The signal obtained by the motion compensation is output to the prediction signal selection unit 260.

The intra prediction unit 258 receives the signal synthesized by the second signal synthesis unit 270 and performs intra prediction of the kind typified by H.264. The signal obtained by the intra prediction is output to the prediction signal selection unit 260.

予測信号選択部２６０は、動き補償部２５６、イントラ予測部２５８および空間インターポレーション部１１６からの信号をそれぞれ受け、いずれか１つを選択、または、それぞれの信号に重み付けを行い合成する。信号の選択や信号の合成における判断基準は任意に決定でき、例えば、符号化効率を重視する場合は、予測誤差信号の二乗平均が小さくなるように、信号を選択または合成する。予測信号選択部２６０は、選択または合成した信号を予測誤差信号生成部２６２および第２信号合成部２７０へ出力する。 The prediction signal selection unit 260 receives signals from the motion compensation unit 256, the intra prediction unit 258, and the spatial interpolation unit 116, and selects one of them, or weights and combines the signals. Determination criteria in signal selection and signal synthesis can be arbitrarily determined. For example, when importance is placed on coding efficiency, signals are selected or synthesized such that the mean square of the prediction error signal is reduced. The prediction signal selection unit 260 outputs the selected or synthesized signal to the prediction error signal generation unit 262 and the second signal synthesis unit 270.
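The selection criterion described above can be sketched as follows. This is a minimal illustration, assuming each candidate prediction is a flat list of sample values for one block; the function names and the candidate labels (`intra`, `inter`, `inter_layer`) are hypothetical and do not come from the specification, which leaves the criterion open.

```python
from typing import Dict, List, Tuple


def mean_squared_error(original: List[float], prediction: List[float]) -> float:
    """Mean square of the prediction error (original minus prediction)."""
    return sum((o - p) ** 2 for o, p in zip(original, prediction)) / len(original)


def select_prediction(
    original: List[float], candidates: Dict[str, List[float]]
) -> Tuple[str, List[float]]:
    """Pick the single candidate whose prediction error has the smallest mean square."""
    best = min(candidates, key=lambda name: mean_squared_error(original, candidates[name]))
    return best, candidates[best]


def blend_predictions(
    candidates: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted combination of the candidates (weights are normalised to sum to 1)."""
    total = sum(weights[name] for name in candidates)
    length = len(next(iter(candidates.values())))
    return [
        sum(weights[name] * candidates[name][i] for name in candidates) / total
        for i in range(length)
    ]
```

With an original block `[10, 12, 14, 16]` and three candidates, `select_prediction` returns whichever candidate is closest in the mean-square sense; `blend_predictions` covers the alternative in which the signals are weighted and combined.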

The prediction error signal generation unit 262 generates a prediction error signal by subtracting the predicted video signal, input via the prediction signal selection unit 260, from the original video signal held in the first frame memory 250, and outputs the prediction error signal to the orthogonal transform quantization unit 264.

The orthogonal transform quantization unit 264 receives the prediction error signal output from the prediction error signal generation unit 262, and orthogonally transforms and quantizes it. A discrete cosine transform (DCT), a wavelet transform, or the like is used for the orthogonal transform. A scheme that combines the orthogonal transform and quantization, as in H.264, can also be adopted. The orthogonal transform quantization unit 264 outputs the orthogonally transformed and quantized signal to the second entropy encoding unit 266 and the inverse quantization inverse orthogonal transform unit 268.

第２エントロピー符号化部２６６は、直交変換量子化部２６４からの信号をエントロピー符号化し拡張符号化データとしてのビットストリームを生成する。こうして生成された拡張符号化データは多重化部１２０に送信される。 The second entropy encoding unit 266 performs entropy encoding on the signal from the orthogonal transform quantization unit 264 to generate a bit stream as extended encoded data. The extension encoded data generated in this way is transmitted to the multiplexing unit 120.

逆量子化逆直交変換部２６８は、直交変換および量子化された信号を受け、その信号を再度逆量子化および逆直交変換して推定信号を生成する。そして、その推定信号を第２信号合成部２７０に送信する。 The inverse quantization inverse orthogonal transform unit 268 receives the orthogonally transformed and quantized signal, and performs inverse quantization and inverse orthogonal transform on the signal again to generate an estimation signal. Then, the estimated signal is transmitted to the second signal synthesis unit 270.

The second signal synthesis unit 270 combines the predicted video signal from the prediction signal selection unit 260 with the estimated signal from the inverse quantization inverse orthogonal transform unit 268, and outputs the result to the intra prediction unit 258 and the deblocking filter unit 272.
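The path from prediction error generation through local reconstruction can be sketched as below. This is a simplified illustration under stated assumptions: the orthogonal transform is omitted, quantization is a plain uniform quantizer with a hypothetical step size `step`, and all function names are invented for the example.

```python
from typing import List, Tuple


def quantize(values: List[float], step: float) -> List[int]:
    """Uniform quantizer: map each value to the nearest multiple of step."""
    return [round(v / step) for v in values]


def dequantize(levels: List[int], step: float) -> List[float]:
    """Inverse quantization: scale the integer levels back by the step size."""
    return [level * step for level in levels]


def encode_block(
    original: List[float], prediction: List[float], step: float
) -> Tuple[List[int], List[float]]:
    """Return the quantized residual and the encoder's local reconstruction."""
    # Prediction error signal: original minus prediction (unit 262).
    residual = [o - p for o, p in zip(original, prediction)]
    levels = quantize(residual, step)  # unit 264, transform omitted
    # Local decode: estimated residual (unit 268) plus prediction (unit 270).
    estimated = dequantize(levels, step)
    reconstructed = [p + e for p, e in zip(prediction, estimated)]
    return levels, reconstructed


def decode_block(levels: List[int], prediction: List[float], step: float) -> List[float]:
    """Decoder-side reconstruction from the same levels and the same prediction."""
    return [p + e for p, e in zip(prediction, dequantize(levels, step))]
```

Because the encoder reconstructs from the quantized levels rather than from the original residual, its reference frames match what a decoder would produce from the same data, which is why the loop runs through units 268 and 270 before anything is stored.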

The deblocking filter unit 272 receives the signal from the second signal synthesis unit 270 and applies deblocking filter processing to it. The deblocking filter used in H.264, for example, can be used here. The deblocking-filtered signal is transmitted to the second frame memory 252.

図５は、上述した空間インターポレーション部１１６による映像階層符号化方法の具体的な処理を示したフローチャートである。 FIG. 5 is a flowchart showing specific processing of the video hierarchical encoding method performed by the spatial interpolation unit 116 described above.

まず、第１ハイパスフィルタリング部２０８は基本レイヤ復号部１１４で復号された基本復号映像信号の高周波成分を分離して高周波分離信号を生成し（Ｓ３００）、高周波抽出フィルタリング部２１０はε−フィルタを用いて高周波分離信号にε−フィルタ処理を施し高周波相当信号を抽出する（Ｓ３０２）。高周波空間インターポレーション部２１２は、こうして生成された高周波相当信号の空間解像度を、高周波成分の階層間推定処理を通じて拡大し高周波成分高解像度信号を生成する。詳細には、高周波抽出フィルタリング部２１０が抽出した高周波相当信号を第１インターポレーション部２３２が拡大し（Ｓ３０４）、その信号に対して振幅制限定数倍処理部２３４を用いて振幅制限および定数倍処理を施す（Ｓ３０６）、そして、第２ハイパスフィルタリング部２３６において、振幅制限および定数倍処理した信号から高周波成分を抽出し高周波成分高解像度信号を生成する（Ｓ３０８）。 First, the first high-pass filtering unit 208 generates a high-frequency separation signal by separating high-frequency components of the basic decoded video signal decoded by the base layer decoding unit 114 (S300), and the high-frequency extraction filtering unit 210 uses an ε-filter. The high frequency separation signal is subjected to ε-filter processing to extract a high frequency equivalent signal (S302). The high-frequency spatial interpolation unit 212 expands the spatial resolution of the high-frequency equivalent signal generated in this way through high-frequency component inter-layer estimation processing to generate a high-frequency component high-resolution signal. Specifically, the first interpolation unit 232 expands the high-frequency equivalent signal extracted by the high-frequency extraction filtering unit 210 (S304), and the amplitude limit and constant multiplication are performed on the signal using the amplitude limit constant multiplication unit 234. Processing is performed (S306), and the second high-pass filtering unit 236 extracts a high-frequency component from the signal subjected to amplitude limitation and constant multiplication, and generates a high-frequency component high-resolution signal (S308).

Meanwhile, the difference generation unit 214 generates a difference signal, which is the difference between the high-frequency separation signal and the high-frequency equivalent signal (S310), and the differential spatial interpolation unit 216 enlarges the spatial resolution of the difference signal to generate a differential high-resolution signal (S312). In addition, the basic interpolation unit 218 separately enlarges, as is, the spatial resolution of the basic decoded video signal decoded by the base layer decoding unit 114, generating a basic high-resolution signal (S314). The processes a, b, and c shown in FIG. 5 may be performed in any order, and may run in parallel as far as processing capacity permits.

Then, the first signal synthesis unit 220 combines the high-frequency component high-resolution signal from the second high-pass filtering unit 236, the differential high-resolution signal from the differential spatial interpolation unit 216, and the basic high-resolution signal from the basic interpolation unit 218 to generate a predicted video signal (S316). If the parameters α_γ and T used in the high-frequency extraction filtering unit 210 and the amplitude limit constant multiplication unit 234 need to be encoded, the first entropy encoding unit 222 encodes each of them (S318). The qualification "if needed" applies because no encoding is necessary when the parameters have been agreed in advance between the video hierarchical encoding device 100 and the video hierarchical decoding device.
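Steps S300 to S316 can be sketched on a one-dimensional signal as follows. This is an illustration only, with several assumptions not taken from this section: the high-pass filter is the input minus a 3-tap low-pass, the interpolation is 2x linear, the ε-filter uses the commonly cited form y(n) = x(n) + Σ a_k·F(x(n+k) − x(n)) with F(d) = d when |d| ≤ ε and 0 otherwise, and the amplitude limit and constant multiplication is taken to be clipping to [−T, T] followed by scaling by α_γ. All function names are hypothetical.

```python
from typing import List, Sequence


def low_pass(x: Sequence[float]) -> List[float]:
    """3-tap [1, 2, 1]/4 smoothing with edge replication (assumed filter)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + 2 * x[i] + x[min(i + 1, n - 1)]) / 4 for i in range(n)]


def high_pass(x: Sequence[float]) -> List[float]:
    """High-frequency part: the input minus its low-pass version."""
    return [v - l for v, l in zip(x, low_pass(x))]


def epsilon_filter(x: Sequence[float], a_k: Sequence[float], eps: float) -> List[float]:
    """Assumed ε-filter: neighbours differing by more than eps are ignored."""
    half = len(a_k) // 2
    out = []
    for n in range(len(x)):
        acc = x[n]
        for j, a in enumerate(a_k):
            m = min(max(n + j - half, 0), len(x) - 1)
            d = x[m] - x[n]
            if abs(d) <= eps:
                acc += a * d
        out.append(acc)
    return out


def upsample2(x: Sequence[float]) -> List[float]:
    """2x linear interpolation; the last sample is replicated at the edge."""
    out = []
    for i, v in enumerate(x):
        nxt = x[min(i + 1, len(x) - 1)]
        out.extend([v, (v + nxt) / 2])
    return out


def limit_and_scale(x: Sequence[float], alpha_gamma: float, T: float) -> List[float]:
    """Assumed amplitude limit (clip to [-T, T]) followed by constant multiplication."""
    return [alpha_gamma * max(-T, min(T, v)) for v in x]


def predict_high_resolution(base, a_k, eps, alpha_gamma, T) -> List[float]:
    hf_sep = high_pass(base)                                  # S300
    hf_eq = epsilon_filter(hf_sep, a_k, eps)                  # S302
    branch_a = high_pass(
        limit_and_scale(upsample2(hf_eq), alpha_gamma, T))    # S304 to S308
    diff = [s - e for s, e in zip(hf_sep, hf_eq)]             # S310
    branch_b = upsample2(diff)                                # S312
    branch_c = upsample2(base)                                # S314
    return [a + b + c for a, b, c in zip(branch_a, branch_b, branch_c)]  # S316
```

The three branches depend only on `base`, `hf_sep`, and `hf_eq`, so once those are available they can be evaluated in any order, which matches the note that processes a, b, and c may run in parallel.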

図６は、上述した拡張レイヤ符号化部１１８による映像階層符号化方法の具体的な処理を示したフローチャートである。 FIG. 6 is a flowchart showing specific processing of the video layer encoding method performed by the enhancement layer encoding unit 118 described above.

まず、イントラ予測部２５８を用いてイントラ予測を行い、イントラ予測した信号を予測信号選択部２６０に送信する（Ｓ３５０）。一方、動き推定部２５４および動き補償部２５６を用いて、動き推定および動き補償（動き補償予測）を行い、動き補償予測した信号も予測信号選択部２６０へ送信する（Ｓ３５２）。また、同様に空間インターポレーション部１１６が予測した予測映像信号も予測信号選択部２６０に送信される（Ｓ３５４）。ここで、図６に示したａ、ｂ、ｃの処理は、どの順に遂行してもよく、処理能力の許す範囲で並行処理されてもよい。 First, intra prediction is performed using the intra prediction unit 258, and the intra-predicted signal is transmitted to the prediction signal selection unit 260 (S350). On the other hand, motion estimation and motion compensation (motion compensation prediction) are performed using the motion estimation unit 254 and the motion compensation unit 256, and a motion compensation predicted signal is also transmitted to the prediction signal selection unit 260 (S352). Similarly, the predicted video signal predicted by the spatial interpolation unit 116 is also transmitted to the predicted signal selection unit 260 (S354). Here, the processes a, b, and c shown in FIG. 6 may be performed in any order, and may be performed in parallel as long as the processing capability permits.

予測信号選択部２６０において、イントラ予測した信号、動き補償予測した信号および高解像度推定信号のいずれかひとつが選択、または、それぞれの信号に重み付けを行って合成し（Ｓ３５６）、選択、または、合成された予測映像信号を、第１フレームメモリ２５０からの原映像信号から差し引いて予測誤差信号を生成する（Ｓ３５８）。次に、予測誤差信号に対して直交変換量子化部２６４を用いて直交変換および量子化する（Ｓ３６０）。直交変換および量子化した信号および動き情報に対して第２エントロピー符号化部２６６を用いてエントロピー符号化する（Ｓ３６２）。 The prediction signal selection unit 260 selects any one of the intra-predicted signal, the motion-compensated prediction signal, and the high-resolution estimation signal, or synthesizes each signal by weighting (S356). The predicted video signal is subtracted from the original video signal from the first frame memory 250 to generate a prediction error signal (S358). Next, orthogonal transformation and quantization are performed on the prediction error signal using the orthogonal transformation quantization unit 264 (S360). The second entropy encoding unit 266 performs entropy encoding on the orthogonally transformed and quantized signal and motion information (S362).

At this point it is determined whether all signals to be encoded have been encoded (S364); if they have, the process ends. If not, the signal currently being encoded is decoded and deblocked so that it can be referenced when other signals are encoded. Specifically, the inverse quantization inverse orthogonal transform unit 268 inversely quantizes and inversely orthogonally transforms the orthogonally transformed and quantized signal (S366); the second signal synthesis unit 270 combines the result with the predicted video signal to generate a decoded signal and transmits that decoded signal to the intra prediction unit 258 and the deblocking filter unit 272 (S368). Finally, the signal deblocked by the deblocking filter unit 272 is stored in the second frame memory 252 (S370), and the process repeats from the beginning.

As described above, the video hierarchical encoding device 100 adds inter-layer estimation processing, which involves estimating the original video signal, to the simple spatial interpolation (spatial enlargement) used for inter-layer prediction in conventional video hierarchical encoding. This makes the inter-layer prediction error signal smaller and enables efficient, higher-quality video hierarchical encoding. Furthermore, in spatial interpolation, taking into account the coding degradation of the basic decoded video signal (low-resolution signal) obtained by decoding the basic encoded data and the characteristics of the inter-layer estimation processing, the decoded video signal is separated into the high-frequency component corresponding to the original video signal that is the estimation target and the remaining signal, and the original video signal (high-resolution signal) is estimated with processing suited to each. The predicted video signal can therefore be generated more appropriately, enabling efficient video hierarchical encoding.

(Second Embodiment: Video Hierarchical Decoding Device 400)
FIG. 7 is a functional block diagram illustrating a schematic configuration of the video hierarchical decoding device 400. The video hierarchical decoding device 400 extracts the necessary information from the multiplexed data generated by the video hierarchical encoding device 100 and outputs a decoded video signal whose spatial resolution suits the capabilities of the display or other output device. The video hierarchical decoding device 400 includes an extract unit 410, a base layer decoding unit 412, a spatial interpolation unit 414, and an enhancement layer decoding unit 416.

The extract (separation) unit 410 receives the multiplexed data (bitstream) generated by the video hierarchical encoding device 100 and, according to the capabilities of the video hierarchical decoding device 400 or of the display that receives the video output, cuts out and separates the data needed for decoding from the multiplexed data as a whole. Here it distributes the basic encoded data to the base layer decoding unit 412, the extended encoded data to the enhancement layer decoding unit 416, and, if encoding parameters are attached, those encoding parameters to the spatial interpolation unit 414.

The base layer decoding unit 412 decodes the basic encoded data cut out by the extract unit 410 to generate a basic decoded video signal (low-resolution video output). This low-resolution video output is sent to the display 430 as needed. Techniques such as MPEG-2 and H.264 can be applied to this decoding. Temporal scalability, SNR scalability, and the like may also be combined.

Like the spatial interpolation unit 116 in the first embodiment, the spatial interpolation unit 414 spatially enlarges the spatial resolution of the basic decoded video signal decoded by the base layer decoding unit 412 to generate a predicted video signal. The difference is that the spatial interpolation unit 116 in the first embodiment output the parameters it used, whereas in the present embodiment those output parameters are used to generate the predicted video signal.

かかる予測映像信号は、原映像信号の推定信号であり、本実施形態でも、基本復号映像信号を適切に高解像度化処理することで原映像信号と相関の高い予測映像信号を生成している。本実施形態において特徴的な当該空間インターポレーション部４１４の詳細な構成は後程説明する。空間インターポレーション部４１４は、生成された予測映像信号を拡張レイヤ復号部４１６に送信する。 Such a predicted video signal is an estimated signal of the original video signal, and also in the present embodiment, a predicted video signal having a high correlation with the original video signal is generated by appropriately increasing the resolution of the basic decoded video signal. A detailed configuration of the spatial interpolation unit 414 characteristic in the present embodiment will be described later. The spatial interpolation unit 414 transmits the generated predicted video signal to the enhancement layer decoding unit 416.

The enhancement layer decoding unit 416 decodes the extended encoded data cut out by the extract unit 410 and restores the original video signal (high-resolution video output) using the decoded prediction error signal and the predicted video signal generated by the spatial interpolation unit 414. This high-resolution video output is sent to the display 432 as needed. The detailed configuration of the enhancement layer decoding unit 416 is also described later.

FIG. 8 is a flowchart showing the specific processing of the video hierarchical decoding method by which the video hierarchical decoding device 400 described above restores the original video signal.

First, the extract unit 410 analyzes the multiplexed data generated by the video hierarchical encoding device 100, separates it into basic encoded data, extended encoded data, and encoding parameters, and passes these to the base layer decoding unit 412, the enhancement layer decoding unit 416, and the spatial interpolation unit 414, respectively (S450).
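The routing performed in S450 can be sketched as follows. The packet representation is purely hypothetical: the specification does not describe the multiplexed data format in this section, so the sketch assumes a list of `(kind, payload)` pairs with the kinds `"base"`, `"enhancement"`, and `"params"`.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from packet kind to the unit that consumes it.
ROUTES = {
    "base": "base_layer_decoding_unit_412",
    "enhancement": "enhancement_layer_decoding_unit_416",
    "params": "spatial_interpolation_unit_414",
}


def extract(
    packets: List[Tuple[str, bytes]], high_resolution: bool = True
) -> Dict[str, List[bytes]]:
    """Split multiplexed packets by destination unit.

    When the output device only needs the low-resolution video,
    everything except the basic encoded data is dropped.
    """
    routed: Dict[str, List[bytes]] = {unit: [] for unit in ROUTES.values()}
    for kind, payload in packets:
        if kind != "base" and not high_resolution:
            continue  # not needed for base-layer-only decoding
        routed[ROUTES[kind]].append(payload)
    return routed
```

Calling `extract(packets, high_resolution=False)` models a decoder attached to a low-resolution display: only the base layer decoding unit receives data.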

The base layer decoding unit 412 decodes the basic encoded data to generate a basic decoded video signal (S452). If the display supports only low resolution, this basic decoded video signal is displayed. The spatial interpolation unit 414 uses the basic decoded video signal and the parameters from the extract unit 410 to enlarge the spatial resolution of the basic decoded video signal through inter-layer estimation processing of the high-frequency component, generating a predicted video signal (S454).

そして、拡張レイヤ復号部４１６は、エクストラクト部４１０からの拡張符号化データによる予測誤差信号と、空間インターポレーション部４１４からの予測映像信号とを用いて原映像信号を復元する（Ｓ４５６）。こうして階層の相違する複数の映像出力（基本復号映像信号、原映像信号）が出力される。 Then, enhancement layer decoding section 416 reconstructs the original video signal using the prediction error signal based on the extension encoded data from extract section 410 and the predicted video signal from spatial interpolation section 414 (S456). In this way, a plurality of video outputs (basic decoded video signal and original video signal) having different levels are output.

(Spatial interpolation unit 414, enhancement layer decoding unit 416)
Next, the spatial interpolation unit 414 and the enhancement layer decoding unit 416, which are characteristic of the present embodiment, are described in detail.

FIG. 9 is a functional block diagram showing the configurations of the spatial interpolation unit 414 and the enhancement layer decoding unit 416. The spatial interpolation unit 414 includes a first high-pass filtering unit 208, a high-frequency extraction filtering unit 510, a high-frequency spatial interpolation unit 512, a difference generation unit 214, a differential spatial interpolation unit 216, a basic interpolation unit 218, a first signal synthesis unit 220, and a first entropy decoding unit 522. The enhancement layer decoding unit 416 includes a second frame memory 252, a motion compensation unit 256, an intra prediction unit 258, a prediction signal selection unit 260, a second entropy decoding unit 566, an inverse quantization inverse orthogonal transform unit 268, a second signal synthesis unit 270, and a deblocking filter unit 572.

ここで、第１の実施形態における構成要素として既に述べた、第１ハイパスフィルタリング部２０８と、差分生成部２１４と、差分空間インターポレーション部２１６と、基本インターポレーション部２１８、第１信号合成部２２０と、第２フレームメモリ２５２と、動き補償部２５６と、イントラ予測部２５８と、予測信号選択部２６０と、逆量子化逆直交変換部２６８と、第２信号合成部２７０とは、実質的に機能が同一なので重複説明を省略し、ここでは、構成が相違する高周波抽出フィルタリング部５１０と、高周波空間インターポレーション部５１２と、第１エントロピー復号部５２２と、第２エントロピー復号部５６６と、デブロッキングフィルタ部５７２とを主に説明する。 Here, the first high-pass filtering unit 208, the difference generation unit 214, the difference space interpolation unit 216, the basic interpolation unit 218, the first signal synthesis, which have already been described as the constituent elements in the first embodiment. Unit 220, second frame memory 252, motion compensation unit 256, intra prediction unit 258, prediction signal selection unit 260, inverse quantization inverse orthogonal transform unit 268, and second signal synthesis unit 270 are substantially Since the functions are the same, redundant description is omitted. Here, the high-frequency extraction filtering unit 510, the high-frequency spatial interpolation unit 512, the first entropy decoding unit 522, and the second entropy decoding unit 566, which have different configurations, The deblocking filter unit 572 will be mainly described.

高周波抽出フィルタリング部５１０は、第１の実施形態における高周波抽出フィルタリング部２１０同様、例えば、突発的変化を有する映像信号に対する雑音除去を効果的に行う非線形ディジタルフィルタであるε−フィルタで構成され、第１ハイパスフィルタリング部２０８で高周波成分が分離された高周波分離信号から、原映像信号に相当する高周波成分を抽出して高周波相当信号を生成する。 The high-frequency extraction filtering unit 510 includes, for example, an ε-filter that is a non-linear digital filter that effectively removes noise from a video signal having a sudden change, like the high-frequency extraction filtering unit 210 in the first embodiment. The high frequency component corresponding to the original video signal is extracted from the high frequency separated signal from which the high frequency component has been separated by the high pass filtering unit 208 to generate a high frequency equivalent signal.

第１の実施形態では、ε−フィルタの特性を決めるパラメータａ_ｋおよびεを第１エントロピー符号化部２２２に送信していたが、本実施形態では、そのパラメータａ_ｋおよびεを受けてε−フィルタに設定する点が相違する。かかる構成により映像階層符号化時と同じパラメータによりフィルタリングできるので、映像階層符号化時と同一の高周波相当信号を生成することが可能となる。 In the first embodiment, the parameters a _k and ε that determine the characteristics of the ε-filter are transmitted to the first entropy encoding unit 222. In the present embodiment, the parameters a _k and ε are received and ε− The difference is that it is set in the filter. With such a configuration, filtering can be performed using the same parameters as in video hierarchical encoding, so that the same high-frequency equivalent signal as in video hierarchical encoding can be generated.

The high-frequency spatial interpolation unit 512 consists of a first interpolation unit 232, an amplitude limit constant multiplication unit 534, and a second high-pass filtering unit 236. The first interpolation unit 232 and the second high-pass filtering unit 236, already described as components of the first embodiment, have substantially the same functions. The amplitude limit constant multiplication unit 534 differs only in one respect: in the first embodiment the parameters α_γ and T were sent to the first entropy encoding unit 222, whereas in the present embodiment the received parameters α_γ and T are set as the unit's own parameters.

The first entropy decoding unit 522 decodes the parameters a_k, ε, α_γ, T, and so on from the encoding parameters, and outputs the parameters a_k and ε to the high-frequency extraction filtering unit 510 and the parameters α_γ and T to the amplitude limit constant multiplication unit 534 of the high-frequency spatial interpolation unit 512.

第２エントロピー復号部５６６は、エクストラクト部４１０からの拡張符号化データを復号し予測誤差信号を生成する。こうして生成された予測誤差信号は、逆量子化逆直交変換部２６８に送信され、動き情報は動き補償部２５６に送信される。 The second entropy decoding unit 566 decodes the extended encoded data from the extract unit 410 to generate a prediction error signal. The prediction error signal generated in this way is transmitted to the inverse quantization inverse orthogonal transform unit 268, and the motion information is transmitted to the motion compensation unit 256.

The deblocking filter unit 572 receives the signal from the second signal synthesis unit 270 and applies deblocking filter processing to it. The deblocking filter used in H.264, for example, can be used here. The deblocking-filtered signal is transmitted to the second frame memory 252 and is also output externally as the original video signal.

図１０は、上述した拡張レイヤ復号部４１６による映像階層復号方法の具体的な処理を示したフローチャートである。 FIG. 10 is a flowchart showing specific processing of the video layer decoding method performed by the enhancement layer decoding unit 416 described above.

まず、第２エントロピー復号部５６６が拡張符号化データを復号して予測誤差信号を生成し（Ｓ６００）、その予測誤差信号が逆量子化逆直交変換部２６８において逆量子化および逆直交変換される（Ｓ６０２）。 First, the second entropy decoding unit 566 decodes the extended encoded data to generate a prediction error signal (S600), and the prediction error signal is subjected to inverse quantization and inverse orthogonal transform in the inverse quantization inverse orthogonal transform unit 268. (S602).

Next, the target block (macroblock) is analyzed to determine which of intra prediction, motion-compensated prediction, and prediction based on the predicted video signal was selected, or whether they were combined (S604), and the corresponding processing is performed. If intra prediction was selected, intra prediction is performed using the intra prediction unit 258 (S606). If motion-compensated prediction was selected, motion compensation is performed using the motion compensation unit 256 (S608). If prediction based on the predicted video signal was selected, the spatial interpolation unit 414 is operated to generate the predicted video signal (S610). The detailed processing for generating the predicted video signal is described later. If the signals were combined, intra prediction, motion-compensated prediction, and prediction based on the predicted video signal are all executed, weighted, and then combined.
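The branching in S604 to S610 can be sketched as below. The sketch assumes, hypothetically, that the mode decision for a block is available as a mapping from prediction type to weight (a single entry for plain selection, several entries for a weighted combination) and that each predictor is a zero-argument callable, so that only the predictors actually needed are run.

```python
from typing import Callable, Dict, List


def build_prediction(
    mode_weights: Dict[str, float],
    predictors: Dict[str, Callable[[], List[float]]],
) -> List[float]:
    """Run only the selected predictors and combine them by normalised weight."""
    active = {name: w for name, w in mode_weights.items() if w != 0}
    # Each predictor is invoked lazily, mirroring S606, S608 and S610.
    signals = {name: predictors[name]() for name in active}
    total = sum(active.values())
    length = len(next(iter(signals.values())))
    return [
        sum(active[name] * signals[name][i] for name in active) / total
        for i in range(length)
    ]
```

For a block coded with plain selection, `mode_weights` would hold one entry such as `{"inter": 1}`, and the spatial interpolation path is never executed for that block.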

The predicted video signal selected or combined by the prediction signal selection unit 260 is combined with the signal from the inverse quantization inverse orthogonal transform unit 268 to generate a decoded signal, and the decoded signal is transmitted to the intra prediction unit 258 and the deblocking filter unit 572 (S612). Finally, the signal deblocked by the deblocking filter unit 572 is output externally as the original video signal (S614). It is then determined whether all of the extended encoded data to be decoded has been decoded (S616); if it has, the process ends. If not, the process repeats from the beginning.

図１１は、上述した空間インターポレーション部４１４による映像階層復号方法の具体的な処理を示したフローチャートである。 FIG. 11 is a flowchart showing specific processing of the video hierarchy decoding method by the spatial interpolation unit 414 described above.

まず、第１エントロピー復号部５２２は、必要であれば、符号化パラメータからパラメータａ_ｋ、ε、α_γ、Ｔを復号し、パラメータａ_ｋおよびεを高周波抽出フィルタリング部５１０に、パラメータα_γおよびＴを高周波空間インターポレーション部５１２の振幅制限定数倍処理部５３４に出力する（Ｓ６５０）。ここで必要があればとしたのは、予め映像階層符号化装置１００と映像階層復号装置４００との間でパラメータの取り決めが為されていれば復号の必要がないからである。 First, the first entropy decoding unit 522 decodes the parameters a _k , ε, α _γ , T from the encoding parameters, if necessary, the parameters a _k and ε to the high frequency extraction filtering unit 510, and the parameters α _γ and T is output to the amplitude limit constant multiplication unit 534 of the high-frequency spatial interpolation unit 512 (S650). The reason why it is necessary here is that it is not necessary to perform decoding if a parameter is determined in advance between the video hierarchy coding apparatus 100 and the video hierarchy decoding apparatus 400.
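The "if necessary" handling in S650 can be sketched as follows: when no parameters arrive in the bitstream, the decoder falls back to values agreed with the encoder in advance. The container type, the field names, and the default values are all hypothetical placeholders, not values from the specification.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple


@dataclass(frozen=True)
class InterpolationParams:
    a_k: Tuple[float, ...]   # ε-filter coefficients
    epsilon: float           # ε-filter threshold
    alpha_gamma: float       # constant multiplier (α_γ)
    T: float                 # amplitude limit


# Placeholder values standing in for a prior encoder/decoder agreement.
PRE_AGREED = InterpolationParams(a_k=(0.25, 0.5, 0.25), epsilon=8.0, alpha_gamma=1.0, T=16.0)


def resolve_params(decoded: Optional[Mapping[str, object]]) -> InterpolationParams:
    """Use transmitted parameters when present, otherwise the pre-agreed ones."""
    if decoded is None:
        return PRE_AGREED
    return InterpolationParams(
        a_k=tuple(decoded["a_k"]),
        epsilon=float(decoded["epsilon"]),
        alpha_gamma=float(decoded["alpha_gamma"]),
        T=float(decoded["T"]),
    )


def route_params(params: InterpolationParams) -> Dict[str, Dict[str, object]]:
    """Send a_k and ε to unit 510, and α_γ and T to unit 534."""
    return {
        "high_frequency_extraction_filtering_unit_510": {
            "a_k": params.a_k, "epsilon": params.epsilon},
        "amplitude_limit_constant_multiplication_unit_534": {
            "alpha_gamma": params.alpha_gamma, "T": params.T},
    }
```

Either way, both sides end up filtering with the same parameter values, which is the condition for the decoder to regenerate the same predicted video signal as the encoder.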

続いて、第１ハイパスフィルタリング部２０８は基本レイヤ復号部４１２で復号された基本復号映像信号の高周波成分を分離して高周波分離信号を生成し（Ｓ６５２）、高周波抽出フィルタリング部５１０はε−フィルタを用いて高周波分離信号にε−フィルタ処理を施し高周波相当信号を生成する（Ｓ６５４）。高周波空間インターポレーション部５１２は、こうして生成された高周波相当信号の空間解像度を、高周波成分の階層間推定処理を通じて拡大し高周波成分高解像度信号を生成する。詳細には、高周波抽出フィルタリング部５１０が抽出した高周波相当信号を第１インターポレーション部２３２が拡大し（Ｓ６５６）、その信号に対して振幅制限定数倍処理部５３４を用いて振幅制限および定数倍処理を施す（Ｓ６５８）、そして、第２ハイパスフィルタリング部２３６において、振幅制限および定数倍処理した信号から高周波成分を抽出し高周波成分高解像度信号を生成する（Ｓ６６０）。 Subsequently, the first high-pass filtering unit 208 generates a high-frequency separated signal by separating the high-frequency component of the basic decoded video signal decoded by the base layer decoding unit 412 (S652), and the high-frequency extraction filtering unit 510 performs an ε-filter. The high frequency separation signal is subjected to ε-filter processing to generate a high frequency equivalent signal (S654). The high frequency spatial interpolation unit 512 generates a high frequency component high resolution signal by expanding the spatial resolution of the high frequency equivalent signal generated in this way through high frequency component inter-layer estimation processing. Specifically, the first interpolation unit 232 expands the high frequency equivalent signal extracted by the high frequency extraction filtering unit 510 (S656), and the amplitude limit and constant multiplication are performed on the signal using the amplitude limit constant multiplication unit 534. Processing is performed (S658), and the second high-pass filtering unit 236 extracts a high frequency component from the signal subjected to the amplitude limitation and constant multiplication processing to generate a high frequency component high resolution signal (S660).

Meanwhile, the difference generation unit 214 generates a difference signal, which is the difference between the high-frequency separated signal and the high-frequency equivalent signal (S662), and the differential spatial interpolation unit 216 expands the spatial resolution of the difference signal to generate a differential high-resolution signal (S664). Separately, the basic interpolation unit 218 expands the spatial resolution of the basic decoded video signal decoded by the base layer decoding unit 412, without any other processing, to generate a basic high-resolution signal (S666). The processes a, b, and c shown in FIG. 11 may be performed in any order and may be executed in parallel to the extent that processing capacity allows.
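
The difference path (S662, S664) and the basic path (S666) can be sketched as follows, again on one-dimensional signals. The 2x linear interpolator and all function names are assumptions made for illustration. The thread pool only demonstrates that the two paths do not depend on each other; it is not claimed to be how the device schedules them.

```python
from concurrent.futures import ThreadPoolExecutor


def upsample2(x):
    """2x enlargement by linear interpolation (assumed interpolator)."""
    out = []
    for i, v in enumerate(x):
        nxt = x[min(i + 1, len(x) - 1)]
        out.append(float(v))
        out.append(0.5 * (v + nxt))
    return out


def difference_branch(separated, equivalent):
    """S662: separated minus equivalent; S664: enlarge the difference."""
    diff = [s - e for s, e in zip(separated, equivalent)]
    return upsample2(diff)


def base_branch(base_decoded):
    """S666: enlarge the basic decoded signal without other processing."""
    return upsample2(base_decoded)


def run_difference_and_base(separated, equivalent, base_decoded):
    """Run both paths concurrently; neither reads the other's output."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        diff_future = pool.submit(difference_branch, separated, equivalent)
        base_future = pool.submit(base_branch, base_decoded)
        return diff_future.result(), base_future.result()
```

Because the difference signal holds whatever the ε-filter removed from the high-frequency separated signal, enlarging it separately keeps that content out of the amplitude limiting and constant multiplication applied on the high-frequency path.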

The first signal synthesis unit 220 then combines the high-frequency component high-resolution signal from the second high-pass filtering unit 236, the differential high-resolution signal from the differential spatial interpolation unit 216, and the basic high-resolution signal from the basic interpolation unit 218 to generate the predicted video signal (S668).
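
Step S668 can be sketched as a sample-wise combination of the three high-resolution signals. The passage says only that the signals are synthesized; treating the synthesis as plain addition is an assumption of this sketch, and the function name is hypothetical.

```python
def synthesize_prediction(hf_highres, diff_highres, base_highres):
    """Combine the three high-resolution signals sample by sample (S668).

    Addition is assumed here; the three inputs must have the same length
    because they were all enlarged to the enhancement-layer resolution.
    """
    if not (len(hf_highres) == len(diff_highres) == len(base_highres)):
        raise ValueError("all three signals must have the same length")
    return [h + d + b for h, d, b in zip(hf_highres, diff_highres, base_highres)]
```

For example, `synthesize_prediction([1, 0], [0.5, 0.5], [10, 20])` returns `[11.5, 20.5]`.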

According to the video hierarchical decoding device 400 described above, inter-layer estimation processing that involves estimating the original video signal is added to the simple spatial interpolation (spatial enlargement) used in the inter-layer prediction of conventional video hierarchical decoding. This reduces the inter-layer prediction error signal and makes it possible to achieve efficient, higher-quality video hierarchical decoding. In addition, in the spatial interpolation, the video signal obtained by decoding the basic encoded data is separated, in consideration of the coding degradation of the basic encoded data and the characteristics of the inter-layer estimation processing, into a high-frequency component corresponding to the original video signal that is the target of estimation and the remaining signal, and the original video signal (high-resolution signal) is estimated using processing suited to each part. The predicted video signal can therefore be generated more appropriately, and efficient video hierarchical decoding can be achieved.

FIG. 12 is a functional block diagram showing a typical example of a computer (information processing apparatus) 700 capable of performing the video hierarchical encoding of the video hierarchical encoding device 100 and the video hierarchical decoding of the video hierarchical decoding device 400 in the present embodiment. The computer 700 includes a central processing unit 710, a temporary storage device 712, an external storage device 714, a communication unit 716, an input unit 718, and an output unit 720.

The central processing unit (CPU) 710 manages and controls the entire computer 700 using the programs and applications held in the temporary storage device 712 and the external storage device 714. The temporary storage device 712 is composed of ROM, RAM, EEPROM, nonvolatile RAM, and the like, and temporarily stores programs, video data, and other data processed by the central processing unit 710. The external storage device 714 is composed of flash memory, an HDD, or the like, and stores programs, video data, and other data processed by the central processing unit 710. The communication unit 716 is connected to various electronic devices and servers via the communication line 130 and can exchange video data with them. The input unit 718 can receive multiplexed data and various parameters from the medium 132. The output unit 720 can output video to the display 432 or the like.

The video hierarchical encoding and video hierarchical decoding described above are carried out by the central processing unit 710 executing a program. Accordingly, in addition to the video hierarchical encoding device 100 and the video hierarchical decoding device 400, a program that causes the computer 700 to function as the video hierarchical encoding device 100 and the video hierarchical decoding device 400 is also provided. This program may be read from a recording medium and loaded into the computer, or may be transmitted via the communication line 130 and loaded into the computer.

Preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but it goes without saying that the present invention is not limited to these embodiments. It is clear that a person skilled in the art could conceive of various changes or modifications within the scope set out in the claims, and these are naturally understood to fall within the technical scope of the present invention.

Note that the steps of the video hierarchical encoding method and the video hierarchical decoding method in this specification do not necessarily have to be processed in time series in the order shown in the flowcharts, and may include processing performed in parallel or by subroutines.

The present invention can be used for a video hierarchical encoding device, a video hierarchical encoding method, a video hierarchical encoding program, a video hierarchical decoding device, a video hierarchical decoding method, and a video hierarchical decoding program that generate or use multiplexed data that can be restored to a plurality of video signals having different spatial resolutions.

FIG. 1 is a functional block diagram showing the schematic configuration of the video hierarchical encoding device.
FIG. 2 is a flowchart showing specific processing of the video hierarchical encoding method.
FIG. 3 is a functional block diagram showing the configuration of the spatial interpolation unit and the enhancement layer encoding unit.
FIG. 4 is a conceptual diagram showing the relationship with the quantization parameter.
FIG. 5 is a flowchart showing specific processing of the video hierarchical encoding method.
FIG. 6 is a flowchart showing specific processing of the video hierarchical encoding method.
FIG. 7 is a functional block diagram showing the schematic configuration of the video hierarchical decoding device.
FIG. 8 is a flowchart showing specific processing of the video hierarchical decoding method.
FIG. 9 is a functional block diagram showing the configuration of the spatial interpolation unit and the enhancement layer decoding unit.
FIG. 10 is a flowchart showing specific processing of the video hierarchical decoding method.
FIG. 11 is a flowchart showing specific processing of the video hierarchical decoding method.
FIG. 12 is a functional block diagram showing a typical example of a computer capable of video hierarchical encoding and video hierarchical decoding.

Explanation of symbols

100 ... video hierarchical encoding device
110 ... spatial decimation unit
112 ... base layer encoding unit
114, 412 ... base layer decoding unit
116, 414 ... spatial interpolation unit
118 ... enhancement layer encoding unit
120 ... multiplexing unit
208 ... first high-pass filtering unit (high-pass filtering unit)
210, 510 ... high-frequency extraction filtering unit
212, 512 ... high-frequency spatial interpolation unit
214 ... difference generation unit
216 ... differential spatial interpolation unit
218 ... basic interpolation unit
220 ... first signal synthesis unit (signal synthesis unit)
400 ... video hierarchical decoding device
410 ... extract unit
416 ... enhancement layer decoding unit

Claims

A video hierarchical encoding device that generates multiplexed data that can be restored to a plurality of video signals having different spatial resolutions, comprising:
a spatial decimation unit that generates a basic video signal by reducing the spatial resolution of an original video signal;
a base layer encoding unit that encodes the basic video signal to generate basic encoded data;
a base layer decoding unit that decodes the basic encoded data to generate a basic decoded video signal;
a spatial interpolation unit that generates a predicted video signal by expanding the spatial resolution of the basic decoded video signal;
an enhancement layer encoding unit that encodes a prediction error signal derived using the original video signal and the predicted video signal to generate extended encoded data; and
a multiplexing unit that multiplexes the basic encoded data and the extended encoded data to generate the multiplexed data,
wherein the spatial interpolation unit comprises:
a high-pass filtering unit that separates a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal;
a high-frequency extraction filtering unit that extracts, from the high-frequency separated signal, a high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal;
a high-frequency spatial interpolation unit that expands the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency component high-resolution signal;
a difference generation unit that generates a difference signal that is the difference between the high-frequency separated signal and the high-frequency equivalent signal;
a differential spatial interpolation unit that expands the spatial resolution of the difference signal to generate a differential high-resolution signal;
a basic interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal; and
a signal synthesis unit that synthesizes the high-frequency component high-resolution signal, the differential high-resolution signal, and the basic high-resolution signal to generate the predicted video signal.

The video hierarchical encoding device according to claim 1, wherein the high-frequency extraction filtering unit is an ε-filter, which is a nonlinear digital filter that removes noise from a video signal having abrupt changes, and
the multiplexing unit also multiplexes a coefficient sequence a_k and a threshold ε, which are parameters of the ε-filter.

A video hierarchical encoding method for generating multiplexed data that can be restored to a plurality of video signals having different spatial resolutions, comprising:
generating a basic video signal by reducing the spatial resolution of an original video signal;
encoding the basic video signal to generate basic encoded data;
decoding the basic encoded data to generate a basic decoded video signal;
separating a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal;
extracting, from the high-frequency separated signal, a high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal;
expanding the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency component high-resolution signal;
generating a difference signal that is the difference between the high-frequency separated signal and the high-frequency equivalent signal;
expanding the spatial resolution of the difference signal to generate a differential high-resolution signal;
expanding the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal;
synthesizing the high-frequency component high-resolution signal, the differential high-resolution signal, and the basic high-resolution signal to generate a predicted video signal;
encoding a prediction error signal derived using the original video signal and the predicted video signal to generate extended encoded data; and
multiplexing the basic encoded data and the extended encoded data to generate the multiplexed data.

A video hierarchical encoding program that causes a computer to function as a video hierarchical encoding device that generates multiplexed data that can be restored to a plurality of video signals having different spatial resolutions, the program causing the computer to function as:
a spatial decimation unit that generates a basic video signal by reducing the spatial resolution of an original video signal;
a base layer encoding unit that encodes the basic video signal to generate basic encoded data;
a base layer decoding unit that decodes the basic encoded data to generate a basic decoded video signal;
a spatial interpolation unit comprising a high-pass filtering unit that separates a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal, a high-frequency extraction filtering unit that extracts, from the high-frequency separated signal, a high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal, a high-frequency spatial interpolation unit that expands the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency component high-resolution signal, a difference generation unit that generates a difference signal that is the difference between the high-frequency separated signal and the high-frequency equivalent signal, a differential spatial interpolation unit that expands the spatial resolution of the difference signal to generate a differential high-resolution signal, a basic interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal, and a signal synthesis unit that synthesizes the high-frequency component high-resolution signal, the differential high-resolution signal, and the basic high-resolution signal to generate a predicted video signal;
an enhancement layer encoding unit that encodes a prediction error signal derived using the original video signal and the predicted video signal to generate extended encoded data; and
a multiplexing unit that multiplexes the basic encoded data and the extended encoded data to generate the multiplexed data.

A video hierarchical decoding device capable of restoring a plurality of video signals having different spatial resolutions from multiplexed data to which spatial scalability has been applied, comprising:
an extract unit that separates, from the multiplexed data, a plurality of pieces of encoded data having different spatial resolutions, including at least basic encoded data and extended encoded data;
a base layer decoding unit that decodes the basic encoded data to generate a basic decoded video signal;
a spatial interpolation unit that generates a predicted video signal by expanding the spatial resolution of the basic decoded video signal; and
an enhancement layer decoding unit that restores an original video signal using the predicted video signal and a prediction error signal obtained by decoding the extended encoded data,
wherein the spatial interpolation unit comprises:
a high-pass filtering unit that separates a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal;
a high-frequency extraction filtering unit that extracts, from the high-frequency separated signal, a high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal;
a high-frequency spatial interpolation unit that expands the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency component high-resolution signal;
a difference generation unit that generates a difference signal that is the difference between the high-frequency separated signal and the high-frequency equivalent signal;
a differential spatial interpolation unit that expands the spatial resolution of the difference signal to generate a differential high-resolution signal;
a basic interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal; and
a signal synthesis unit that synthesizes the high-frequency component high-resolution signal, the differential high-resolution signal, and the basic high-resolution signal to generate the predicted video signal.

The video hierarchical decoding device according to claim 5, wherein the high-frequency extraction filtering unit is an ε-filter, which is a nonlinear digital filter that removes noise from a video signal having abrupt changes, and
the multiplexed data includes a coefficient sequence a_k and a threshold ε, which are parameters of the ε-filter.

A video hierarchical decoding method capable of restoring a plurality of video signals having different spatial resolutions from multiplexed data to which spatial scalability has been applied, comprising:
separating, from the multiplexed data, a plurality of pieces of encoded data having different spatial resolutions, including at least basic encoded data and extended encoded data;
decoding the basic encoded data to generate a basic decoded video signal;
separating a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal;
extracting, from the high-frequency separated signal, a high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal;
expanding the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency component high-resolution signal;
generating a difference signal that is the difference between the high-frequency separated signal and the high-frequency equivalent signal;
expanding the spatial resolution of the difference signal to generate a differential high-resolution signal;
expanding the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal;
synthesizing the high-frequency component high-resolution signal, the differential high-resolution signal, and the basic high-resolution signal to generate a predicted video signal; and
restoring an original video signal using the predicted video signal and a prediction error signal obtained by decoding the extended encoded data.

A video hierarchical decoding program that causes a computer to function as a video hierarchical decoding device capable of restoring a plurality of video signals having different spatial resolutions from multiplexed data to which spatial scalability has been applied, the program causing the computer to function as:
an extract unit that separates, from the multiplexed data, a plurality of pieces of encoded data having different spatial resolutions, including at least basic encoded data and extended encoded data;
a base layer decoding unit that decodes the basic encoded data to generate a basic decoded video signal;
a spatial interpolation unit comprising a high-pass filtering unit that separates a high-frequency component from the basic decoded video signal to generate a high-frequency separated signal, a high-frequency extraction filtering unit that extracts, from the high-frequency separated signal, a high-frequency component corresponding to the original video signal to generate a high-frequency equivalent signal, a high-frequency spatial interpolation unit that expands the spatial resolution of the high-frequency equivalent signal through inter-layer estimation processing of the high-frequency component to generate a high-frequency component high-resolution signal, a difference generation unit that generates a difference signal that is the difference between the high-frequency separated signal and the high-frequency equivalent signal, a differential spatial interpolation unit that expands the spatial resolution of the difference signal to generate a differential high-resolution signal, a basic interpolation unit that expands the spatial resolution of the basic decoded video signal to generate a basic high-resolution signal, and a signal synthesis unit that synthesizes the high-frequency component high-resolution signal, the differential high-resolution signal, and the basic high-resolution signal to generate a predicted video signal; and
an enhancement layer decoding unit that restores an original video signal using the predicted video signal and a prediction error signal obtained by decoding the extended encoded data.