JP2018032913A

JP2018032913A - Video encoder, program and method, and video decoder, program and method, and video transmission system

Info

Publication number: JP2018032913A
Application number: JP2016162234A
Authority: JP
Inventors: Satoshi Nakagawa (中川 聰)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2016-08-22
Filing date: 2016-08-22
Publication date: 2018-03-01

Abstract

PROBLEM TO BE SOLVED: To reduce image deterioration and increase encoding efficiency when performing video encoding that involves inter-layer prediction processing. SOLUTION: A video encoder that performs hierarchical coding of images of multiple layers has: prediction means for generating, by inter-layer prediction, a prediction image that predicts an input image of the encoding target layer, with reference to at least a resampled image obtained by filtering the decoded image of another (reference) layer; filter coefficient determination means for determining an optimum filter coefficient set with reference to the input image of the encoding target layer and the decoded image of the reference layer; resampling means for filtering the decoded image of the reference layer using the filter coefficient set; and encoding means for encoding the residual signal between the input image of the encoding target layer and the prediction image. SELECTED DRAWING: Figure 1

Description

The present invention relates to a video encoding device, program and method, a video decoding device, program and method, and a video transmission system, and can be applied, for example, to a system that compresses and encodes multi-layer video information and transmits it as stream data.

Video compression in coding schemes typified by H.264/MPEG-4 AVC (Advanced Video Coding, hereinafter "AVC") and H.265/MPEG-H HEVC (High Efficiency Video Coding, hereinafter "HEVC") achieves high efficiency as follows. For each processing unit obtained by dividing the input target image, a predicted image is generated by intra prediction or by inter prediction such as motion-compensated prediction. The prediction residual signal, i.e., the difference between this predicted image and the input target image, is subjected to a spatial transform such as the discrete cosine transform, and the resulting transform coefficients are quantized and then entropy encoded. Furthermore, these coding schemes offer scalable coding extensions, which allow multiple video representations, such as different spatial resolutions, frame rates, image qualities, and bit depths, to be encoded as a multi-layer video stream.

FIG. 8 is a block diagram showing the configuration of a multi-layer video encoding device, and FIG. 9 is a block diagram showing the configuration of a multi-layer video decoding device.

In the multi-layer video encoding device, the input video of layer 0 (the base layer) is encoded by an encoder that does not use the scalable coding extension (layer 0 encoder). A video stream containing only the base layer is decoded by a decoder that does not use the scalable coding extension (layer 0 decoder). The encoders of layer 1 and above (enhancement layers) are configured to encode with reference to decoded images of other layers, including the base layer. An enhancement layer encoder performs inter-layer prediction using a decoded image generated by the encoder of another layer (the decoded image of a reference layer), achieving more efficient encoding than without inter-layer prediction. The streams output from the encoders of the individual layers are multiplexed (MUX) and output from the multi-layer video encoding device as a multi-layer stream.

In the multi-layer video decoding device, the encoded stream of each layer is demultiplexed (DEMUX) from the multi-layer stream and input to the decoder of that layer (layer 0 to layer n decoders). Each layer's decoder generates the decoded image of its target layer and outputs it as that layer's decoded video (layer 0 to layer n decoded video). The decoders of the enhancement layers (layer 1 and above) perform decoding with inter-layer prediction, using decoded images generated by the decoders of other layers (decoded images of reference layers).
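The MUX/DEMUX step can be pictured with a minimal sketch. It assumes a hypothetical container in which every packet is tagged with its layer ID (loosely analogous to the nuh_layer_id field of HEVC NAL unit headers); real SHVC bitstreams are also organized into access units, which this ignores.

```python
from collections import defaultdict


def mux(layer_streams):
    """Interleave per-layer packet lists into one list of (layer_id, payload) pairs.

    layer_streams maps a layer ID (0 = base layer) to that layer's packets, in order.
    """
    muxed = []
    longest = max((len(p) for p in layer_streams.values()), default=0)
    for k in range(longest):                 # k-th packet of every layer, base layer first
        for layer_id in sorted(layer_streams):
            packets = layer_streams[layer_id]
            if k < len(packets):
                muxed.append((layer_id, packets[k]))
    return muxed


def demux(muxed):
    """Split a multiplexed list back into per-layer packet lists, preserving order."""
    streams = defaultdict(list)
    for layer_id, payload in muxed:
        streams[layer_id].append(payload)
    return dict(streams)


layers = {0: [b"L0-a", b"L0-b"], 1: [b"L1-a"]}
muxed = mux(layers)
```

Demultiplexing recovers exactly the per-layer streams that each layer decoder consumes.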

FIG. 10 is a block diagram showing the configuration of an enhancement layer encoder in a conventional video encoding device. In FIG. 10, when a coding technology such as HEVC is used, for example, the layer i input video (images of the encoding target layer) is input to the encoder. Each input image of the encoding target layer is divided into processing unit regions, such as coding units, and supplied to the difference processing unit 301. For each processing unit region, the difference processing unit 301 computes the prediction residual signal, i.e., the difference between the input image and the predicted image produced either by the inter prediction unit 308, which uses motion compensation, or by the intra prediction unit 309, which predicts from already-encoded pixels within the picture. The transform quantization unit 302 then applies a DCT (discrete cosine transform) or DST (discrete sine transform) to the prediction residual signal and quantizes the resulting transform coefficients. The entropy encoding unit 303 entropy encodes the quantized transform coefficients, using for example variable-length codes or arithmetic codes, and outputs them as an encoded stream. The encoded stream of the target layer is multiplexed with the streams of the other layers and output as a multi-layer stream.

The quantized transform coefficients are inverse quantized and inverse transformed by the inverse quantization inverse transform unit 304 and added to the predicted image by the adder 305 to obtain a reconstructed image.
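The forward path (residual, transform, quantization) and the local reconstruction path (inverse quantization, inverse transform, addition of the prediction) can be sketched on a single 1-D block. As a simplifying assumption, a floating-point orthonormal DCT-II and a plain uniform quantizer stand in for HEVC's integer transforms and quantization; the point is that the encoder reconstructs exactly what the decoder will, so both sides predict from the same images.

```python
import math


def _basis(k, i, n):
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))


def dct(x):
    n = len(x)
    return [sum(x[i] * _basis(k, i, n) for i in range(n)) for k in range(n)]


def idct(c):
    n = len(c)
    return [sum(c[k] * _basis(k, i, n) for k in range(n)) for i in range(n)]


def encode_block(block, pred, qstep):
    """Residual -> transform -> uniform scalar quantization (levels go to the entropy coder)."""
    residual = [b - p for b, p in zip(block, pred)]
    return [round(c / qstep) for c in dct(residual)]


def reconstruct_block(levels, pred, qstep):
    """Inverse quantization -> inverse transform -> add prediction (the decoder does the same)."""
    residual = idct([lv * qstep for lv in levels])
    return [p + r for p, r in zip(pred, residual)]


block, pred = [10, 12, 14, 16], [10, 10, 10, 10]
levels = encode_block(block, pred, qstep=1.0)
recon = reconstruct_block(levels, pred, qstep=1.0)
```

Because the transform is orthonormal, the reconstruction error here is bounded by half the quantization step times the square root of the block length.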

An in-loop filter unit 306, such as a deblocking filter that reduces block distortion, is then applied to the reconstructed image, yielding the decoded image of the target layer (layer i decoded image), the same image that is generated on the decoding side. This decoded image is held in the reference image buffer 307 as a reference image for motion-compensated inter prediction when encoding subsequent input images, and is also supplied to the encoders of other enhancement layers as a reference layer image for their inter-layer prediction.

In SHVC (Scalable High-efficiency Video Coding), the scalable extension of HEVC, the decoded image from the reference layer (layer j decoded image) is resampled by the resampling unit 320 to the spatial resolution and bit depth of the target image, and is assigned a reference image number in the same way as the reference images of the target layer (layer i decoded images) held in the reference image buffer 307. The enhancement layer encoder then realizes inter-layer prediction by having the inter prediction unit 308 generate a predicted image through ordinary inter prediction. The base layer encoder is equivalent to this configuration with the inter-layer prediction part omitted.
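The key idea, that the resampled reference-layer picture simply becomes one more reference picture with its own index, can be sketched as follows. The shift-based bit-depth conversion is a simplification (in SHVC, to our understanding, the conversion is folded into the resampling filter's shift values), and appending the inter-layer picture at the end of the list, as well as the picture labels, are illustrative assumptions.

```python
def align_bit_depth(sample, src_depth, dst_depth):
    """Rescale a sample value between bit depths by shifting (simplified)."""
    shift = dst_depth - src_depth
    return sample << shift if shift >= 0 else sample >> -shift


def add_inter_layer_reference(ref_list, resampled_picture):
    """Append the resampled reference-layer picture and return the reference index it receives.

    After this, the inter prediction unit can select it like any temporal reference picture.
    """
    ref_list.append(resampled_picture)
    return len(ref_list) - 1


refs = ["layer i, POC 8", "layer i, POC 4"]          # hypothetical temporal references
ilr_index = add_inter_layer_reference(refs, "layer j, resampled")
```

Since the inter-layer picture is addressed through an ordinary reference index, no dedicated inter-layer prediction mode is needed in the inter prediction unit.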

FIG. 11 is a block diagram showing the configuration of an enhancement layer decoder in a conventional video decoding device.

The multi-layer stream generated by the video encoding device is demultiplexed, and the stream of each layer is input to the decoder for that layer.

The input encoded stream of the target layer (layer i stream) is decoded by the entropy decoding unit 403 to obtain transform coefficients (e.g., DCT coefficients), coding mode information, and motion vector information. The transform coefficients are inverse quantized and inverse transformed by the inverse quantization inverse transform unit 404 and then added by the adder 405 to the predicted image generated by the inter prediction unit 408 or the intra prediction unit 409, producing the same reconstructed image as on the encoding side.

An in-loop filter unit 406, such as a deblocking filter, is then applied to the reconstructed image. The result is supplied as the decoded image of the target layer (layer i decoded image) to the decoders of other enhancement layers, and is also held in the reference image buffer 407 as a reference image for subsequent inter prediction.

The decoded image from the reference layer (layer j decoded image) is resampled by the resampling unit 420 to the spatial resolution and bit depth of the target image, and is assigned a reference image number in the same way as the reference images of the target layer (layer i decoded images) held in the reference image buffer 407. The enhancement layer decoder then realizes inter-layer prediction by having the inter prediction unit 408 generate a predicted image through inter prediction. The base layer decoder is equivalent to this configuration with the inter-layer prediction part omitted.

As described above, in HEVC, a resampling process that converts spatial resolution and bit depth is performed so that a decoded image of another layer, whose spatial resolution and other properties may differ, can be used as a reference image for inter-layer prediction.

FIGS. 12 and 13 are explanatory diagrams showing the filter coefficients of the resampling filters used in the HEVC coding technology. FIG. 12 shows the 8-tap filter coefficients used for resampling luma samples, and FIG. 13 shows the 4-tap filter coefficients used for resampling chroma (color-difference) samples.

In HEVC, the position in the reference layer corresponding to each pixel position in the target layer is calculated with 1/16-pixel accuracy. Filter coefficients are selected according to the horizontal and vertical phases at 1/16-pixel accuracy on the reference layer (i.e., the offsets from the integer pixel position), and resampling is performed by applying horizontal and vertical filtering.
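Splitting a 1/16-accurate reference position into an integer position and a phase can be sketched as below. This is a simplified mapping under stated assumptions: it ignores the offset, rounding, and chroma phase terms that the actual standard applies, and the function name is hypothetical.

```python
def ref_position_1_16(x, cur_size, ref_size):
    """Map a target-layer coordinate to the reference layer at 1/16-sample precision.

    Returns (integer position, phase in 0..15). Simplified: no offsets or phase shifts.
    """
    pos16 = (x * ref_size * 16) // cur_size
    return pos16 >> 4, pos16 & 15


# 2x spatial scalability (1920 -> 960): x = 3 lands half-way between samples 1 and 2.
half = ref_position_1_16(3, 1920, 960)       # (1, 8)
# 1.5x spatial scalability (1920 -> 1280): x = 1 lands at 10/16 of a sample.
frac = ref_position_1_16(1, 1920, 1280)      # (0, 10)
```

The phase then indexes a row of the coefficient table (FIG. 12 for luma), separately for the horizontal and vertical directions.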

For example, for the luma filter, the resampled pixel value p at the position of the target pixel is obtained from the pixel value array r of the decoded image of the reference layer by equations (1-1) to (1-6), using the integer position (xR, yR) and the phase (xP, yP) that correspond to the target pixel's 1/16-pixel-accurate position (X, Y) on the reference layer.

Here, f denotes the filter coefficients of FIG. 12 (coefficients determined by the phase and the filter tap position). s_1 and s_2 are shift values and o is a rounding offset, each set according to the bit depth of the pixel values. Further, i and j are the tap positions of the filters applied in the horizontal and vertical directions, respectively; in the example of FIG. 12, each takes values from 0 to 7.
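The separable scheme just described (horizontal filtering with shift s_1, then vertical filtering with rounding offset o and shift s_2) can be sketched as follows. Equations (1-1) to (1-6) are not reproduced in this text, so the concrete values of s_1, s_2, and o below are hypothetical choices that merely remove the 64 x 64 filter gain; tap index 3 is taken as the tap aligned with the integer position, consistent with phase 0 being an identity filter. The coefficient table is an illustrative subset: rows 4, 8, and 12 are HEVC's luma motion-compensation interpolation filters, which we believe coincide with those phases of the resampling table, but FIG. 12 is the authoritative source.

```python
# Illustrative subset of a 16-phase, 8-tap luma table (each row sums to 64).
LUMA_TAPS = {
    0: [0, 0, 0, 64, 0, 0, 0, 0],
    4: [-1, 4, -10, 58, 17, -5, 1, 0],
    8: [-1, 4, -11, 40, 40, -11, 4, -1],
    12: [0, 1, -5, 17, 58, -10, 4, -1],
}


def _clamp(v, lo, hi):
    return max(lo, min(hi, v))


def resample_luma(r, xR, yR, xP, yP, bit_depth=8):
    """Separable 8-tap filtering: horizontal pass (shift s1), then vertical pass (offset o, shift s2)."""
    h, w = len(r), len(r[0])
    fx, fy = LUMA_TAPS[xP], LUMA_TAPS[yP]
    s1 = max(0, bit_depth - 8)       # hypothetical: keeps intermediates small for high bit depths
    s2 = 12 - s1                     # total shift of 12 removes the 64 * 64 filter gain
    o = 1 << (s2 - 1)                # rounding offset
    tmp = []
    for j in range(8):               # vertical tap index
        yy = _clamp(yR + j - 3, 0, h - 1)   # replicate edge rows
        acc = sum(fx[i] * r[yy][_clamp(xR + i - 3, 0, w - 1)] for i in range(8))
        tmp.append(acc >> s1)        # horizontal result for this row
    p = (sum(fy[j] * tmp[j] for j in range(8)) + o) >> s2
    return _clamp(p, 0, (1 << bit_depth) - 1)


step = [[0, 0, 0, 0, 100, 100, 100, 100] for _ in range(8)]
mid = resample_luma(step, xR=3, yR=3, xP=8, yP=0)   # half-sample across the edge: 50
```

The proposed method keeps this filtering structure but replaces the fixed rows of the table with an adaptively determined coefficient set.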

Thus, HEVC uses fixed coefficient values for each phase as the filter coefficients of the resampling process. As a way of making these resampling filter coefficients adaptive, a method such as the one described in Patent Document 1 has been proposed.

Patent Document 1: Published Japanese Translation of PCT Application No. 2015-530805

However, because conventional resampling uses fixed values for the coefficients of the resampling filter, the filtering applied to the image being encoded was not necessarily optimal.

Moreover, for methods that make these coefficients adaptive, including that of Patent Document 1 mentioned above, no concrete means of implementation has been made clear.

There is therefore a demand for a video encoding device, program and method, a video decoding device, program and method, and a video transmission system that can reduce image quality degradation and improve encoding efficiency when performing video encoding and video decoding that involve inter-layer prediction (resampling).

A first aspect of the present invention is a video encoding device that hierarchically encodes images of a plurality of layers, characterized by comprising: (1) prediction means for generating, by inter-layer prediction, a predicted image that predicts an input image of an encoding target layer, with reference to at least a resampled image obtained by filtering a decoded image of another, reference layer; (2) filter coefficient determination means for determining an optimal filter coefficient set with reference to the input image of the encoding target layer and the decoded image of the reference layer; (3) resampling means for filtering the decoded image of the reference layer using the filter coefficient set; and (4) encoding means for encoding a residual signal between the input image of the encoding target layer and the predicted image.
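How the filter coefficient determination means might choose an "optimal" set is detailed in the embodiment's flowcharts (FIGS. 6 and 7), which are outside this excerpt. As one conventional possibility, and explicitly an assumption rather than the patent's own procedure, a coefficient set can be fitted by least squares (Wiener-filter style): each target is an input pixel of the encoding target layer, and each window holds the reference-layer decoded samples under the filter taps. The sketch is 1-D with 3 taps for brevity; a practical fit would use 8 taps per phase in both directions.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_taps(windows, targets):
    """Least-squares taps minimising sum((target - taps . window)^2) via the normal equations."""
    n = len(windows[0])
    ata = [[sum(w[i] * w[j] for w in windows) for j in range(n)] for i in range(n)]
    atb = [sum(w[i] * t for w, t in zip(windows, targets)) for i in range(n)]
    return solve_linear(ata, atb)


def to_integer_taps(taps, gain=64):
    """Scale to integers for signalling (rounding may make the sum differ from gain)."""
    return [round(t * gain) for t in taps]


# Toy check: recover a known 3-tap filter from noiseless synthetic data.
random.seed(0)
ref = [random.randint(0, 255) for _ in range(200)]
true_taps = [0.25, 0.5, 0.25]
windows = [ref[k:k + 3] for k in range(len(ref) - 2)]
targets = [sum(t * v for t, v in zip(true_taps, w)) for w in windows]
est = fit_taps(windows, targets)
```

With real images the fit minimises the squared prediction error for the region actually predicted from the reference layer, which is the sense in which such a coefficient set would be "optimal".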

A second aspect of the present invention is a video decoding device that decodes hierarchically encoded data, characterized by comprising: (1) prediction means for generating, by inter-layer prediction, a predicted image that predicts a target image of a decoding target layer, with reference to at least a resampled image obtained by filtering a decoded image of another, reference layer; (2) decoding means for decoding a filter coefficient set contained in the hierarchically encoded data; and (3) resampling means for filtering the decoded image of the reference layer using the filter coefficient set decoded by the decoding means.

A video encoding program according to a third aspect of the present invention causes a computer installed in a video encoding device that hierarchically encodes images of a plurality of layers to function as: (1) prediction means for generating, by inter-layer prediction, a predicted image that predicts an input image of an encoding target layer, with reference to at least a resampled image obtained by filtering a decoded image of another, reference layer; (2) filter coefficient determination means for determining an optimal filter coefficient set with reference to the input image of the encoding target layer and the decoded image of the reference layer; (3) resampling means for filtering the decoded image of the reference layer using the filter coefficient set; and (4) encoding means for encoding a residual signal between the input image of the encoding target layer and the predicted image.

A video decoding program according to a fourth aspect of the present invention causes a computer installed in a video decoding device that decodes hierarchically encoded data to function as: (1) prediction means for generating, by inter-layer prediction, a predicted image that predicts a target image of a decoding target layer, with reference to at least a resampled image obtained by filtering a decoded image of another, reference layer; (2) decoding means for decoding a filter coefficient set contained in the hierarchically encoded data; and (3) resampling means for filtering the decoded image of the reference layer using the filter coefficient set decoded by the decoding means.

A fifth aspect of the present invention is a video encoding method performed by a video encoding device that hierarchically encodes images of a plurality of layers and includes prediction means, filter coefficient determination means, resampling means, and encoding means, characterized in that: (1) the prediction means generates, by inter-layer prediction, a predicted image that predicts an input image of an encoding target layer, with reference to at least a resampled image obtained by filtering a decoded image of another, reference layer; (2) the filter coefficient determination means determines an optimal filter coefficient set with reference to the input image of the encoding target layer and the decoded image of the reference layer; (3) the resampling means filters the decoded image of the reference layer using the filter coefficient set; and (4) the encoding means encodes a residual signal between the input image of the encoding target layer and the predicted image.

A sixth aspect of the present invention is a video decoding method performed by a video decoding device that decodes hierarchically encoded data and includes prediction means, decoding means, and resampling means, characterized in that: (1) the prediction means generates, by inter-layer prediction, a predicted image that predicts a target image of a decoding target layer, with reference to at least a resampled image obtained by filtering a decoded image of another, reference layer; (2) the decoding means decodes a filter coefficient set contained in the hierarchically encoded data; and (3) the resampling means filters the decoded image of the reference layer using the filter coefficient set decoded by the decoding means.

A video transmission system according to a seventh aspect of the present invention includes a video encoding device that hierarchically encodes images of a plurality of layers and a video decoding device that decodes the data hierarchically encoded by the video encoding device, and is characterized in that the video encoding device of the first aspect is used as the video encoding device and the video decoding device of the second aspect is used as the video decoding device.

According to the present invention, image quality degradation can be reduced and encoding efficiency improved when performing video encoding and video decoding that involve inter-layer prediction (resampling).

FIG. 1 is a block diagram showing the configuration of an enhancement layer encoder in the video encoding device of the embodiment.
FIG. 2 is a block diagram showing the configuration of an enhancement layer decoder in the video decoding device of the embodiment.
FIG. 3 is a block diagram showing the overall configuration of the video transmission system of the embodiment.
FIG. 4 is a flowchart showing the process of determining the filter coefficient set used for resampling in the enhancement layer encoder of the embodiment.
FIG. 5 is an explanatory diagram showing an example of an encoding target image (including an inter-layer prediction region) used in the enhancement layer encoder of the embodiment.
FIG. 6 is a flowchart (part 1) showing an example of calculating a filter coefficient set in the filter coefficient determination unit of the embodiment.
FIG. 7 is a flowchart (part 2) showing an example of calculating a filter coefficient set in the filter coefficient determination unit of the embodiment.
FIG. 8 is a block diagram showing the configuration of a multi-layer video encoding device.
FIG. 9 is a block diagram showing the configuration of a multi-layer video decoding device.
FIG. 10 is a block diagram showing the configuration of an enhancement layer encoder in a conventional video encoding device.
FIG. 11 is a block diagram showing the configuration of an enhancement layer decoder in a conventional video decoding device.
FIG. 12 is an explanatory diagram (part 1) showing the filter coefficients of the resampling filter used in the HEVC coding technology.
FIG. 13 is an explanatory diagram (part 2) showing the filter coefficients of the resampling filter used in the HEVC coding technology.

(A) Main Embodiment

Embodiments of the video encoding device, program and method, the video decoding device, program and method, and the video transmission system according to the present invention will now be described in detail with reference to the drawings.

(A-1) Configuration of the Embodiment
(A-1-1) Overall Configuration
FIG. 3 is a block diagram showing the overall configuration of the video transmission system 1 of this embodiment.

The video transmission system 1 shown in FIG. 3 includes a video encoding device 10 and a video decoding device 20. The video encoding device 10 encodes each encoding target image (input image) of the encoding target video (input video) layer by layer and outputs a multi-layer stream in which the per-layer encoded streams are multiplexed. The video decoding device 20 separates the multi-layer stream back into the per-layer encoded streams and decodes them to obtain the decoded images that make up the decoded video.

In this embodiment, the video encoding device 10 employs a coding scheme called hierarchical (layered) coding and consists of a base layer encoder and one or more enhancement layer encoders. Likewise, the video decoding device 20 consists of a base layer decoder and one or more enhancement layer decoders.

In the video transmission system 1, the way the video encoding device 10 and the video decoding device 20 are connected is not limited. For example, the data may be transmitted by communication over a network, or supplied offline (for example, on a recording medium such as a DVD or HDD). Further, of the multi-layer stream encoded by the video encoding device 10, only the streams of the subset of layers needed to decode up to the layer providing the video quality (video representation) requested by the video decoding device 20 may be transmitted.
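Transmitting only part of the multi-layer stream amounts to keeping the requested layer plus every layer it depends on for inter-layer prediction. A minimal sketch, assuming the hypothetical (layer_id, payload) packet representation and a hypothetical dependency map in which each enhancement layer references the layer directly below it:

```python
def required_layers(target, deps):
    """All layers needed to decode `target`: the layer itself plus its reference layers, transitively."""
    needed, stack = set(), [target]
    while stack:
        layer = stack.pop()
        if layer not in needed:
            needed.add(layer)
            stack.extend(deps.get(layer, []))
    return needed


def extract_sub_stream(muxed, target, deps):
    """Keep only the packets of layers required for `target`, preserving their order."""
    keep = required_layers(target, deps)
    return [(layer_id, payload) for layer_id, payload in muxed if layer_id in keep]


deps = {0: [], 1: [0], 2: [1]}          # hypothetical: each layer references the one below
stream = [(0, "base"), (1, "enh1"), (2, "enh2")]
print(extract_sub_stream(stream, 1, deps))   # [(0, 'base'), (1, 'enh1')]
```

Any layer that the requested layer does not depend on, directly or indirectly, can be dropped without affecting decodability.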

(A-1-2) Detailed Configuration of the Video Encoding Device (Enhancement Layer Encoder)
FIG. 1 is a block diagram showing the configuration of the enhancement layer encoder in the video encoding device. The enhancement layer encoder of this embodiment may be implemented as hardware, such as a dedicated IC chip incorporating the components shown in FIG. 1, or in software centered on a CPU and a program executed by the CPU; in either case, it can be represented functionally as in FIG. 1.

In FIG. 1, the enhancement layer encoder 100 includes a difference processing unit 101, a transform quantization unit 102, an entropy encoding unit 103, an inverse quantization inverse transform unit 104, an adder 105, an in-loop filter unit 106, a reference image buffer 107, an inter prediction unit 108, an intra prediction unit 109, a switching unit 110, a resampling unit 120, a filter coefficient storage unit 130, and a filter coefficient determination unit 131.

The enhancement layer encoder 100 encodes the video input to layer i (layer i input video) with a predetermined coding scheme and outputs the encoded stream of layer i (layer i stream). Like the enhancement layer encoder of the conventional video encoding device (the encoder of FIG. 10 described above), the enhancement layer encoder 100 processes the input image of the encoding target layer (layer i input image) region by region, each region being a predetermined unit such as a coding unit (for example, a block of a predetermined size), hereinafter called a "processing unit region". Although this embodiment is described as if the layer i input images were input individually, they can instead be generated from the original input image by image processing such as downsampling, yielding the image representation appropriate to each layer.
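Deriving a lower layer's input image from the original by downsampling can be sketched with a 2:1 reduction. The 2x2 box average used here is a hypothetical choice for brevity; practical systems typically use longer anti-aliasing filters.

```python
def downsample_2to1(img):
    """Halve width and height by averaging each 2x2 block (with rounding)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1] + 2) // 4
            for x in range(w)
        ]
        for y in range(h)
    ]


layer1_input = [[10, 20, 30, 40],
                [10, 20, 30, 40]]
layer0_input = downsample_2to1(layer1_input)   # [[15, 35]]
```

The enhancement layer then codes the full-resolution image, predicting it in part from the upsampled (resampled) decoded base-layer image.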

In this embodiment, the coding scheme is illustrated as one extended from the H.265/MPEG-H HEVC standard technology, but the invention is not limited to this and can be applied to a wide variety of similar coding schemes.

To obtain the prediction residual signal, the difference processing unit 101 computes, for each processing unit region, the difference between the input image and the corresponding predicted image from the inter prediction unit 108 or the intra prediction unit 109, and supplies this difference to the transform quantization unit 102 as the prediction residual signal.

The transform/quantization unit 102 transforms the input prediction residual signal into transform coefficients using a DCT (discrete cosine transform), a DST (discrete sine transform), or the like, and quantizes the resulting transform coefficients.

The entropy encoding unit 103 entropy-encodes the quantized transform coefficients from the transform/quantization unit 102, together with other information such as coding mode information and motion vector information, compressing the data by exploiting the skew in symbol occurrence probabilities, and outputs the result as the encoded stream of the layer. The filter coefficients determined by the filter coefficient determination unit 131 described later (the filter coefficients stored in the filter coefficient storage unit 130) are likewise entropy-encoded, in the same way as the quantized transform coefficients, and multiplexed into the multi-layer stream.

The encoded stream (layer i stream) output from the entropy encoding unit 103 is multiplexed with the encoded streams of the other layers and output as a multi-layer stream, which is the encoding result of the video encoding device 10. The filter coefficients determined by the filter coefficient determination unit 131 (stored in the filter coefficient storage unit 130) are also encoded, multiplexed into this multi-layer stream, and output.

To restore the residual signal (residual image) from the encoded signal, the inverse quantization/inverse transform unit 104 inverse-quantizes and inverse-transforms the quantized transform coefficients from the transform/quantization unit 102 and supplies the result to the adding unit 105.

The adding unit 105 adds the predicted image from the inter prediction unit 108 or the intra prediction unit 109, received via the switching unit 110, to the restored residual signal from the inverse quantization/inverse transform unit 104 to obtain a reconstructed image. The adding unit 105 supplies the reconstructed image to the in-loop filter unit 106 and the intra prediction unit 109.
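The local decoding loop formed by units 101, 102, 104 and 105 can be illustrated with a minimal sketch. It is not the HEVC algorithm: it assumes a floating-point orthonormal 1-D DCT-II in place of HEVC's integer 2-D transforms, a single uniform quantization step `qstep`, and hypothetical function names. What it does show is why the encoder reconstructs exactly what a decoder would: the residual is rebuilt from the quantized levels, not from the original residual.

```python
import math

def dct_ii(x):
    """Orthonormal 1-D DCT-II (floating-point stand-in for the codec's integer transform)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)))
    return out

def idct_ii(coeffs):
    """Inverse of the orthonormal DCT-II (a DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out

def encode_and_reconstruct(block, pred, qstep, bit_depth=8):
    """One pass of the hypothetical loop: returns (quantized levels, reconstructed samples)."""
    residual = [s - p for s, p in zip(block, pred)]                 # difference unit
    levels = [round(c / qstep) for c in dct_ii(residual)]           # transform + quantization
    rec_residual = idct_ii([lv * qstep for lv in levels])           # inverse quantization + inverse transform
    max_val = (1 << bit_depth) - 1
    # Adding unit: prediction + restored residual, rounded to integers and clipped to the sample range.
    recon = [min(max(round(p + r), 0), max_val) for p, r in zip(pred, rec_residual)]
    return levels, recon
```

With a very small `qstep` the reconstruction equals the input block; with a coarse step it only approximates it, and that approximation is what both encoder and decoder feed to the in-loop filter.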

To reduce coding distortion caused by quantization in the coding loop (for example, block distortion and ringing distortion), the in-loop filter unit 106 filters the reconstructed image from the adding unit 105 to produce the decoded image. This embodiment illustrates the case where HEVC or the like is applied, so coding distortion is reduced using a deblocking filter or the like. The decoded image output from the in-loop filter unit 106 (layer i decoded image) is held in the reference image buffer 107 as a reference image for motion compensation and similar processing in the inter prediction unit 108, and is also supplied to the encoders of other enhancement layers.

The reference image buffer 107 holds the image output from the in-loop filter unit 106 as a reference image. This output image serves as the reference image for motion-compensated inter prediction when subsequent input images are encoded.

The inter prediction unit 108 performs inter prediction with reference to the images held in the reference image buffer 107.

The inter prediction unit 108 also performs inter prediction across layers (inter-layer prediction) by referring, in addition to the images held in the reference image buffer 107, to the decoded image of the reference layer resampled by the resampling unit 120. Because inter-layer prediction can use an image of another layer at the same time instant (the resampled decoded image of the reference layer) as the prediction reference, it can further improve coding efficiency. As a modification, the inter prediction unit 108 may perform inter-layer prediction with reference only to the decoded image of the reference layer resampled by the resampling unit 120.

The intra prediction unit 109 performs intra prediction using the image output from the adding unit 105 (already-encoded reconstructed pixels and the like).

The switching unit 110 selects the output (predicted image) of either the inter prediction unit 108 or the intra prediction unit 109 according to the coding mode (inter prediction mode, inter-layer prediction mode, or intra prediction mode).

The filter coefficient determination unit 131 determines the group of filter coefficients used by the resampling unit 120 (hereinafter called the "filter coefficient set"). A filter coefficient set consists of a plurality of filter coefficients. Details of the filter coefficient determination unit 131 are given in the description of the operation.

The filter coefficient storage unit 130 stores the filter coefficient set determined by the filter coefficient determination unit 131.

The resampling unit 120 resamples (filters) the decoded image of the reference layer (layer j decoded image) using the filter coefficient set stored in the filter coefficient storage unit 130. Details of the resampling unit 120 are given in the description of the operation.

The differences between the enhancement-layer encoder 100 of this embodiment and the conventional enhancement-layer encoder described with reference to FIG. 10 are briefly as follows. First, the resampling unit 120 performs resampling with an adaptively determined filter coefficient set rather than with fixed filter coefficients. Second, the encoder 100 is provided with the filter coefficient determination unit 131, which determines the filter coefficient set used in this resampling, and with the filter coefficient storage unit 130. Third, the filter coefficient set determined by the filter coefficient determination unit 131 is encoded by the entropy encoding unit 103, multiplexed into the multi-layer stream of the encoding result, and output.

As in the prior art, the base-layer encoder 150 used in the video encoding device 10 is equivalent to the enhancement-layer encoder 100 with the components related to inter-layer prediction omitted.

(A-1-3) Detailed configuration of the video decoding device (enhancement-layer decoder)
FIG. 2 is a block diagram showing the configuration of the enhancement-layer decoder in the video decoding device of the embodiment. The enhancement-layer decoder of this embodiment may be implemented as hardware, such as a dedicated IC chip on which the components shown in FIG. 2 are mounted, or in software, as a CPU and a program executed by the CPU; in either case, it can be represented functionally as in FIG. 2.

As shown in FIG. 2, the enhancement-layer decoder 200 includes an entropy decoding unit 203, an inverse quantization/inverse transform unit 204, an adding unit 205, an in-loop filter unit 206, a reference image buffer 207, an inter prediction unit 208, an intra prediction unit 209, a switching unit 210, a resampling unit 220, and a filter coefficient storage unit 230.

The enhancement-layer decoder 200 decodes the enhancement-layer encoded stream produced by the enhancement-layer encoder 100 to obtain the decoded image of the enhancement layer. The input to the enhancement-layer decoder 200 is the per-layer encoded stream (layer i stream) obtained by demultiplexing the multi-layer stream input to the video decoding device 20.

The entropy decoding unit 203 entropy-decodes the input encoded stream to obtain decoded data. The decoded data includes the quantized transform coefficients (the prediction residual signal after transformation by DCT or the like and quantization). As described above, the encoded stream also contains other additional information, including coding mode information, motion vector information, and the filter coefficient set used for resampling (the set determined by the filter coefficient determination unit 131 of the enhancement-layer encoder 100). The entropy decoding unit 203 supplies the decoded quantized transform coefficients to the inverse quantization/inverse transform unit 204, and supplies the filter coefficient set used for resampling to the filter coefficient storage unit 230.

To restore the residual signal (residual image) from the encoded signal, the inverse quantization/inverse transform unit 204 inverse-quantizes and inverse-transforms the quantized transform coefficients from the entropy decoding unit 203 and supplies the result to the adding unit 205.

The adding unit 205 adds the predicted image from the inter prediction unit 208 or the intra prediction unit 209, received via the switching unit 210, to the restored residual signal from the inverse quantization/inverse transform unit 204 to obtain a reconstructed image. The adding unit 205 supplies the reconstructed image to the in-loop filter unit 206 and the intra prediction unit 209.

To reduce coding distortion caused by quantization in the coding loop (for example, block distortion and ringing distortion), the in-loop filter unit 206 filters the reconstructed image from the adding unit 205 to produce the decoded image. This embodiment illustrates the case where HEVC or the like is applied, so coding distortion is reduced using a deblocking filter or the like. The decoded image output from the in-loop filter unit 206 as the decoding result (layer i decoded image) is held in the reference image buffer 207 as a reference image for motion compensation and similar processing in the inter prediction unit 208, and is also supplied to the decoders of other enhancement layers. In this embodiment, the decoded image of every layer i is described as being output as a decoding result; alternatively, the video decoding device 20 may be configured to output only the decoded image of the layer whose image quality (image representation) is requested for decoding.

The reference image buffer 207 holds the image output from the in-loop filter unit 206 as a reference image.

The inter prediction unit 208 performs inter prediction with reference to the images held in the reference image buffer 207 and to the decoded image of the reference layer resampled by the resampling unit 220.

The intra prediction unit 209 performs intra prediction using the image output from the adding unit 205 (reconstructed pixels within the picture and the like).

The switching unit 210 selects the output (predicted image) of either the inter prediction unit 208 or the intra prediction unit 209 according to the coding mode information restored from the encoded stream by the entropy decoding unit 203.

The filter coefficient storage unit 230 stores the filter coefficient set supplied from the entropy decoding unit 203.

The resampling unit 220 resamples the decoded image of the reference layer (layer j decoded image) using the filter coefficient set stored in the filter coefficient storage unit 230.

The differences between the enhancement-layer decoder 200 of this embodiment and the conventional enhancement-layer decoder described with reference to FIG. 11 are briefly as follows. First, the resampling unit 220 performs resampling not with fixed filter coefficients but with the filter coefficient set obtained when the entropy decoding unit 203 decodes the stream from the video encoding device 10. Second, the decoder 200 is provided with the filter coefficient storage unit 230, which stores the filter coefficient set decoded by the entropy decoding unit 203 for use in this resampling.

(A-2) Operation of the embodiment
Next, the operation of the video transmission system 1 of this embodiment having the configuration described above (the video encoding method and the video decoding method of the embodiment) is described.

The description below focuses on the resampling-related processing (the resampling process and the filter coefficient determination process) of the enhancement-layer encoder 100 (and of the enhancement-layer decoder 200), which is the characteristic feature of this embodiment.

(A-2-1) Resampling process
As described above, the resampling unit 120 (resampling unit 220) filters with coefficients that are not fixed but variable: the filter coefficient set determined by the filter coefficient determination unit 131 of the enhancement-layer encoder 100.

Specifically, the resampling unit 120 (resampling unit 220) uses the filter coefficient set held in the filter coefficient storage unit 130 (filter coefficient storage unit 230) as follows. For each target pixel, let (X, Y) be the corresponding position on the reference layer in 1/16-pixel precision, expressed as an integer position (xR, yR) and a phase (xP, yP), which is the fractional offset from that integer position. The resampled pixel value p at the target pixel position is then obtained from the pixel value array r of the reference-layer decoded image (the layer j decoded image in the examples of FIGS. 1 and 2), for example by the following equations (2-1) to (2-6).

In equations (2-1) to (2-6), a (a[xP, i]), b, c (c[yP, j]), and d are the filter coefficients that make up the filter coefficient set stored in the filter coefficient storage unit 130 (filter coefficient storage unit 230). Here b and d are constant terms (in this embodiment, the constant terms b and d are also called filter coefficients). Further, s_1 and s_2 are shift values that depend on the bit depth of the pixel values, o is a rounding offset, N is the number of filter taps, and i and j indicate the tap positions of the filters used in the horizontal and vertical directions, respectively. This embodiment expresses pixel positions on the reference layer with 1/16-pixel precision and therefore uses 16 sets of filter coefficients, one per phase; a precision other than 1/16 pixel (1/n pixel) may also be used.
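Because equations (2-1) to (2-6) are not reproduced in this text, the following Python sketch shows one plausible form of the separable two-stage filtering that the symbol definitions above describe. Several details are assumptions, not taken from the source: xR = X >> 4 and xP = X & 15 (likewise for Y); the N taps are aligned so that tap N/2 - 1 falls on the integer position; samples outside the picture are clamped to the border; b is added before the first shift s_1 and d and o before the second shift s_2; and the result is clipped to the bit-depth range. The function name `resample_pixel` is hypothetical.

```python
def resample_pixel(r, X, Y, a, b, c, d, N, s1, s2, o, bit_depth=8):
    """Resample one target-layer pixel from the reference-layer picture r (a list of rows).

    X, Y: target pixel position mapped onto the reference layer, in 1/16-pel units.
    a[xP][i], c[yP][j]: horizontal / vertical taps for each of the 16 phases.
    b, d: constant terms; s1, s2: shifts; o: rounding offset (placement assumed).
    """
    height, width = len(r), len(r[0])
    xR, xP = X >> 4, X & 15          # integer position and phase, horizontal
    yR, yP = Y >> 4, Y & 15          # integer position and phase, vertical
    half = N // 2 - 1                # tap index that lands on the integer position (assumed)

    def sample(y, x):
        # Clamp coordinates to the picture border (assumed padding rule).
        return r[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    # First stage: horizontal filtering of each of the N rows used by the vertical filter.
    t = []
    for n in range(N):
        row = yR + n - half
        acc = sum(a[xP][i] * sample(row, xR + i - half) for i in range(N))
        t.append((acc + b) >> s1)

    # Second stage: vertical filtering of the intermediate values, then round, shift and clip.
    acc = sum(c[yP][j] * t[j] for j in range(N))
    p = (acc + d + o) >> s2
    return min(max(p, 0), (1 << bit_depth) - 1)
```

As a sanity check of the indexing: with a = c set to a filter that puts 64 on tap 3 for every phase (N = 8), b = d = 0, s_1 = 0, s_2 = 12 and o = 2048, the function returns the reference sample r[yR][xR] unchanged.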

The number of filter taps N may differ between color components: for example, 8 for the luminance component (luminance filter) and 4 for the chrominance components (chrominance filter). N may also be made specifiable in a higher-level syntax.

Equations (2-1) to (2-6) include the constant terms b and d. Depending on the profile in use, these terms may be disallowed (b = 0, d = 0). Likewise, although the horizontal filter coefficients a and b and the vertical filter coefficients c and d can be set independently, they may be constrained to be equal (a = c, b = d). These restrictions reduce the number of filter coefficients that must be encoded.
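To make the saving concrete, here is a small count. It assumes (the source does not state this) that a and c each hold 16 phases × N taps and that b and d are single scalars, one per direction; the helper name is hypothetical.

```python
def num_filter_coeffs(n_taps, n_phases=16, use_constants=True, shared_hv=False):
    """Count the coefficients in one filter coefficient set (hypothetical helper).

    Horizontal part: a[xP][i] has n_phases * n_taps entries, plus the constant b if allowed.
    Unless a = c and b = d are enforced (shared_hv), the vertical c and d double the count.
    """
    per_direction = n_phases * n_taps + (1 if use_constants else 0)
    return per_direction if shared_hv else 2 * per_direction
```

For an 8-tap luma filter this gives 258 coefficients with no restriction, 256 with b = d = 0, 129 with a = c, b = d, and 128 with both restrictions.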

In the video encoding device 10 (enhancement-layer encoder 100), the filter coefficient determination unit 131 designs an optimal filter coefficient set for resampling. The resulting coefficients are stored in the filter coefficient storage unit 130 as the filter coefficient set used by the resampling unit 120, and are also multiplexed into the encoded stream by the entropy encoding unit 103 using one of various encoding methods.

In the video decoding device 20 (enhancement-layer decoder 200), the entropy decoding unit 203 decodes the filter coefficient set for resampling that is multiplexed in the encoded stream, and stores it in the filter coefficient storage unit 230 as the filter coefficient set used by the resampling unit 220.

To multiplex the filter coefficient set, the entropy encoding unit 103 may, for example, encode it in a higher-level syntax structure such as the slice header or a parameter set such as the picture parameter set.

For example, default filter coefficients such as those shown in FIGS. 12 and 13 are defined in advance, and when there is no change from the defaults, the entropy encoding unit 103 signals only a flag indicating "no change", in the higher-level syntax or the like. Here, signaling means including in the transmitted signal (encoded stream) either a signal that identifies the desired information or a signal representing that information itself, so that the receiving (decoding) side can obtain it.

When there is a change, the entropy encoding unit 103 provides a flag for each phase and signals which phases contain filter coefficients that differ from the defaults. For example, in resampling that doubles the spatial resolution in each direction (2×2), there are cases in which only the phase-0 and phase-8 filters are used; the entropy encoding unit 103 can then efficiently encode only the filter coefficients of those particular phases. Furthermore, the entropy encoding unit 103 may encode each filter coefficient as the difference between its changed value and the corresponding default value, reducing the amount of code needed for the filter coefficient values.

In the video decoding device 20 (enhancement-layer decoder 200), the entropy decoding unit 203 decodes the filter coefficient set signaled as described above. For phases for which use of the default filter coefficients is indicated, the predetermined defaults (for example, those of FIGS. 12 and 13) are stored in the filter coefficient storage unit 230 together with the decoded coefficients. The resampling unit 220 then performs resampling based on equations (2-1) to (2-6) using the filter coefficient set stored in the filter coefficient storage unit 230, which allows the target layer to be decoded.
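The signaling described in the preceding paragraphs (one "no change" flag, per-phase change flags, and differential coding against the defaults) can be sketched as an encode/decode round trip. The sketch is hypothetical throughout: the function names are made up, the output is a flat list of integer syntax values that would still have to be entropy-coded, and `used_phases` assumes a simplified position mapping X = x * 16 / scale with no phase offset, which is just enough to reproduce the 2×2 example in which only phases 0 and 8 occur.

```python
def used_phases(scale_num, scale_den, width):
    """Phases (1/16 pel) visited when upsampling by scale_num/scale_den.

    Assumes the simplified mapping X = x * 16 * scale_den // scale_num with no phase offset.
    """
    return sorted({(x * 16 * scale_den // scale_num) & 15 for x in range(width)})

def encode_filter_set(new, default):
    """Turn a filter set (16 phases x N taps) into a flat list of syntax values."""
    if new == default:
        return [0]                                            # "no change" flag only
    symbols = [1]                                             # "changed" flag
    for taps, ref in zip(new, default):
        if taps == ref:
            symbols.append(0)                                 # phase keeps its default taps
        else:
            symbols.append(1)                                 # phase changed ...
            symbols.extend(t - r for t, r in zip(taps, ref))  # ... sent as differences
    return symbols

def decode_filter_set(symbols, default):
    """Inverse of encode_filter_set: rebuild the full set, filling in the defaults."""
    it = iter(symbols)
    if next(it) == 0:
        return [list(ref) for ref in default]
    result = []
    for ref in default:
        if next(it) == 0:
            result.append(list(ref))                          # default phase
        else:
            result.append([r + next(it) for r in ref])        # default + transmitted difference
    return result
```

For the 2×2 case, `used_phases(2, 1, width)` returns `[0, 8]`, so an encoder following this scheme would send differences for at most those two phases: 1 global flag, 16 phase flags, and 2 × N differences.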

(A-2-2) Design (selection) of the optimal filter coefficient set used for resampling
Next, the filter coefficient set design process in the filter coefficient determination unit 131 of the enhancement-layer encoder 100 is described.

Let r be the decoded image of the reference layer, q the input image of the encoding-target layer, and p the image resampled with the filter coefficient set (filter coefficients a to d). The resampled image p is given by equations (2-1) to (2-6). The filter coefficient determination unit 131 designs the filter coefficients a to d so that the error between p and the input image q is minimized.

For example, the filter coefficient determination unit 131 uses the least-squares method to find the filter coefficients a to d that minimize the sum of squared errors E between the resampled image p and the input image q, as shown in the following equation (3).

The sum of squared errors E in equation (3) is taken over all pixel positions of the entire picture of the target layer; p and q are the values at each pixel position on the target layer.
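Equation (3) itself is not reproduced in this text. From the definitions above, it presumably has the following form, where (x, y) ranges over all pixel positions of the target-layer picture:

```latex
E = \sum_{(x,y)} \bigl( p(x,y) - q(x,y) \bigr)^{2}
```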

When only part of the picture to be encoded is coded using inter-layer prediction (an inter-layer prediction area), as shown for example in FIG. 5, the filter coefficient determination unit 131 may instead determine the filter coefficients by the procedure shown in FIG. 4.

FIG. 4 is a flowchart showing the process by which the enhancement-layer encoder 100 determines the filter coefficients used for resampling.

The filter coefficient determination unit 131 first resamples the decoded image of the reference layer with the default filter coefficients (for example, those shown in FIGS. 12 and 13) to create a provisional inter-layer reference image. Using this inter-layer reference image and the reference images of the target layer in the reference image buffer, the enhancement-layer encoder 100 then selects the coding mode (intra prediction, inter prediction, or inter-layer prediction) of each processing unit area and the reference image to be referred to (S11).

The filter coefficient determination unit 131 extracts, from the picture to be encoded, the areas for which inter-layer prediction was selected (for example, the inter-layer prediction area in FIG. 5) (S12).

The filter coefficient determination unit 131 obtains the filter coefficients a to d that minimize the sum of squared errors E of equation (3) over only the areas for which inter-layer prediction was selected (S13); that is, the coefficients are derived from a version of equation (3) in which the sum is restricted to pixel positions within those areas. The computed filter coefficients a to d are stored in the filter coefficient storage unit 130.
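Under the same reading of equation (3), the region-restricted criterion of step S13 would be the following, where Ω (a symbol introduced here for illustration) is the set of pixel positions in the processing unit areas for which inter-layer prediction was selected in step S11:

```latex
E_{\Omega} = \sum_{(x,y) \in \Omega} \bigl( p(x,y) - q(x,y) \bigr)^{2}
```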

The resampling unit 120 then performs filtering using the filter coefficients determined by the filter coefficient determination unit 131 (stored in the filter coefficient storage unit 130) (S14).

In step S11 above, the provisional inter-layer reference image was described as being produced by resampling with the default filter coefficients. Alternatively, the filter coefficient determination unit 131 may first design filter coefficients using the pixels of the entire picture, and then create the provisional inter-layer reference image by resampling with those coefficients.

Next, the least-squares method used for filter design is described in detail.

To find the filter coefficients a to d that minimize the sum of squared errors E of equation (3), the filter coefficient determination unit 131 finds the values of a to d at which the partial derivatives of E with respect to a to d are zero, for example by solving the simultaneous equations of the following equation (4).

Equation (4) is not linear in a to d, but the filter coefficients a to d can still be obtained, for example by an analytical processing method that uses the default filter coefficient values as its starting point.
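The conditions of equation (4) can be written out as follows, taking (4-1) to (4-4) to be the derivatives with respect to a, b, c and d in that order, as the description of FIG. 6 indicates. The nonlinearity becomes visible if the integer shifts and rounding are ignored (an approximation made here for illustration only): writing r_ij for the reference sample at horizontal tap i and vertical tap j, and t_j for the result of the horizontal pass on row j, the resampled value is approximately

```latex
\begin{aligned}
&\frac{\partial E}{\partial a[x_P,i]} = 0 \;\text{(4-1)}, \qquad
 \frac{\partial E}{\partial b} = 0 \;\text{(4-2)}, \\
&\frac{\partial E}{\partial c[y_P,j]} = 0 \;\text{(4-3)}, \qquad
 \frac{\partial E}{\partial d} = 0 \;\text{(4-4)}, \\
&p \approx \sum_{j} c[y_P,j]\, t_j + d, \qquad
 t_j = \sum_{i} a[x_P,i]\, r_{ij} + b .
\end{aligned}
```

Because p contains the products c·a and c·b, the conditions are not linear in a to d jointly. With c and d held fixed, however, p is linear in (a, b), and with a and b held fixed it is linear in (c, d), which is what the procedure of FIG. 6 exploits.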

FIG. 6 is a flowchart showing an example in which the filter coefficient determination unit obtains approximate filter coefficients using only linear-equation solving.

The filter coefficient determination unit 131 assumes that c and d in equation (3) equal the default filter coefficients and substitutes the default values for c and d. It then finds the filter coefficients a and b that minimize the sum of squared errors E of the resulting equation (3) (S21). Because c and d are treated as constants, the partial derivatives of E with respect to a and b, equations (4-1) and (4-2), are linear in a and b, so a and b can be obtained with a simple linear-equation solver.

Next, the filter coefficient determination unit 131 treats the a and b obtained in step S21 as constants, substitutes them into equation (3), and obtains the coefficients c and d that minimize E from equations (4-3) and (4-4) (S22). As in step S21, this step can be performed by solving linear equations.
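Step S22 is the mirror image of S21: once a and b are fixed, each horizontal result t_j of (A1) is a known number, so c and d follow from another linear least-squares fit. The sketch below makes the same assumptions as before (real-valued coefficients with shifts and rounding dropped, a single phase pair, E taken as a plain sum of squared errors) and uses hypothetical helper names; `solve` is repeated so the block runs on its own.

```python
def solve(A, y):
    """Least-squares solution of A x ~= y via the normal equations (A^T A) x = A^T y."""
    n = len(A[0])
    M = [[sum(row[p] * row[q] for row in A) for q in range(n)]
         + [sum(row[p] * yk for row, yk in zip(A, y))] for p in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [x - f * z for x, z in zip(M[k], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]


def fit_cd(patches, targets, a, b):
    """Step S22: with a, b held constant, find c, d minimising the squared error."""
    rows = []
    for patch in patches:
        # t_j: horizontal pass of (A1) for vertical tap j, now a known regressor
        t = [sum(ai * rji for ai, rji in zip(a, row)) + b for row in patch]
        rows.append(t + [1.0])  # the last column multiplies d
    x = solve(rows, list(targets))
    return x[:-1], x[-1]
```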

The order may also be reversed: the filter coefficient determination unit 131 may determine the coefficients c and d first in step S21 and then determine the coefficients a and b in step S22.

When a constraint is imposed that the same filter coefficients be used in the horizontal and vertical directions (a = c, b = d), for example, the filter coefficient determination unit 131 may obtain the filter coefficients by repeatedly treating one set of filter coefficients as constants and solving for the other (FIG. 7).

The filter coefficient determination unit 131 initializes the filter coefficients c and d with the default filter coefficients (S31).

Treating the filter coefficients c and d as constants, the filter coefficient determination unit 131 determines the filter coefficients a and b that minimize the sum of squared errors E in equation (3) (S32).

Treating the coefficients a and b obtained in step S32 as constants, the filter coefficient determination unit 131 then likewise determines the coefficients c and d that minimize E in equation (3) (S33).

The filter coefficient determination unit 131 then decides whether to adopt the obtained filter coefficients c and d (S34). If it does not adopt them, it repeats the determination from step S32. If it adopts them (that is, when it judges that the coefficient values have converged), it ends the determination process.
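Steps S31 to S34 amount to alternating least squares. The sketch below is hypothetical in several respects: the stopping rule (largest change in c and d below `tol`, or `max_iter` rounds) stands in for the unspecified convergence judgment of S34; coefficients are real-valued with the shifts of (A1)/(A2) dropped; all pixels share one phase pair; and the sketch shows only the alternation, not how the a = c, b = d constraint is applied to the final result. Because each step solves its sub-problem exactly, the squared error cannot increase from one round to the next, although the loop is not guaranteed to reach the global minimum.

```python
def solve(A, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    n = len(A[0])
    M = [[sum(row[p] * row[q] for row in A) for q in range(n)]
         + [sum(row[p] * v for row, v in zip(A, y))] for p in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [x - f * z for x, z in zip(M[k], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]


def horizontal(patch, a, b):
    """(A1) without shifts: one horizontally filtered value t_j per vertical tap j."""
    return [sum(ai * x for ai, x in zip(a, row)) + b for row in patch]


def sse(patches, targets, a, b, c, d):
    """Sum of squared errors E between target pixels and resampled values."""
    return sum((y - (sum(cj * tj for cj, tj in zip(c, horizontal(p, a, b))) + d)) ** 2
               for p, y in zip(patches, targets))


def alternate(patches, targets, c0, d0, tol=1e-6, max_iter=50):
    """S31-S34: alternate between (a, b) and (c, d) until c, d stop changing."""
    c, d = list(c0), d0                      # S31: start from the default c, d
    history = []
    for _ in range(max_iter):
        # S32: c, d fixed -> linear least squares in a, b
        C = sum(c)
        rows = [[sum(cj * p[j][i] for j, cj in enumerate(c)) for i in range(len(p[0]))] + [C]
                for p in patches]
        x = solve(rows, [y - d for y in targets])
        a, b = x[:-1], x[-1]
        # S33: a, b fixed -> linear least squares in c, d
        x = solve([horizontal(p, a, b) + [1.0] for p in patches], list(targets))
        new_c, new_d = x[:-1], x[-1]
        history.append(sse(patches, targets, a, b, new_c, new_d))
        # S34: stop once the coefficients have converged (assumed criterion)
        change = max(abs(u - v) for u, v in zip(new_c + [new_d], c + [d]))
        c, d = new_c, new_d
        if change < tol:
            break
    return a, b, c, d, history
```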

As described above, the filter coefficient determination unit 131 obtains a filter coefficient set optimally designed for the inter-layer prediction region and supplies it to the resampling unit 120 (filter coefficient storage unit 130) and to the video decoding device 20.
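The claims describe how the inter-layer prediction region can be found: a temporary inter-layer reference image is produced with the default filter coefficients, compared against an inter prediction reference image, and only the region where inter-layer prediction is chosen contributes to the squared-error minimization. The sketch below is hypothetical in its details: the decision is made per block, the criterion is the sum of absolute differences (SAD) with ties going to inter prediction, and the names `inter_layer_region` and `training_samples` are illustrative, since the source does not specify the selection criterion.

```python
def sad(x, y):
    """Sum of absolute differences between two equally sized, flattened blocks."""
    return sum(abs(u - v) for u, v in zip(x, y))


def inter_layer_region(input_blocks, ilr_blocks, inter_blocks):
    """Indices of blocks where the temporary inter-layer reference predicts better."""
    return [k for k, (org, ilr, inter) in enumerate(zip(input_blocks, ilr_blocks, inter_blocks))
            if sad(org, ilr) < sad(org, inter)]


def training_samples(region, input_blocks, ref_patches):
    """Collect (reference patch, target pixel) pairs from the selected blocks only."""
    pairs = []
    for k in region:
        pairs.extend(zip(ref_patches[k], input_blocks[k]))
    return pairs
```

The pairs returned by `training_samples` would then feed a least-squares fit, so pixels that will be predicted by ordinary inter prediction do not pull the filter toward content it will never be used for.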

(A-3) Effects of the Embodiment
This embodiment provides the following effects.

In this embodiment, the filter coefficient sets used by the resampling unit 120 of the enhancement-layer encoder 100 and by the resampling unit 220 of the enhancement-layer decoder 200 can be set to suitable values (that is, the filter coefficient sets can be changed arbitrarily). As a result, the video encoding device 10 (enhancement-layer encoder 100) of this embodiment can apply optimal filtering matched to the characteristics of the image being encoded, which reduces image-quality degradation and yields an encoded stream with higher coding efficiency.

In addition, because the filter coefficient determination unit 131 is configured to encode only the filter coefficients that differ from the default resampling filter coefficients, the amount of code needed to encode the filter coefficients can be reduced.
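The claims describe two ways of keeping this side information small: a flag indicating whether the determined coefficients differ from the defaults, and coding each coefficient as its difference from the corresponding default. The sketch below combines both under assumptions not stated in the source: one flag per phase, a flat list of integer symbols standing in for the real entropy-coded syntax, and a hypothetical 4-tap default table. The names `encode_coeffs` and `decode_coeffs` are illustrative.

```python
def encode_coeffs(coeffs, defaults):
    """Per phase: flag 0 if taps equal the defaults, else flag 1 plus tap differences."""
    symbols = []
    for phase_coeffs, phase_defaults in zip(coeffs, defaults):
        if list(phase_coeffs) == list(phase_defaults):
            symbols.append(0)                      # nothing else to send for this phase
        else:
            symbols.append(1)
            symbols.extend(c - d for c, d in zip(phase_coeffs, phase_defaults))
    return symbols


def decode_coeffs(symbols, defaults):
    """Inverse of encode_coeffs: rebuild every phase from flags and differences."""
    it = iter(symbols)
    out = []
    for phase_defaults in defaults:
        if next(it) == 0:
            out.append(list(phase_defaults))
        else:
            out.append([d + next(it) for d in phase_defaults])
    return out
```

Small differences from well-chosen defaults are cheap to entropy-code, and phases left at their defaults cost a single flag.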

Furthermore, because the enhancement-layer encoder 100 (and the enhancement-layer decoder 200) can also specify phase-0 filter coefficients, the resampling unit 120 (resampling unit 220) can also serve as an image-quality improvement filter for the reference image taken from the reference layer. This applies to inter-layer references that involve no spatial resolution conversion (the reference layer and the target layer have the same spatial resolution), and to inter-view reference prediction (inter-view prediction) in coding that uses a multi-view coding extension, such as stereo 3D video coding.

(B) Other Embodiments
The present invention is not limited to the embodiment described above; modified embodiments such as the following are also possible.

The above embodiment describes an example in which the present invention is applied to a video transmission system (a video encoding device and a video decoding device), but the invention can also be used in various other encoding processes. For example, the encoding-related prediction processing and transform/quantization processing other than the resampling processing of the above embodiment are not limited to the configuration described in the embodiment, and the invention can also be used in encoding processes that combine various coding tools.

DESCRIPTION OF SYMBOLS: 1 ... video transmission system, 10 ... video encoding device, 20 ... video decoding device, 100 ... enhancement-layer encoder, 101 ... difference processing unit, 102 ... transform/quantization unit, 103 ... entropy encoding unit, 104 ... inverse quantization/inverse transform unit, 105 ... addition unit, 106 ... in-loop filter unit, 107 ... reference image buffer, 108 ... inter prediction unit, 109 ... intra prediction unit, 110 ... switching unit, 112 ... addition unit, 120 ... resampling unit, 130 ... filter coefficient storage unit, 131 ... filter coefficient determination unit, 200 ... enhancement-layer decoder, 203 ... entropy decoding unit, 204 ... inverse quantization/inverse transform unit, 205 ... addition unit, 206 ... in-loop filter unit, 207 ... reference image buffer, 208 ... inter prediction unit, 209 ... intra prediction unit, 210 ... switching unit, 220 ... resampling unit, 230 ... filter coefficient storage unit.

Claims

A video encoding device that hierarchically encodes images of a plurality of layers, comprising:
prediction means for generating a predicted image of an input image of an encoding target layer by inter-layer prediction that refers at least to a resampled image obtained by filtering a decoded image of another layer serving as a reference layer;
filter coefficient determining means for determining an optimal filter coefficient set by referring to the input image of the encoding target layer and the decoded image of the reference layer;
resampling means for filtering the decoded image of the reference layer using the filter coefficient set; and
encoding means for encoding a residual signal between the input image of the encoding target layer and the predicted image.

The video encoding device according to claim 1, wherein the resampling means obtains a resampled pixel value p from the pixel value array r of the decoded image of the reference layer according to expressions (A1) and (A2) below, using the integer position (xR, yR) of the position (X, Y) of the target pixel in the decoded image of the reference layer that corresponds to a pixel position in the input image of the encoding target layer, and the phase (xP, yP), which is the offset of (X, Y) from that integer position.
$$t_j = \Bigl(\sum_{i} a[x_P, i]\; r[x_R + i - N/2 + 1,\; y_R + j - N/2 + 1] + b\Bigr) \gg s_1 \qquad \text{(A1)}$$
$$p = \Bigl(\sum_{j} c[y_P, j]\; t_j + d + o\Bigr) \gg s_2 \qquad \text{(A2)}$$
In expressions (A1) and (A2), a, b, c, and d are the filter coefficients constituting the filter coefficient set; s₁, s₂, and o are shift values and a rounding offset that depend on the bit depth of the pixel values; N is the number of filter taps; and i and j are the filter tap positions in the horizontal and vertical directions, respectively.
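A minimal integer sketch of (A1) and (A2) for one output pixel follows. Several details are assumptions rather than part of the claim: the sample array is stored row-major as `r[y][x]` (the claim writes r[x, y]); out-of-picture coordinates are clamped to the border; no final clipping to the bit depth is applied; and the 4-tap table in the usage test (phase 0 and phase 8 only, taps summing to 64) with `s1 = 0`, `s2 = 12`, `o = 2048` is a hypothetical 8-bit setting chosen so the scaling is self-consistent. The name `resample_pixel` is illustrative.

```python
def resample_pixel(r, xR, yR, xP, yP, a, b, c, d, N, s1, s2, o):
    """One resampled pixel p from reference samples r[y][x], following (A1)/(A2).

    a[phase][i], c[phase][j]: integer taps; b, d: constant terms;
    s1, s2: right shifts; o: rounding offset. Borders are clamped (assumption).
    """
    h, w = len(r), len(r[0])

    def ref(x, y):
        return r[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    t = []
    for j in range(N):  # (A1): horizontal pass for each vertical tap j
        acc = sum(a[xP][i] * ref(xR + i - N // 2 + 1, yR + j - N // 2 + 1)
                  for i in range(N))
        t.append((acc + b) >> s1)
    # (A2): vertical pass over the intermediate values t_j
    return (sum(c[yP][j] * t[j] for j in range(N)) + d + o) >> s2
```

Python's `>>` on negative integers is an arithmetic (flooring) shift, which matches the usual behavior of the shift in (A1)/(A2) when the intermediate sums go negative.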

The video encoding device according to claim 2, wherein the resampling means expresses the position (X, Y) of the target pixel in the decoded image of the reference layer with 1/16-pixel accuracy and calculates the integer position (xR, yR) and the phase (xP, yP) of the target pixel according to expressions (B1) to (B4) below, and the filter coefficient set is set as filter coefficients for 16 phases.
$$x_R = X \gg 4 \qquad \text{(B1)}$$
$$x_P = X \bmod 16 \qquad \text{(B2)}$$
$$y_R = Y \gg 4 \qquad \text{(B3)}$$
$$y_P = Y \bmod 16 \qquad \text{(B4)}$$
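Expressions (B1) to (B4) split a 1/16-pel position into an integer sample position and one of 16 phases, which is why the filter coefficient set holds coefficients for 16 phases. The sketch below also includes a hypothetical `to_ref_position` for plain 2x spatial scalability (X = 16·x / 2); that mapping is a simplifying assumption, since the section does not say how (X, Y) is derived and real schemes add phase offsets.

```python
def split_position(X, Y):
    """(B1)-(B4): integer position (xR, yR) and phase (xP, yP) of a 1/16-pel position."""
    xR, xP = X >> 4, X % 16
    yR, yP = Y >> 4, Y % 16
    return xR, yR, xP, yP


def to_ref_position(x, y, scale=2):
    """Hypothetical mapping of an enhancement-layer pixel to 1/16-pel reference units."""
    return (16 * x) // scale, (16 * y) // scale
```

In Python, `>>` floors and `%` is non-negative for a positive divisor, so X == 16 * xR + xP holds even for negative X; a C port would need `X & 15` rather than `X % 16` to keep that property.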

The video encoding device according to claim 1, wherein the encoding means outputs an encoded stream in which the filter coefficient set determined by the filter coefficient determining means is multiplexed and encoded.

The video encoding device according to claim 2, wherein the filter coefficient determining means determines the filter coefficient set under the constraint that the horizontal filter coefficients a and b used by the resampling means take the same values as the vertical filter coefficients c and d.

The video encoding device according to any one of claims 2 to 4, wherein the filter coefficient determining means determines the filter coefficient set under the constraint that the constant-term filter coefficients b and d used by the resampling means are not used.

The video encoding device according to claim 2, wherein default filter coefficients are provided for each phase of the target pixel position in the decoded image of the reference layer used by the resampling means, and
the encoding means reduces the amount of data to be encoded by signaling a flag indicating whether the filter coefficient determining means has determined filter coefficients whose values differ from the default filter coefficients.

The video encoding device according to claim 7, wherein the encoding means reduces the amount of data to be encoded by encoding the difference between each filter coefficient determined by the filter coefficient determining means and the corresponding default filter coefficient.

The video encoding device according to claim 2, wherein the filter coefficient determining means determines the filter coefficient set using the least-squares method.

The filter coefficient determining means generates a temporary inter-layer reference image by having the resampling means filter the decoded image of the reference layer using the default filter coefficients;
the filter coefficient determining means determines, within the input image of the encoding target layer, an inter-layer prediction region in which inter-layer prediction is performed, by referring to the temporary inter-layer reference image and to an inter prediction reference image that is referred to for inter prediction; and
the filter coefficient determining means determines the filter coefficient set so as to minimize the sum of squared errors between the resampled image and the input image of the encoding target layer over the inter-layer prediction region only. The video encoding device described.

A video decoding device that decodes hierarchically encoded data, comprising:
prediction means for generating a predicted image of a target image of a decoding target layer by inter-layer prediction that refers at least to a resampled image obtained by filtering a decoded image of another layer serving as a reference layer;
decoding means for decoding a filter coefficient set included in the hierarchically encoded data; and
resampling means for filtering the decoded image of the reference layer using the filter coefficient set decoded by the decoding means.

A video encoding program that causes a computer mounted on a video encoding device that hierarchically encodes images of a plurality of layers to function as:
prediction means for generating a predicted image of an input image of an encoding target layer by inter-layer prediction that refers at least to a resampled image obtained by filtering a decoded image of another layer serving as a reference layer;
filter coefficient determining means for determining an optimal filter coefficient set by referring to the input image of the encoding target layer and the decoded image of the reference layer;
resampling means for filtering the decoded image of the reference layer using the filter coefficient set; and
encoding means for encoding a residual signal between the input image of the encoding target layer and the predicted image.

A video decoding program that causes a computer mounted on a video decoding device that decodes hierarchically encoded data to function as:
prediction means for generating a predicted image of a target image of a decoding target layer by inter-layer prediction that refers at least to a resampled image obtained by filtering a decoded image of another layer serving as a reference layer;
decoding means for decoding a filter coefficient set included in the hierarchically encoded data; and
resampling means for filtering the decoded image of the reference layer using the filter coefficient set decoded by the decoding means.

A video encoding method performed by a video encoding device that hierarchically encodes images of a plurality of layers,
the video encoding device comprising prediction means, filter coefficient determining means, resampling means, and encoding means, wherein:
the prediction means generates a predicted image of an input image of an encoding target layer by inter-layer prediction that refers at least to a resampled image obtained by filtering a decoded image of another layer serving as a reference layer;
the filter coefficient determining means determines an optimal filter coefficient set by referring to the input image of the encoding target layer and the decoded image of the reference layer;
the resampling means filters the decoded image of the reference layer using the filter coefficient set; and
the encoding means encodes a residual signal between the input image of the encoding target layer and the predicted image.

A video decoding method performed by a video decoding device that decodes hierarchically encoded data,
the video decoding device comprising prediction means, decoding means, and resampling means, wherein:
the prediction means generates a predicted image of a target image of a decoding target layer by inter-layer prediction that refers at least to a resampled image obtained by filtering a decoded image of another layer serving as a reference layer;
the decoding means decodes a filter coefficient set included in the hierarchically encoded data; and
the resampling means filters the decoded image of the reference layer using the filter coefficient set decoded by the decoding means.

A video transmission system comprising a video encoding device that hierarchically encodes images of a plurality of layers and a video decoding device that decodes the data hierarchically encoded by the video encoding device, wherein the video encoding device according to claim 1 is applied as the video encoding device, and the video decoding device according to claim 11 is applied as the video decoding device.