JP2013232775A

JP2013232775A - Moving image decoding device and moving image coding device

Info

Publication number: JP2013232775A
Application number: JP2012103716A
Authority: JP
Inventors: Hisao Kumai; 久雄熊井; Tomoyuki Yamamoto; 智幸山本; Tomoko Aono; 友子青野; Norio Ito; 典男伊藤
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2012-04-27
Filing date: 2012-04-27
Publication date: 2013-11-14
Also published as: WO2013161689A1

Abstract

PROBLEM TO BE SOLVED: To improve coding efficiency of coded data in which each layer is coded by a different coding system.

SOLUTION: A moving image decoding device comprises: a variable-length decoding unit 13 that decodes a motion vector used in decoding a first layer; a motion information normalization unit 22 that derives an intermediate motion vector on the basis of the decoded motion vector; and a motion information conversion processing unit 20 that derives a motion vector used in decoding a second layer with reference to the derived intermediate motion vector.

Description

The present invention relates to a moving image decoding device that decodes hierarchically coded data, and to a moving image coding device that generates encoded data by hierarchically coding an image.

To transmit or record moving images efficiently, a moving image coding device (coding device) that generates encoded data by encoding a moving image, and a moving image decoding device (decoding device) that generates a decoded image by decoding that encoded data, are used.

As a specific moving image coding scheme, hierarchical coding is used, in which a moving image is coded hierarchically according to the required data rate. One hierarchical coding scheme is H.264/AVC Annex G Scalable Video Coding (SVC), a standard of ISO/IEC and ITU-T (Non-Patent Document 1).

In SVC, encoded data can be organized into two layers (hierarchies): a base layer (lower layer) and an enhancement layer (upper layer). This allows a decoding device, for example, to provide low-quality playback that refers only to the base layer and high-quality playback that refers to both the base layer and the enhancement layer.

SVC also supports spatial scalability, temporal scalability, and SNR scalability. In spatial scalability, for example, an image obtained by downsampling the original image to the desired resolution is encoded with H.264/AVC as the lower layer. The upper layer performs inter-layer prediction to remove redundancy between layers. Inter-layer prediction includes motion information prediction, which predicts information related to motion prediction from lower-layer information at the same time instant, and inter-layer intra prediction, which predicts from an upsampled decoded image of the lower layer at the same time instant (Non-Patent Document 2).

Non-Patent Document 3 describes the coding scheme adopted in the HM (HEVC Test Model) software.

With the spread of networks such as the Internet, a transmission method called hybrid delivery has also been proposed, in which content data is distributed over multiple routes, in particular heterogeneous networks, such as broadcast and communication, or unicast and multicast communication. A receiving terminal that receives data distributed by hybrid delivery can synchronize, superimpose, or combine the data received over the multiple routes and display it on screen as a single piece of data. With hybrid delivery, a scalable-coded (SVC) moving image can be delivered with the base layer data sent by broadcast and the enhancement layer data sent over a communication network.

ISO/IEC 14496-2:2004 (June 1, 2004)
ITU-T H.264, Advanced video coding for generic audiovisual services (March 2010)
Draft ISO/IEC 23008-HEVC:201x(E), High efficiency video coding (HEVC) text specification draft 6

However, in conventional scalable coding (SVC), when a different coding scheme is used for each layer, coding is possible only in a configuration in which each layer is encoded and decoded independently, or in a configuration in which the coding scheme of the enhancement layer depends on the coding scheme of the base layer. In other words, when the coding scheme differs between the base layer and the enhancement layer, the coding information used to encode the base layer cannot be used to encode the enhancement layer.

For example, existing broadcasting systems use MPEG-2 or H.264/AVC as the coding scheme. When the base layer is encoded with MPEG-2 or H.264/AVC and the enhancement layer is encoded with a different scheme, for example HEVC, the enhancement layer must be encoded and decoded in a configuration independent of the base layer.

The present invention has been made in view of the above problem. Its object is to realize a moving image decoding device and the like that, in scalable coding, can use the coding information used to encode the base layer when encoding the enhancement layer, even when the coding scheme of the base layer differs from that of the enhancement layer.

To solve the above problem, a moving image decoding device according to the present invention decodes encoded data composed of a plurality of layers that use mutually different coding schemes, and comprises: first-layer motion vector decoding means for decoding a motion vector used in decoding a first layer of the plurality of layers, with reference to motion vector information included in the first layer; intermediate motion vector deriving means for deriving an intermediate motion vector based on the motion vector decoded by the first-layer motion vector decoding means; and second-layer motion vector deriving means for deriving a motion vector used in decoding a second layer of the plurality of layers, with reference to the intermediate motion vector derived by the intermediate motion vector deriving means.

With this configuration, the second-layer motion vector is derived from the motion vector included in the first layer, so the code amount is reduced and coding efficiency can be improved. In addition, because the intermediate motion vector deriving means is provided, the device can handle cases in which various coding schemes are used for the first layer and the second layer.

In the moving image decoding device according to the present invention, the intermediate motion vector deriving means may convert the motion vector decoded by the first-layer motion vector decoding means into a value corresponding to a separation of a predetermined number of frames, and use the result as the intermediate motion vector.

With this configuration, the first-layer motion vector is converted into a format that does not depend on the first-layer coding scheme, so the first-layer motion vector can be used in decoding the second layer even when the second-layer coding scheme does not depend on the first layer. This improves coding efficiency.

Here, converting to a value corresponding to a separation of a predetermined number of frames means converting the length of the decoded motion vector into the value for the temporal distance corresponding to a predetermined number of frames (for example, one frame) or a predetermined number of fields (for example, one field).
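The conversion described above can be illustrated with a minimal sketch. It assumes that the conversion is a linear scaling of the vector by the ratio of temporal distances (that is, roughly constant motion); the text here does not state the formula, and the function name `normalize_motion_vector` and its parameters are hypothetical.

```python
from fractions import Fraction


def normalize_motion_vector(mv, temporal_distance, unit_distance=1):
    """Scale mv = (dx, dy), which spans `temporal_distance` frames (or fields),
    to the displacement over `unit_distance` frames (or fields).

    Assumption: linear scaling by the ratio of temporal distances.
    Distances are integers; exact fractions avoid rounding decisions.
    """
    if temporal_distance == 0:
        raise ValueError("temporal_distance must be nonzero")
    scale = Fraction(unit_distance, temporal_distance)
    dx, dy = mv
    return (dx * scale, dy * scale)


# A vector of (12, -6) measured across 3 frames corresponds to (4, -2) per frame.
intermediate_mv = normalize_motion_vector((12, -6), temporal_distance=3)
```

A real codec would also have to choose a rounding rule and a sub-pixel precision for the result; this sketch leaves the value as an exact fraction so that choice stays open.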

In the moving image decoding device according to the present invention, the first layer may include motion vectors derived using a plurality of fields for one frame, and the intermediate motion vector deriving means may derive the intermediate motion vector for the second-layer frame to be decoded based on the motion vector derived using, among the plurality of fields of the first-layer frame corresponding to that frame, the field that corresponds to that frame.

With this configuration, even when the first layer includes motion vectors derived using a plurality of fields for one frame, the motion vector can be derived from the corresponding field. This allows the motion vector to be derived appropriately.

In the moving image decoding device according to the present invention, the intermediate motion vector deriving means may derive the intermediate motion vector for the second-layer frame to be decoded based on a motion vector in the first-layer frame corresponding to that frame.

With this configuration, the motion vector can be derived from the frame corresponding to the frame to be processed. This allows the motion vector to be derived appropriately.

In the moving image decoding device according to the present invention, when the first-layer frame corresponding to the second-layer frame to be decoded does not contain a motion vector in the prediction direction required by the second layer, the intermediate motion vector deriving means may derive the intermediate motion vector based on a motion vector contained in the reference frame that is nearest, in coding order, to that first-layer frame.

With this configuration, a motion vector can be derived appropriately even when the frame corresponding to the frame to be processed does not contain a motion vector.
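The fallback described above can be sketched as a simple search. This is an illustration only: the frame representation (a dict with `is_reference` and `mvs` keys), the direction labels, and the choice to search backwards through already-coded frames are all assumptions made for the example, not details given in the text.

```python
def pick_first_layer_mv(frames, target_pos, direction):
    """Return the first-layer motion vector to use for the frame at
    coding-order position `target_pos`, or None if none is found.

    frames: list of dicts in coding order, each with
        "is_reference": bool, and
        "mvs": {direction_label: (dx, dy)}   (hypothetical layout)
    """
    own = frames[target_pos]["mvs"].get(direction)
    if own is not None:
        return own
    # Fallback: nearest earlier frame in coding order that is a reference
    # frame and carries a vector in the required prediction direction.
    for pos in range(target_pos - 1, -1, -1):
        frame = frames[pos]
        if frame["is_reference"] and direction in frame["mvs"]:
            return frame["mvs"][direction]
    return None


frames = [
    {"is_reference": True, "mvs": {"forward": (4, 0)}},
    {"is_reference": True, "mvs": {"forward": (2, 1)}},
    {"is_reference": False, "mvs": {"forward": (1, 1)}},
    {"is_reference": False, "mvs": {"backward": (-1, 0)}},
]

# The last frame has no forward vector, so the nearest reference frame
# (position 1) supplies it; the non-reference frame at position 2 is skipped.
fallback_mv = pick_first_layer_mv(frames, 3, "forward")
```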

To solve the above problem, a moving image decoding device according to the present invention decodes encoded data composed of a plurality of layers that use mutually different coding schemes, and comprises: first-layer motion vector decoding means for decoding a motion vector used in decoding a first layer of the plurality of layers, with reference to motion vector information included in the first layer; and second-layer motion vector deriving means for deriving a motion vector used in decoding a second layer of the plurality of layers, based on the motion vector decoded by the first-layer motion vector decoding means and the reference relationship of the reference frames of the second layer.

With this configuration, the second-layer motion vector is derived from the motion vector included in the first layer, so the code amount is reduced and coding efficiency can be improved.
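As a rough illustration of deriving the second-layer vector directly from the first-layer vector and the second layer's reference relationship, the sketch below scales the vector by the ratio of signed temporal distances. It assumes that both layers share a common display-order time axis and that "reference relationship" can be reduced to the distance between a frame and its reference frame; neither assumption is stated in the text, and all names are hypothetical.

```python
from fractions import Fraction


def derive_second_layer_mv(first_mv, first_cur, first_ref, second_cur, second_ref):
    """Scale a first-layer vector to the second layer's reference distance.

    first_cur / first_ref:   display-order indices of the first-layer frame
                             and the frame it references.
    second_cur / second_ref: the same for the second-layer frame.
    A negative ratio flips the vector for the opposite prediction direction.
    """
    first_distance = first_cur - first_ref
    if first_distance == 0:
        raise ValueError("first-layer frame must not reference itself")
    scale = Fraction(second_cur - second_ref, first_distance)
    dx, dy = first_mv
    return (dx * scale, dy * scale)


# First layer: frame 4 references frame 1 with vector (9, -3).
# Second layer: frame 4 references frame 5 (opposite direction, 1 frame away).
second_mv = derive_second_layer_mv((9, -3), 4, 1, 4, 5)  # (-3, 1)
```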

In the moving image decoding device according to the present invention, the first layer may include motion vectors derived using a plurality of fields for one frame, and the second-layer motion vector deriving means may derive the second-layer motion vector for the second-layer frame to be decoded based on the motion vector derived using, among the plurality of fields of the first-layer frame corresponding to that frame, the field that corresponds to that frame.

In the moving image decoding device according to the present invention, the second-layer motion vector deriving means may derive the second-layer motion vector for the second-layer frame to be decoded based on a motion vector in the first-layer frame corresponding to that frame.

To solve the above problem, a moving image coding device according to the present invention generates encoded data that is composed of a plurality of layers using mutually different coding schemes and that includes, in each layer, a prediction residual that is the difference between an original image and a predicted image, and comprises: intermediate motion vector deriving means for deriving an intermediate motion vector based on a motion vector used in decoding a first layer; and second-layer motion vector deriving means for deriving, with reference to the intermediate motion vector derived by the intermediate motion vector deriving means, a motion vector used in generating the predicted image for generating the encoded data of a second layer.

With this configuration, the second-layer motion vector is derived from the motion vector used in decoding the first layer, so the code amount can be reduced and coding efficiency can be improved. In addition, deriving the intermediate motion vector makes it possible to handle cases in which various coding schemes are used for the first layer and the second layer.

To solve the above problem, a moving image coding device according to the present invention generates encoded data that is composed of a plurality of layers using mutually different coding schemes and that includes, in each layer, a prediction residual that is the difference between an original image and a predicted image, and comprises second-layer motion vector deriving means for deriving a motion vector used in generating the predicted image for generating the encoded data of a second layer, based on a motion vector used in decoding a first layer and the reference relationship of the reference frames of the second layer.

With this configuration, the second-layer motion vector is derived from the motion vector included in the first layer, so the code amount can be reduced and coding efficiency can be improved.

As described above, a moving image decoding device according to the present invention decodes encoded data composed of a plurality of layers that use mutually different coding schemes, and is configured to comprise: first-layer motion vector decoding means for decoding a motion vector used in decoding a first layer of the plurality of layers, with reference to motion vector information included in the first layer; intermediate motion vector deriving means for deriving an intermediate motion vector based on the motion vector decoded by the first-layer motion vector decoding means; and second-layer motion vector deriving means for deriving a motion vector used in decoding a second layer of the plurality of layers, with reference to the intermediate motion vector derived by the intermediate motion vector deriving means.

With this configuration, the second-layer motion vector is derived from the motion vector included in the first layer, which has the effect of reducing the code amount and improving coding efficiency. In addition, because the intermediate motion vector deriving means is provided, the device has the effect of being able to handle cases in which various coding schemes are used for the first layer and the second layer.

Further, a moving image decoding device according to the present invention decodes encoded data composed of a plurality of layers that use mutually different coding schemes, and is configured to comprise: first-layer motion vector decoding means for decoding a motion vector used in decoding a first layer of the plurality of layers, with reference to motion vector information included in the first layer; and second-layer motion vector deriving means for deriving a motion vector used in decoding a second layer of the plurality of layers, based on the motion vector decoded by the first-layer motion vector decoding means.

With this configuration, the second-layer motion vector is derived from the motion vector included in the first layer, which has the effect of reducing the code amount and improving coding efficiency.

Further, to solve the above problem, a moving image coding device according to the present invention generates encoded data that is composed of a plurality of layers using mutually different coding schemes and that includes, in each layer, a prediction residual that is the difference between an original image and a predicted image, and is configured to comprise: intermediate motion vector deriving means for deriving an intermediate motion vector based on a motion vector used in decoding a first layer; and second-layer motion vector deriving means for deriving, with reference to the intermediate motion vector derived by the intermediate motion vector deriving means, a motion vector used in generating the predicted image for generating the encoded data of a second layer.

With this configuration, the second-layer motion vector is derived from the motion vector used in decoding the first layer, which has the effect of reducing the code amount and improving coding efficiency. In addition, deriving the intermediate motion vector has the effect of making it possible to handle cases in which various coding schemes are used for the first layer and the second layer.

Further, a moving image coding device according to the present invention generates encoded data that is composed of a plurality of layers using mutually different coding schemes and that includes, in each layer, a prediction residual that is the difference between an original image and a predicted image, and is configured to comprise second-layer motion vector deriving means for deriving a motion vector used in generating the predicted image for generating the encoded data of a second layer, based on a motion vector used in decoding a first layer.

With this configuration, the second-layer motion vector is derived from the motion vector included in the first layer, which has the effect of reducing the code amount and improving coding efficiency.

A block diagram showing the main configuration of a moving image decoding device according to an embodiment of the present invention.
A diagram for explaining an overview of the present invention.
A diagram for explaining the structure of encoded data according to an embodiment of the present invention, in which (a) shows the sequence layer defining a sequence SEQ, (b) shows the picture layer defining a picture PICT, (c) shows the slice layer defining a slice S, (d) shows the tree block layer defining a tree block TBLK, and (e) shows the CU layer defining a coding unit (CU) included in the tree block TBLK.
A diagram showing the structure of encoded data according to an embodiment of the present invention, in which (a) shows the structure of the picture layer of the encoded data, (b) shows the structure of a slice layer included in the picture layer, (c) shows the structure of a macroblock layer included in the slice layer, and (d) shows the structure of a block layer included in the macroblock layer.
A diagram for explaining field prediction and frame prediction in a frame structure, in which (a) shows field prediction and (b) shows frame prediction.
A block diagram showing the main configuration of the inter prediction unit of the moving image decoding device.
A diagram for explaining a method of deriving a motion vector.
A diagram for explaining a method of deriving a motion vector.
A diagram for explaining a method of deriving a motion vector.
A diagram for explaining a method of deriving a motion vector.
A diagram showing the motion information stored in the motion information storage unit.
A flowchart showing the flow of the enhancement layer decoding process.
A flowchart showing the flow of the inter prediction process.
A diagram showing a table for deriving motion compensation parameters.
A flowchart showing the flow of the intermediate motion information derivation process.
A flowchart showing the flow of the inter-layer motion estimation process.
A block diagram showing the main configuration of a moving image decoding device according to another embodiment of the present invention.
A diagram for explaining a method of deriving a motion vector.
A diagram for explaining a method of deriving a motion vector.
A diagram showing the motion information stored in the motion information storage unit.
A block diagram showing the main configuration of a moving image coding device according to an embodiment of the present invention.
A block diagram showing the main configuration of the motion prediction/compensation unit of the moving image coding device.
A block diagram showing the main configuration of a moving image coding device according to another embodiment of the present invention.
A diagram showing the configuration of a transmitting device equipped with the moving image coding device and a receiving device equipped with the moving image decoding device, in which (a) shows the transmitting device equipped with the moving image coding device and (b) shows the receiving device equipped with the moving image decoding device.
A diagram showing the configuration of a recording device equipped with the moving image coding device and a playback device equipped with the moving image decoding device, in which (a) shows the recording device equipped with the moving image coding device and (b) shows the playback device equipped with the moving image decoding device.

[Embodiment 1]
The moving image decoding device 1 according to the present embodiment decodes scalable-coded (SVC) encoded data and, even when the coding scheme of the base layer (first layer) differs from that of the enhancement layer (second layer), can use the coding information used in decoding the base layer when decoding the enhancement layer.

More specifically, the motion prediction information and prediction mode information of the base layer are converted into a format that does not depend on the base-layer coding scheme, so that the enhancement layer can refer to them as enhancement-layer motion vectors without performing any processing that depends on the base-layer coding scheme.

This allows the base layer and the enhancement layer each to adopt its own coding scheme while improving coding efficiency. The present embodiment describes the case of transmitting ultra-high-definition video (moving images, 4K video data) by scalable coding, in which the base layer is obtained by downscaling and interlacing the 4K video data, encoding the result with MPEG-2 or H.264/AVC, and transmitting it over a television broadcasting network, while the enhancement layer is obtained by encoding the 4K video (progressive) with HEVC and transmitting it over the Internet. However, the coding schemes of the base layer and the enhancement layer of the present invention are not limited to these.

(Encoded data #1 (a, b))
Before describing the moving image coding device 2 and the moving image decoding device 1 according to the present embodiment in detail, the data structure of the encoded data #1 (a, b), which is generated by the moving image coding device 2 and decoded by the moving image decoding device 1, is described. The encoded data #1 consists of a base layer (encoded data #1b) and an enhancement layer (encoded data #1a). The base layer and the enhancement layer may be supplied to the moving image decoding device 1 over different transmission paths, or over the same transmission path.

As an example, as shown in FIG. 2, a bit stream including the base layer (encoded data #1b) may be transmitted by broadcast waves, while a bit stream including the enhancement layer (encoded data #1a) is transmitted over an Internet communication network.

Further, as described above, the base layer is encoded by, for example, MPEG-2 or H.264/AVC, and the enhancement layer is encoded by, for example, HEVC (High Efficiency Video Coding), which is the successor standard to H.264/MPEG-4 AVC. The following description thus takes as an example the case where the base layer and the enhancement layer are encoded by mutually different encoding methods, but this does not limit the present embodiment.

(Encoded data #1a of the enhancement layer)
FIG. 3 is a diagram showing the data structure of the enhancement layer of the encoded data #1 (encoded data #1a). The encoded data #1a illustratively includes a sequence and a plurality of pictures constituting the sequence.

The hierarchical structure of the data in the encoded data #1a is shown in FIG. 3. FIGS. 3A to 3E respectively show a sequence layer that defines a sequence SEQ, a picture layer that defines a picture PICT, a slice layer that defines a slice S, a tree block layer that defines a tree block TBLK, and a CU layer that defines a coding unit (CU) included in the tree block TBLK.

(Sequence layer)
In the sequence layer, a set of data referred to by the video decoding device 1 for decoding a sequence SEQ to be processed (hereinafter also referred to as a target sequence) is defined. As shown in FIG. 3A, the sequence SEQ includes a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), an adaptation parameter set APS (Adaptation Parameter Set), pictures PICT1 to PICTNP (NP is the total number of pictures included in the sequence SEQ), and supplemental enhancement information SEI (Supplemental Enhancement Information).

シーケンスパラメータセットＳＰＳでは、対象シーケンスを復号するために動画像復号装置１が参照する符号化パラメータの集合が規定されている。 In the sequence parameter set SPS, a set of encoding parameters referred to by the video decoding device 1 in order to decode the target sequence is defined.

ピクチャパラメータセットＰＰＳでは、対象シーケンス内の各ピクチャを復号するために動画像復号装置１が参照する符号化パラメータの集合が規定されている。なお、ＰＰＳは複数存在してもよい。その場合、対象シーケンス内の各ピクチャから複数のＰＰＳの何れかを選択する。 In the picture parameter set PPS, a set of encoding parameters referred to by the video decoding device 1 for decoding each picture in the target sequence is defined. A plurality of PPS may exist. In that case, one of a plurality of PPSs is selected from each picture in the target sequence.

適応パラメータセットＡＰＳは、対象シーケンス内の各スライスを復号するために動画像復号装置１が参照する符号化パラメータの集合が規定されている。ＡＰＳは複数存在してもよい。その場合、対象シーケンス内の各スライスから複数のＡＰＳの何れかを選択する。 The adaptive parameter set APS defines a set of coding parameters that the video decoding device 1 refers to in order to decode each slice in the target sequence. There may be a plurality of APSs. In that case, one of a plurality of APSs is selected from each slice in the target sequence.

(Picture layer)
In the picture layer, a set of data referred to by the video decoding device 1 for decoding a picture PICT to be processed (hereinafter also referred to as a target picture) is defined. As shown in FIG. 3B, the picture PICT includes a picture header PH and slices S1 to SNS (NS is the total number of slices included in the picture PICT).

なお、以下、スライスＳ1〜ＳNSのそれぞれを区別する必要が無い場合、符号の添え字を省略して記述することがある。また、以下に説明する符号化データ＃１に含まれるデータであって、添え字を付している他のデータについても同様である。 Hereinafter, when it is not necessary to distinguish each of the slices S1 to SNS, the subscripts may be omitted. The same applies to other data with subscripts included in encoded data # 1 described below.

ピクチャヘッダＰＨには、対象ピクチャの復号方法を決定するために動画像復号装置１が参照する符号化パラメータ群が含まれている。なお、符号化パラメータ群は、必ずしもピクチャヘッダＰＨ内に直接含んでいる必要はなく、例えばピクチャパラメータセットＰＰＳへの参照を含むことで、間接的に含めても良い。 The picture header PH includes a coding parameter group that is referred to by the video decoding device 1 in order to determine a decoding method of the target picture. Note that the encoding parameter group is not necessarily included directly in the picture header PH, and may be included indirectly, for example, by including a reference to the picture parameter set PPS.

(Slice layer)
In the slice layer, a set of data referred to by the video decoding device 1 for decoding a slice S to be processed (also referred to as a target slice) is defined. As shown in FIG. 3C, the slice S includes a slice header SH and a sequence of tree blocks TBLK1 to TBLKNC (NC is the total number of tree blocks included in the slice S).

スライスヘッダＳＨには、対象スライスの復号方法を決定するために動画像復号装置１が参照する符号化パラメータ群が含まれる。スライスタイプを指定するスライスタイプ指定情報（slice_type）は、スライスヘッダＳＨに含まれる符号化パラメータの一例である。 The slice header SH includes an encoding parameter group that is referred to by the video decoding device 1 in order to determine a decoding method of the target slice. Slice type designation information (slice_type) for designating a slice type is an example of an encoding parameter included in the slice header SH.

スライスタイプ指定情報により指定可能なスライスタイプとしては、（１）符号化の際にイントラ予測のみを用いるＩスライス、（２）符号化の際に単方向予測、又は、イントラ予測を用いるＰスライス、（３）符号化の際に単方向予測、双方向予測、又は、イントラ予測を用いるＢスライスなどが挙げられる。 As slice types that can be specified by the slice type specification information, (1) I slice that uses only intra prediction at the time of encoding, (2) P slice that uses unidirectional prediction or intra prediction at the time of encoding, (3) B-slice using unidirectional prediction, bidirectional prediction, or intra prediction at the time of encoding may be used.

なお、スライスヘッダＳＨには、上記シーケンスレイヤに含まれる、ピクチャパラメータセットＰＰＳへの参照（pic_parameter_set_id）、適応パラメータセットＡＰＳへの参照（aps_id）を含んでいても良い。 Note that the slice header SH may include a reference to the picture parameter set PPS (pic_parameter_set_id) and a reference to the adaptive parameter set APS (aps_id) included in the sequence layer.

また、スライスヘッダＳＨには、動画像復号装置１の備える適応フィルタによって参照されるＡＬＦパラメータＦＰが含まれている。ＡＬＦパラメータＦＰの詳細については後述する。 Further, the slice header SH includes an ALF parameter FP that is referred to by an adaptive filter included in the video decoding device 1. Details of the ALF parameter FP will be described later.

(Tree block layer)
In the tree block layer, a set of data referred to by the video decoding device 1 for decoding a tree block TBLK to be processed (hereinafter also referred to as a target tree block) is defined. Note that a tree block may also be called a coding tree block (CTB: Coding Tree Block) or a largest coding unit (LCU: Largest Coding Unit).

ツリーブロックＴＢＬＫは、ツリーブロックヘッダＴＢＬＫＨと、符号化単位情報ＣＵ１〜ＣＵＮＬ（ＮＬはツリーブロックＴＢＬＫに含まれる符号化単位情報の総数）とを含む。ここで、まず、ツリーブロックＴＢＬＫと、符号化単位情報ＣＵとの関係について説明すると次のとおりである。 The tree block TBLK includes a tree block header TBLKH and coding unit information CU1 to CUNL (NL is the total number of coding unit information included in the tree block TBLK). Here, first, a relationship between the tree block TBLK and the coding unit information CU will be described as follows.

The tree block TBLK is divided into partitions for specifying the block size used for each process of intra prediction or inter prediction, and of transform.

ツリーブロックＴＢＬＫの上記パーティションは、再帰的な４分木分割により分割されている。この再帰的な４分木分割により得られる木構造のことを以下、符号化ツリー（coding tree）と称する。 The partition of the tree block TBLK is divided by recursive quadtree partitioning. The tree structure obtained by this recursive quadtree partitioning is hereinafter referred to as a coding tree.

以下、符号化ツリーの末端のノードであるリーフ（leaf）に対応するパーティションを、符号化ノード（coding node）として参照する。また、符号化ノードは、符号化処理の基本的な単位となるため、以下、符号化ノードのことを、符号化単位（ＣＵ）とも称する。 Hereinafter, a partition corresponding to a leaf that is a node at the end of the coding tree is referred to as a coding node. In addition, since the encoding node is a basic unit of the encoding process, hereinafter, the encoding node is also referred to as an encoding unit (CU).

つまり、符号化単位情報（以下、ＣＵ情報と称する）ＣＵ１〜ＣＵＮＬは、ツリーブロックＴＢＬＫを再帰的に４分木分割して得られる各符号化ノード（符号化単位）に対応する情報である。 That is, coding unit information (hereinafter referred to as CU information) CU1 to CUNL is information corresponding to each coding node (coding unit) obtained by recursively dividing the tree block TBLK into quadtrees.

また、符号化ツリーのルート（root）は、ツリーブロックＴＢＬＫに対応付けられる。換言すれば、ツリーブロックＴＢＬＫは、複数の符号化ノードを再帰的に含む４分木分割の木構造の最上位ノードに対応付けられる。 Also, the root of the coding tree is associated with the tree block TBLK. In other words, the tree block TBLK is associated with the highest node of the tree structure of the quadtree partition that recursively includes a plurality of encoding nodes.

Note that the size of each coding node is half, both vertically and horizontally, the size of the coding node to which it directly belongs (that is, the partition of the node one level above that coding node).

また、ツリーブロックＴＢＬＫのサイズ、および、各符号化ノードのとり得るサイズは、符号化データ＃１のシーケンスパラメータセットＳＰＳに含まれる、最小符号化ノードのサイズ指定情報、および最大符号化ノードと最小符号化ノードの階層深度の差分に依存する。例えば、最小符号化ノードのサイズが８×８画素であって、最大符号化ノードと最小符号化ノードの階層深度の差分が３である場合、ツリーブロックＴＢＬＫのサイズが６４×６４画素であって、符号化ノードのサイズは、４種類のサイズ、すなわち、６４×６４画素、３２×３２画素、１６×１６画素、および、８×８画素の何れかをとり得る。 Also, the size of the tree block TBLK and the size that each coding node can take are the size designation information of the minimum coding node and the maximum coding node and the minimum included in the sequence parameter set SPS of the coded data # 1. It depends on the difference in the hierarchical depth of the coding node. For example, when the size of the minimum coding node is 8 × 8 pixels and the difference in the layer depth between the maximum coding node and the minimum coding node is 3, the size of the tree block TBLK is 64 × 64 pixels. The size of the encoding node can take any of four sizes, namely, 64 × 64 pixels, 32 × 32 pixels, 16 × 16 pixels, and 8 × 8 pixels.
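The relationship between the minimum coding node size, the depth difference, and the resulting sizes can be sketched as follows. This is only an illustrative sketch: the function name `coding_node_sizes` and its parameters are hypothetical and do not correspond to syntax elements of the encoded data.

```python
def coding_node_sizes(min_size: int, depth_diff: int) -> list[int]:
    """Return the side lengths (in pixels) a coding node may take, largest first.

    min_size   -- side length of the minimum coding node (hypothetical parameter name)
    depth_diff -- difference in hierarchy depth between the maximum and
                  minimum coding nodes (hypothetical parameter name)
    """
    if min_size <= 0 or depth_diff < 0:
        raise ValueError("min_size must be positive and depth_diff non-negative")
    # Each quadtree level halves the side length, so the tree block is
    # min_size doubled depth_diff times.
    tree_block_size = min_size << depth_diff
    return [tree_block_size >> depth for depth in range(depth_diff + 1)]


# Example from the text: 8x8 minimum node, depth difference of 3.
print(coding_node_sizes(8, 3))  # [64, 32, 16, 8]
```

The first element of the returned list is the tree block size itself.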

(Tree block header)
The tree block header TBLKH includes encoding parameters referred to by the video decoding device 1 in order to determine the decoding method of the target tree block. Specifically, as shown in FIG. 3D, it includes tree block division information SP_TBLK that specifies the division pattern of the target tree block into CUs, and a quantization parameter difference Δqp (qp_delta) that specifies the size of the quantization step.

ツリーブロック分割情報ＳＰ＿ＴＢＬＫは、ツリーブロックを分割するための符号化ツリーを表す情報であり、具体的には、対象ツリーブロックに含まれる各ＣＵの形状、サイズ、および、対象ツリーブロック内での位置を指定する情報である。 The tree block division information SP_TBLK is information representing a coding tree for dividing the tree block. Specifically, the shape and size of each CU included in the target tree block, and the position in the target tree block Is information to specify.

なお、ツリーブロック分割情報ＳＰ＿ＴＢＬＫは、ＣＵの形状やサイズを明示的に含んでいなくてもよい。例えばツリーブロック分割情報ＳＰ＿ＴＢＬＫは、対象ツリーブロック全体またはツリーブロックの部分領域を四分割するか否かを示すフラグの集合であってもよい。その場合、ツリーブロックの形状やサイズを併用することで各ＣＵの形状やサイズを特定できる。 Note that the tree block division information SP_TBLK may not explicitly include the shape or size of the CU. For example, the tree block division information SP_TBLK may be a set of flags indicating whether the entire target tree block or a partial region of the tree block is to be divided into four. In that case, the shape and size of each CU can be specified by using the shape and size of the tree block together.
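As a rough illustration of how a set of split flags together with the tree block size can determine each CU, the following sketch walks a quadtree. It assumes, purely for illustration, that flags are read depth-first with children visited in raster order (top-left, top-right, bottom-left, bottom-right) and that no flag is read for a node already at the minimum size; the text does not specify the flag order, and `parse_cus` is a hypothetical name.

```python
from typing import Iterator


def parse_cus(flags: Iterator[int], x: int, y: int, size: int,
              min_size: int) -> Iterator[tuple[int, int, int]]:
    """Yield (x, y, size) for every leaf CU of the coding tree rooted at (x, y)."""
    # A node at the minimum size cannot be split, so no flag is consumed for it.
    if size > min_size and next(flags):
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                yield from parse_cus(flags, x + dx, y + dy, half, min_size)
    else:
        yield (x, y, size)


# 64x64 tree block: split the root, then split only its top-right 32x32 child.
flags = iter([1, 0, 1, 0, 0, 0, 0, 0, 0])
cus = list(parse_cus(flags, 0, 0, 64, 8))
for cu in cus:
    print(cu)
```

In this example the tree block is covered by three 32x32 CUs and four 16x16 CUs, and the position and size of each CU follow from the flags and the tree block size alone.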

また、量子化パラメータ差分Δｑｐは、対象ツリーブロックにおける量子化パラメータｑｐと、当該対象ツリーブロックの直前に符号化されたツリーブロックにおける量子化パラメータｑｐ’との差分ｑｐ−ｑｐ’である。 The quantization parameter difference Δqp is a difference qp−qp ′ between the quantization parameter qp in the target tree block and the quantization parameter qp ′ in the tree block encoded immediately before the target tree block.

(CU layer)
In the CU layer, a set of data referred to by the video decoding device 1 for decoding a CU to be processed (hereinafter also referred to as a target CU) is defined.

Here, before describing the specific contents of the data included in the CU information CU, the tree structure of the data included in a CU will be described. A coding node is the root node of a prediction tree (PT) and a transform tree (TT). The prediction tree and the transform tree are described as follows.

予測ツリーにおいては、符号化ノードが１または複数の予測ブロックに分割され、各予測ブロックの位置とサイズとが規定される。別の表現でいえば、予測ブロックは、符号化ノードを構成する１または複数の重複しない領域である。また、予測ツリーは、上述の分割により得られた１または複数の予測ブロックを含む。 In the prediction tree, the encoding node is divided into one or a plurality of prediction blocks, and the position and size of each prediction block are defined. In other words, the prediction block is one or a plurality of non-overlapping areas constituting the encoding node. The prediction tree includes one or a plurality of prediction blocks obtained by the above division.

予測処理は、この予測ブロックごとに行われる。以下、予測の単位である予測ブロックのことを、予測単位（prediction unit；ＰＵ）とも称する。 Prediction processing is performed for each prediction block. Hereinafter, a prediction block that is a unit of prediction is also referred to as a prediction unit (PU).

予測ツリーにおける分割の種類は、大まかにいえば、イントラ予測の場合と、インター予測の場合との２つがある。 Broadly speaking, there are two types of division in the prediction tree: intra prediction and inter prediction.

イントラ予測の場合、分割方法は、２Ｎ×２Ｎ（符号化ノードと同一サイズ）と、Ｎ×Ｎとがある。 In the case of intra prediction, there are 2N × 2N (the same size as the encoding node) and N × N division methods.

In the case of inter prediction, the division methods include 2N×2N (the same size as the coding node), 2N×N, 2N×nU, 2N×nD, N×2N, nL×2N, nR×2N, and N×N. Here, 2N×nU indicates that a 2N×2N coding node is divided into two regions of 2N×0.5N and 2N×1.5N in order from the top. 2N×nD indicates that a 2N×2N coding node is divided into two regions of 2N×1.5N and 2N×0.5N in order from the top. nL×2N indicates that a 2N×2N coding node is divided into two regions of 0.5N×2N and 1.5N×2N in order from the left. nR×2N indicates that a 2N×2N coding node is divided into two regions of 1.5N×2N and 0.5N×2N in order from the left.
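The inter prediction division methods can be tabulated as rectangles inside a 2N×2N coding node, as in the sketch below. The function name `inter_pu_partitions` and the ASCII mode labels are hypothetical; N is assumed to be even so that 0.5N is an integer number of pixels. Because the regions must tile the node, the right-hand region of nR×2N is taken to be 0.5N wide and 2N tall.

```python
def inter_pu_partitions(mode: str, n: int) -> list[tuple[int, int, int, int]]:
    """Return (x, y, width, height) of each PU in a 2N x 2N coding node."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number")
    s = 2 * n    # side length of the coding node (2N)
    q = n // 2   # 0.5N, the short side of the asymmetric partitions
    table = {
        "2Nx2N": [(0, 0, s, s)],
        "2NxN":  [(0, 0, s, n), (0, n, s, n)],
        "Nx2N":  [(0, 0, n, s), (n, 0, n, s)],
        "NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],      # 2N x 0.5N over 2N x 1.5N
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],  # 2N x 1.5N over 2N x 0.5N
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],      # 0.5N x 2N left of 1.5N x 2N
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],  # 1.5N x 2N left of 0.5N x 2N
    }
    return table[mode]


# A 32x32 coding node (N = 16) split with nRx2N.
print(inter_pu_partitions("nRx2N", 16))  # [(0, 0, 24, 32), (24, 0, 8, 32)]
```

For every mode the areas of the returned rectangles sum to the area of the coding node, which is a quick sanity check on the table.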

In the transform tree, the coding node is divided into one or a plurality of transform blocks, and the position and size of each transform block are defined. In other words, a transform block is one of one or more non-overlapping regions constituting the coding node. The transform tree includes the one or more transform blocks obtained by the above division.

Division in the transform tree is performed either by assigning a region of the same size as the coding node as a transform block, or by recursive quadtree division as in the division of the tree block described above.

Transform processing is performed for each transform block. Hereinafter, a transform block, which is the unit of transform, is also referred to as a transform unit (TU).

(Data structure of CU information)
Next, the specific contents of the data included in the CU information CU will be described with reference to FIG. 3E. As shown in FIG. 3E, the CU information CU specifically includes a skip flag SKIP, PT information PTI, and TT information TTI.

スキップフラグＳＫＩＰは、対象のＰＵについて、スキップモードが適用されているか否かを示すフラグであり、スキップフラグＳＫＩＰの値が１の場合、すなわち、対象ＣＵにスキップモードが適用されている場合、そのＣＵ情報ＣＵにおけるＰＴ情報ＰＴＩの一部、および、ＴＴ情報ＴＴＩは省略される。なお、スキップフラグＳＫＩＰは、Ｉスライスでは省略される。 The skip flag SKIP is a flag indicating whether or not the skip mode is applied to the target PU. When the value of the skip flag SKIP is 1, that is, when the skip mode is applied to the target CU, A part of the PT information PTI and the TT information TTI in the CU information CU are omitted. Note that the skip flag SKIP is omitted for the I slice.

ＰＴ情報ＰＴＩは、ＣＵに含まれるＰＴに関する情報である。言い換えれば、ＰＴ情報ＰＴＩは、ＰＴに含まれる１または複数のＰＵそれぞれに関する情報の集合であり、動画像復号装置１により予測画像を生成する際に参照される。ＰＴ情報ＰＴＩは、図３の（ｅ）に示すように、予測タイプ情報ＰＴｙｐｅ、および、予測情報ＰＩｎｆｏを含んでいる。 The PT information PTI is information related to the PT included in the CU. In other words, the PT information PTI is a set of information related to each of one or more PUs included in the PT, and is referred to when the moving image decoding apparatus 1 generates a predicted image. As shown in FIG. 3E, the PT information PTI includes prediction type information PType and prediction information PInfo.

予測タイプ情報ＰＴｙｐｅは、対象ＰＵについての予測画像生成方法として、イントラ予測を用いるのか、または、インター予測を用いるのかを指定する情報である。 The prediction type information PType is information that specifies whether intra prediction or inter prediction is used as a predicted image generation method for the target PU.

予測情報ＰＩｎｆｏは、予測タイプ情報ＰＴｙｐｅが何れの予測方法を指定するのかに応じて、イントラ予測情報、または、インター予測情報より構成される。以下では、イントラ予測が適用されるＰＵをイントラＰＵとも呼称し、インター予測が適用されるＰＵをインターＰＵとも呼称する。 The prediction information PInfo is configured from intra prediction information or inter prediction information depending on which prediction method is specified by the prediction type information PType. Hereinafter, a PU to which intra prediction is applied is also referred to as an intra PU, and a PU to which inter prediction is applied is also referred to as an inter PU.

また、予測情報ＰＩｎｆｏは、対象ＰＵの形状、サイズ、および、位置を指定する情報が含まれる。上述のとおり予測画像の生成は、ＰＵを単位として行われる。予測情報ＰＩｎｆｏの詳細については後述する。 Further, the prediction information PInfo includes information specifying the shape, size, and position of the target PU. As described above, the generation of the predicted image is performed in units of PU. Details of the prediction information PInfo will be described later.

ＴＴ情報ＴＴＩは、ＣＵに含まれるＴＴに関する情報である。言い換えれば、ＴＴ情報ＴＴＩは、ＴＴに含まれる１または複数のＴＵそれぞれに関する情報の集合であり、動画像復号装置１により残差データを復号する際に参照される。なお、以下、ＴＵのことをブロックと称することもある。 The TT information TTI is information regarding the TT included in the CU. In other words, the TT information TTI is a set of information regarding each of one or a plurality of TUs included in the TT, and is referred to when the moving image decoding apparatus 1 decodes residual data. Hereinafter, a TU may be referred to as a block.

As shown in FIG. 3E, the TT information TTI includes TT division information SP_TT that specifies the division pattern of the target CU into transform blocks, and quantized prediction residuals QD1 to QDNT (NT is the total number of blocks included in the target CU).

ＴＴ分割情報ＳＰ＿ＴＴは、具体的には、対象ＣＵに含まれる各ＴＵの形状、サイズ、および、対象ＣＵ内での位置を決定するための情報である。例えば、ＴＴ分割情報ＳＰ＿ＴＴは、対象となるノードの分割を行うのか否かを示す情報（split_transform_unit_flag）の集合により構成できる。 Specifically, the TT division information SP_TT is information for determining the shape and size of each TU included in the target CU and the position within the target CU. For example, the TT division information SP_TT can be configured by a set of information (split_transform_unit_flag) indicating whether or not the target node is divided.

また、例えば、ＣＵのサイズが、６４×６４の場合、分割により得られる各ＴＵは、３２×３２画素から４×４画素までのサイズをとり得る。 For example, when the size of the CU is 64 × 64, each TU obtained by the division can take a size from 32 × 32 pixels to 4 × 4 pixels.

各量子化予測残差ＱＤは、動画像符号化装置２が以下の処理１〜３を、処理対象のブロックである対象ブロックに施すことによって生成した符号化データである。 Each quantized prediction residual QD is encoded data generated by the video encoding device 2 performing the following processes 1 to 3 on a target block that is a processing target block.

Process 1: Apply a DCT (Discrete Cosine Transform) to the prediction residual obtained by subtracting the predicted image from the encoding target image;
Process 2: Quantize the transform coefficients obtained in Process 1;
Process 3: Variable-length code the transform coefficients quantized in Process 2.
Note that the quantization parameter qp described above represents the size of the quantization step QP used by the video encoding device 2 when quantizing the transform coefficients (QP = 2^{qp/6}).

(Prediction information PInfo)
As described above, there are two types of prediction information PInfo: inter prediction information and intra prediction information.

インター予測情報には、動画像復号装置１が、インター予測によってインター予測画像を生成する際に参照される符号化パラメータが含まれる。より具体的には、インター予測情報には、対象ＣＵの各インターＰＵへの分割パターンを指定するインターＰＵ分割情報、および、各インターＰＵについてのインター予測パラメータが含まれる。 The inter prediction information includes a coding parameter that is referred to when the video decoding device 1 generates an inter prediction image by inter prediction. More specifically, the inter prediction information includes inter PU division information that specifies a division pattern of the target CU into each inter PU, and inter prediction parameters for each inter PU.

インター予測パラメータには、参照画像インデックスと、推定動きベクトルインデックスと、動きベクトル残差とが含まれる。 The inter prediction parameters include a reference image index, an estimated motion vector index, and a motion vector residual.

一方、イントラ予測情報には、動画像復号装置１が、イントラ予測によってイントラ予測画像を生成する際に参照される符号化パラメータが含まれる。より具体的には、イントラ予測情報には、対象ＣＵの各イントラＰＵへの分割パターンを指定するイントラＰＵ分割情報、および、各イントラＰＵについてのイントラ予測パラメータが含まれる。イントラ予測パラメータは、各イントラＰＵについてのイントラ予測方法（予測モード）を指定するためのパラメータである。 On the other hand, the intra prediction information includes an encoding parameter that is referred to when the video decoding device 1 generates an intra predicted image by intra prediction. More specifically, the intra prediction information includes intra PU division information that specifies a division pattern of the target CU into each intra PU, and intra prediction parameters for each intra PU. The intra prediction parameter is a parameter for designating an intra prediction method (prediction mode) for each intra PU.

(Encoded data #1b of the base layer)
FIG. 4 is a diagram showing the data structure of the base layer of the encoded data #1 (encoded data #1b). The encoded data #1b illustratively includes a sequence and GOPs (Groups of Pictures), each of which is a group of a plurality of pictures constituting the sequence.

The structure of the layers at and below the picture layer is shown in FIG. 4. FIGS. 4A to 4D show the structures of the picture layer P, the slice layer S, the macroblock layer MB, and the block layer B, respectively.

ピクチャレイヤＰは、対応ピクチャを復号するために動画像復号装置１が参照するデータの集合である。ピクチャレイヤＰは、図４（ａ）に示すように、ピクチャヘッダＰＨ、及び、スライスレイヤＳ1〜ＳNsを含んでいる（ＮsはピクチャレイヤＰに含まれるスライスレイヤの総数）。 The picture layer P is a set of data referred to by the video decoding device 1 in order to decode the corresponding picture. As shown in FIG. 4A, the picture layer P includes a picture header PH and slice layers S1 to SNs (Ns is the total number of slice layers included in the picture layer P).

ピクチャヘッダＰＨには、対応ピクチャの復号方法を決定するために動画像復号装置１が参照する符号化パラメータ群が含まれている。例えば、動画像符号化装置２が符号化の際に用いた画像の表示順を示す番号（テンポラル・リファレンス）やＩピクチャ、Ｐピクチャ、Ｂピクチャの違いを示す符号（ピクチャ・タイプ）は、ピクチャヘッダＰＨに含まれる符号化パラメータの一例である。 The picture header PH includes a coding parameter group that is referred to by the video decoding device 1 in order to determine a decoding method of the corresponding picture. For example, a number (temporal reference) indicating the display order of images used in encoding by the moving image encoding device 2 and a code (picture type) indicating a difference between I picture, P picture, and B picture are pictures. It is an example of the encoding parameter contained in header PH.

ピクチャレイヤＰに含まれる各スライスレイヤＳは、対応スライスを復号するために動画像復号装置１が参照するデータの集合である。スライスレイヤＳは、図４（ｂ）に示すように、マクロブロックレイヤＭＢ1〜ＭＢNm（ＮmはスライスＳに含まれるマクロブロックの総数）を含んでいる。 Each slice layer S included in the picture layer P is a set of data referred to by the video decoding device 1 in order to decode the corresponding slice. As shown in FIG. 4B, the slice layer S includes macroblock layers MB1 to MBNm (Nm is the total number of macroblocks included in the slice S).

Each macroblock layer MB included in the slice layer S is a set of data referred to by the video decoding device 1 in order to decode the corresponding macroblock. As shown in FIG. 4C, a macroblock is a square pixel block of 16 pixels × 16 lines, consisting of luminance blocks Y1 to Y4 and two corresponding color difference blocks Cb and Cr of 8 pixels × 8 lines, and is further subdivided into blocks of 8 pixels × 8 lines, which are the DCT processing units. This is the case when the color difference format of the encoded image is 4:2:0. When the color difference format is 4:2:2, the luminance blocks correspond to two color difference blocks of 8 pixels × 16 lines, and in the 4:4:4 color difference format, to two color difference blocks of 16 pixels × 16 lines. Each block layer B included in the macroblock layer MB normally consists of DCT-transformed and quantized data.
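The dependence of the macroblock composition on the color difference format can be summarized in a short sketch. The names `CHROMA_BLOCK_SIZE` and `macroblock_dct_blocks` are hypothetical; the sketch only counts 8×8 DCT blocks from the dimensions stated in the text.

```python
# (width in pixels, height in lines) of each of the two color difference
# blocks (Cb, Cr) that accompany one 16x16 luminance macroblock.
CHROMA_BLOCK_SIZE = {
    "4:2:0": (8, 8),
    "4:2:2": (8, 16),
    "4:4:4": (16, 16),
}


def macroblock_dct_blocks(chroma_format: str) -> int:
    """Return the number of 8x8 DCT blocks in one macroblock."""
    width, height = CHROMA_BLOCK_SIZE[chroma_format]
    luma_blocks = 4                                # Y1 to Y4
    chroma_blocks = 2 * ((width * height) // 64)   # Cb and Cr, in 8x8 units
    return luma_blocks + chroma_blocks


for fmt in CHROMA_BLOCK_SIZE:
    print(fmt, macroblock_dct_blocks(fmt))
```

Under these dimensions a macroblock carries 6, 8, or 12 blocks for the 4:2:0, 4:2:2, and 4:4:4 formats, respectively.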

(Video decoding device 1)
Next, the video decoding device 1 according to the present embodiment will be described with reference to FIGS. 1 and 6. The video decoding device 1 includes, in part, the method adopted in MPEG-2, the method adopted in H.264/MPEG-4 AVC, the method adopted in the KTA software, which is a codec for joint development in VCEG (Video Coding Expert Group), the method adopted in the TMuC (Test Model under Consideration) software, which is its successor codec, and the technology adopted in the HM (HEVC Test Model) software.

図１は、動画像復号装置１の構成を示すブロック図である。図１に示すように、動画像復号装置１は、拡張レイヤ復号部（ＨＥＶＣ）１０及び基本レイヤ復号部（ＭＰＥＧ−２）１１を含む構成である。 FIG. 1 is a block diagram showing a configuration of the moving picture decoding apparatus 1. As shown in FIG. 1, the moving picture decoding apparatus 1 includes an enhancement layer decoding unit (HEVC) 10 and a base layer decoding unit (MPEG-2) 11.

The enhancement layer decoding unit 10 includes a variable length decoding unit 13, an inverse orthogonal transform / inverse quantization unit 14, an in-loop filter 15, an intra prediction unit 16, an enlargement / IP conversion unit 17, a frame memory 18, an inter prediction unit 19, a motion information conversion processing unit 20, a motion information storage unit 21, a motion information normalization unit 22, a selection unit 23, and an adder 24.

拡張レイヤ復号部１０は、基本レイヤ復号部１１で復号された復号画像、動き情報をもとに、復号処理を行うものである。 The enhancement layer decoding unit 10 performs a decoding process based on the decoded image and motion information decoded by the base layer decoding unit 11.

また、基本レイヤ復号部１１は、可変長復号部３１、逆直交変換・逆量子化部３２、フレームメモリ３３、動き補償部３４、及び加算器３５を含む。 The base layer decoding unit 11 includes a variable length decoding unit 31, an inverse orthogonal transform / inverse quantization unit 32, a frame memory 33, a motion compensation unit 34, and an adder 35.

基本レイヤ復号部１１は、符号化データ＃１ｂから復号した動き情報、イントラ予測情報をもとに復号画像を生成するものである。 The base layer decoding unit 11 generates a decoded image based on motion information and intra prediction information decoded from the encoded data # 1b.

動画像復号装置１は、符号化データ＃１の基本レイヤを基本レイヤ復号部１１で、拡張レイヤを拡張レイヤ復号部１０で復号するための装置であり、拡張レイヤ復号部１０は、基本レイヤ復号部１１で復号された符号化情報（後述する）を用いて復号を行うものである。 The video decoding device 1 is a device for decoding the base layer of the encoded data # 1 by the base layer decoding unit 11 and the enhancement layer by the enhancement layer decoding unit 10, and the enhancement layer decoding unit 10 Decoding is performed using the encoded information (described later) decoded by the unit 11.

可変長復号部１３は、各ブロックに関する量子化予測残差ＱＤ、及び、そのブロックを含むツリーブロックに関する量子化パラメータ差分Δｑｐを符号化データ＃１ａから復号し、これらを逆直交変換・逆量子化部１４に供給する。また、可変長復号部１３は、各パーティションに関する予測パラメータＰＰを、符号化データ＃１ａから復号する。すなわち、インター予測パーティションに関しては、参照画像インデックスＲＩ、推定動きベクトルインデックスＰＭＶＩ、及び、動きベクトル残差ＭＶＤ等の動き情報、モード情報を符号化データ＃１から復号し、これらをインター予測部１９に供給する。一方、イントラ予測パーティションに関しては、（１）パーティションのサイズを指定するサイズ指定情報、および、（２）予測インデックスを指定する予測インデックス指定情報を含むイントラ予測情報を符号化データ＃１から復号し、これをイントラ予測部１６に供給する。 The variable length decoding unit 13 decodes the quantized prediction residual QD for each block and the quantization parameter difference Δqp for the tree block including the block from the encoded data # 1a, and performs inverse orthogonal transform / inverse quantization To the unit 14. Further, the variable length decoding unit 13 decodes the prediction parameter PP related to each partition from the encoded data # 1a. That is, with respect to the inter prediction partition, the motion information and mode information such as the reference image index RI, the estimated motion vector index PMVI, and the motion vector residual MVD are decoded from the encoded data # 1, and these are transmitted to the inter prediction unit 19. Supply. On the other hand, with respect to the intra prediction partition, (1) decoding the intra prediction information including the size specifying information for specifying the size of the partition, and (2) the prediction index specifying information for specifying the prediction index, from the encoded data # 1, This is supplied to the intra prediction unit 16.

The inverse orthogonal transform / inverse quantization unit 14 (1) inversely quantizes the quantized prediction residual QD, (2) applies an inverse DCT (Discrete Cosine Transform) to the DCT coefficients obtained by the inverse quantization, and (3) supplies the prediction residual D obtained by the inverse DCT to the adder 24. When inversely quantizing the quantized prediction residual QD, the unit derives the quantization step QP from the quantization parameter difference Δqp supplied from the variable length decoding unit 13. The quantization parameter qp is derived by adding the quantization parameter difference Δqp to the quantization parameter qp' of the tree block that was most recently inverse-quantized and inverse-DCT-transformed, and the quantization step QP is derived from the quantization parameter qp as QP = 2^(qp/6). The prediction residual D is generated in units of blocks (transform units).
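
The derivation of the quantization step described above can be illustrated with a minimal sketch. The function name `quantization_step` and the numeric values are assumptions made for illustration only; the two formulas are the ones stated in the text (qp = qp' + Δqp and QP = 2^(qp/6)).

```python
from typing import Tuple


def quantization_step(prev_qp: int, delta_qp: int) -> Tuple[int, float]:
    """Return (qp, QP) for the current tree block (hypothetical helper)."""
    qp = prev_qp + delta_qp      # qp = qp' + delta_qp
    step = 2.0 ** (qp / 6.0)     # QP = 2^(qp/6)
    return qp, step


# Hypothetical values: the previous tree block used qp' = 24 and the
# decoded difference is delta_qp = 6.
qp, step = quantization_step(prev_qp=24, delta_qp=6)
print(qp, step)
```

Under this relation the quantization step doubles each time qp increases by 6.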

The in-loop filter 15 applies deblocking and filtering based on adaptive filter parameters to the decoded image supplied from the adder 24.

The intra prediction unit 16 generates a predicted image for each intra prediction partition. Specifically, it generates the predicted image from the intra prediction information decoded from the encoded data #1 and the base layer decoded image supplied from the enlargement / IP conversion unit 17.

The enlargement / IP conversion unit 17 enlarges and IP-converts the base layer decoded image decoded by the base layer decoding unit 11, and supplies the resulting image to the intra prediction unit 16.

The frame memory 18 stores the decoded image filtered by the in-loop filter 15.

The inter prediction unit 19 generates a motion compensated image for each inter prediction partition. Specifically, it generates the motion compensated image using either the motion information and mode information supplied from the variable length decoding unit 13 or the motion information of the corresponding base layer supplied from the motion information conversion processing unit 20. The detailed configuration of the inter prediction unit 19 is described later.

The motion information normalization unit 22 converts, in units of macroblocks, the motion vector MV_BL of the corresponding base layer macroblock into an intermediate format MV_ITM (intermediate motion vector) normalized to the value for a distance of one frame (field). It then supplies the converted intermediate format MV_ITM to the motion information storage unit 21.

The motion information storage unit 21 stores the intermediate format MV_ITM converted by the motion information normalization unit 22 and, in accordance with instructions from the inter prediction unit 19, supplies the intermediate format MV_ITM to the motion information conversion processing unit 20. An example of the information stored in the motion information storage unit 21 is described with reference to FIG. 11. As shown in FIG. 11, the motion information storage unit 21 stores, as motion information associated with each macroblock, the field rate (v) for each picture and, for each macroblock, the direction of the motion vector (forward prediction motion vector or backward prediction motion vector) and information indicating whether a component is horizontal or vertical.
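
One possible in-memory layout for the stored motion information is sketched below. It is only an illustration of the description of FIG. 11: the class names, field names, picture label, and values are all hypothetical, and the actual storage format is not specified in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Vector = Tuple[float, float]  # (horizontal, vertical) components


@dataclass
class MacroblockMotion:
    forward: Optional[Vector] = None   # forward-prediction MV_ITM, if any
    backward: Optional[Vector] = None  # backward-prediction MV_ITM, if any


@dataclass
class PictureMotion:
    field_rate: float  # stored once per picture
    macroblocks: Dict[int, MacroblockMotion] = field(default_factory=dict)


class MotionInfoStore:
    """Hypothetical stand-in for the motion information storage unit 21."""

    def __init__(self) -> None:
        self._pictures: Dict[str, PictureMotion] = {}

    def put(self, picture: str, field_rate: float, mb: int,
            direction: str, mv: Vector) -> None:
        if direction not in ("forward", "backward"):
            raise ValueError("direction must be 'forward' or 'backward'")
        pic = self._pictures.setdefault(picture, PictureMotion(field_rate))
        entry = pic.macroblocks.setdefault(mb, MacroblockMotion())
        setattr(entry, direction, mv)

    def get(self, picture: str, mb: int, direction: str) -> Optional[Vector]:
        pic = self._pictures.get(picture)
        entry = pic.macroblocks.get(mb) if pic else None
        return getattr(entry, direction, None) if entry else None


store = MotionInfoStore()
store.put("B02", field_rate=60.0, mb=7, direction="backward", mv=(4.0, -2.0))
```

A lookup that finds no vector returns `None`, which lets a caller detect that the requested prediction direction is not available for that macroblock.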

The motion information conversion processing unit 20 scales the intermediate motion vector MV_ITM stored in the motion information storage unit 21 based on the picture reference relationship in the enhancement layer, and derives the motion vector MV_EL used by the inter prediction unit 19.

The details of the processing in the motion information normalization unit 22, the motion information storage unit 21, and the motion information conversion processing unit 20 are described later.

The selection unit 23 selects either the predicted image generated by the intra prediction unit 16 or the predicted image generated by the inter prediction unit 19, and supplies the selected predicted image to the adder 24.

The adder 24 generates a decoded image by adding the predicted image supplied from the selection unit 23 and the prediction residual D supplied from the inverse orthogonal transform / inverse quantization unit 14.

The variable length decoding unit 31, the inverse orthogonal transform / inverse quantization unit 32, the frame memory 33, and the adder 35 have the same functions as the variable length decoding unit 13, the inverse orthogonal transform / inverse quantization unit 14, the frame memory 18, and the adder 24, respectively, so their description is omitted.

The motion compensation unit 34 restores the motion vector for each inter prediction partition from the motion vector residual MVD for that partition and the already restored motion vectors of other partitions, and generates an inter predicted image.

(Configuration of the inter prediction unit 19)
Next, the configuration of the inter prediction unit 19 is described with reference to FIG. 6. As shown in FIG. 6, the inter prediction unit 19 includes an enhancement layer motion vector buffer 901, a mode buffer 902, a motion compensation unit 903, and a base layer motion vector buffer 904.

The enhancement layer motion vector buffer 901 stores the motion vector residual MVD supplied from the variable length decoding unit 13 and supplies it to the motion compensation unit 903 as enhancement layer motion vector information.

The mode buffer 902 stores the prediction mode information supplied from the variable length decoding unit 13.

The base layer motion vector buffer 904 stores the base layer motion vector information supplied from the motion information conversion processing unit 20 and supplies it to the motion compensation unit 903.

The motion compensation unit 903 generates an inter predicted image using either the enhancement layer motion vector information supplied from the enhancement layer motion vector buffer 901 or the base layer motion vector information supplied from the base layer motion vector buffer 904, and supplies it to the selection unit 23.

(Motion vector derivation method 1-1)
First, before describing the motion vector derivation method, field prediction and frame prediction are described with reference to FIG. 5. As shown in FIG. 5(a), in field prediction, motion vectors are predicted from two fields for one picture. The prediction data is generated by two motion vectors from two fields, which may be the same or different.

As shown in FIG. 5(b), in frame prediction, the prediction data is generated by one motion vector from one frame.

Next, the method of deriving the motion vector used by the inter prediction unit 19, in other words the processing of the motion information normalization unit 22, the motion information storage unit 21, and the motion information conversion processing unit 20, is described with reference to FIG. 7. This section describes the derivation of the motion vector used in the enhancement layer when the base layer performs field prediction in the frame structure.

FIG. 7 shows the frames in display order when the video decoding device 1 decodes a moving image. The base layer (BL) in FIG. 7 shows pictures with a field structure. For example, frame BB0 is the combination of fields B01 and B02, shown by broken lines. In the base layer, frame BI2 is processed as an I picture, frame BP5 as a P picture, and frames BB0, BB1, BB3, and BB4 as B pictures. One frame can also be handled as two fields: for example, frame BB0 as fields B01 and B02, and frame BB1 as fields B11 and B12. Here, each base layer frame is assumed to be decoded adaptively in either the frame structure or the field structure.

In this embodiment, an example is described in which the pictures used for motion compensation in the base layer have a frame structure and prediction is performed as field prediction in units of fields.

In the enhancement layer (EL) shown in FIG. 7, the picture of each frame corresponds to the picture of a field of the base layer. For example, the base layer field corresponding to the enhancement layer frame b0 is B01. In the enhancement layer, I4 is processed as an I picture, P10 as a P picture, B1 and B7 as reference B pictures, and b0, b2, b3, b6, b8, b9, and b11 as non-reference B pictures.

In this embodiment, the base layer and the enhancement layer are decoded in the order shown as "coding order" in FIG. 7.

Here, the current processing target is frame B1, which uses I4 of the enhancement layer as a backward reference picture and BB0 (fields B01 and B02) of the base layer as an inter-layer reference picture. Decoding of these reference pictures has already been completed by the time the target frame B1 is decoded.

The frame (field) of the lower layer (base layer) corresponding to the current target frame of the enhancement layer is identified from the time information included in the bitstream of each layer. Identification of the corresponding frame is not limited to the use of time information; for example, it may be performed by assigning common picture numbers to the base layer and the enhancement layer.

Consider the case where the motion vector MV_EL(1)_a used in the prediction block PU of frame B1 is derived from the base layer. In this case, the motion vector used when processing macroblock a of the base layer frame BB0, which corresponds to frame B1, is used. In this example, macroblock a is the base layer macroblock located at the coordinates corresponding to the center point of the prediction block PU of frame B1. Although the base layer macroblock is identified here from the coordinates of the center point of the PU, it may instead be identified from the coordinates of the upper-left corner or another corner of the PU; the method is not limited to these.
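
As a rough illustration of identifying the base layer macroblock from the center point of a PU, the sketch below maps the PU center into base layer coordinates and divides by the macroblock size. The function name, the argument layout, and the picture sizes are assumptions; the 16x16 macroblock size is that of MPEG-2, and any field-structure handling of the vertical coordinate is deliberately ignored here.

```python
from typing import Tuple

MB_SIZE = 16  # MPEG-2 macroblocks cover 16x16 luma samples


def corresponding_macroblock(pu_x: int, pu_y: int, pu_w: int, pu_h: int,
                             el_size: Tuple[int, int],
                             bl_size: Tuple[int, int]) -> Tuple[int, int]:
    """Return (column, row) of the base layer macroblock under the PU center.

    el_size and bl_size are (width, height) of the enhancement layer and
    base layer pictures. Hypothetical helper for illustration only.
    """
    cx = pu_x + pu_w // 2                 # PU center, enhancement layer
    cy = pu_y + pu_h // 2
    bx = cx * bl_size[0] // el_size[0]    # same point, base layer
    by = cy * bl_size[1] // el_size[1]
    return bx // MB_SIZE, by // MB_SIZE


# Hypothetical example: a 32x32 PU at (64, 32) in a 1920x1080 enhancement
# layer picture over a 960x540 base layer picture.
mb = corresponding_macroblock(64, 32, 32, 32,
                              el_size=(1920, 1080), bl_size=(960, 540))
print(mb)
```

Using the upper-left corner instead of the center would only change how `cx` and `cy` are computed.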

As described above, frame BB0 is processed by field prediction in the frame structure. Here, frame BB0 refers to fields I21 and I22 as backward reference pictures, and macroblock a is assumed to have been processed with the motion vectors MV_BL(1-1)_a and MV_BL(1-2)_a, each referring to one of these reference pictures.

Because the motion vectors of macroblock a are calculated by field prediction, the source fields of MV_BL(1-1)_a and MV_BL(1-2)_a are B01 and B02, respectively. Since the base layer field corresponding to the target frame B1 is B02, the inter prediction unit 19 performs motion compensation using the motion vector MV_EL(1)_a, which the motion information conversion processing unit 20 derives from the intermediate motion vector MV_ITM(1)_a stored in the motion information storage unit 21.

Specifically, the inter prediction unit 19 first identifies the base layer frame BB0 corresponding to the target B1. Then, of the motion vector information (MV_BL(1-1)_a, MV_BL(1-2)_a) used when processing the corresponding macroblock of frame BB0, it takes the motion vector MV_BL(1-2)_a, which was used when processing the base layer field B02 corresponding to the target B1, and causes the motion information storage unit 21 to supply the intermediate motion vector MV_ITM(1)_a obtained by normalizing it to the motion information conversion processing unit 20.

Here, the intermediate motion vector MV_ITM(1)_a stored in the motion information storage unit 21 is assumed to have already been normalized and stored by the motion information normalization unit 22 by the time the prediction block PU is decoded. The motion information normalization unit 22 calculates the intermediate motion vector MV_ITM(1)_a from the motion vector MV_BL(1-2)_a using Formula (1).

MV_ITM(1)_a = MV_BL(1-2)_a × tb_a / td_a   Formula (1)
Here, tb_a is the time interval corresponding to a distance of one frame from the target frame, and td_a is the time interval between field B02 and field I22, from which the motion vector MV_BL(1-2)_a used in decoding the target base layer was calculated.
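
The normalization of Formula (1) can be sketched as follows. This is a minimal illustration, not the device's implementation: the function name `normalize_mv` and the sample values are hypothetical, and exact fractions are used only because the text does not specify a rounding rule.

```python
from fractions import Fraction
from typing import Tuple


def normalize_mv(mv_bl: Tuple[int, int], tb: int,
                 td: int) -> Tuple[Fraction, Fraction]:
    """MV_ITM = MV_BL x tb / td, applied to each component."""
    if td == 0:
        raise ValueError("td must be non-zero")
    scale = Fraction(tb, td)
    return mv_bl[0] * scale, mv_bl[1] * scale


# Hypothetical values: a base layer vector (12, -6) that spans td = 3
# intervals is normalized to a span of tb = 1 interval.
mv_itm = normalize_mv((12, -6), tb=1, td=3)
print(mv_itm)
```

Normalizing every stored vector to a one-interval span means the later conversion step does not need to know the base layer reference distance.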

Next, the motion information conversion processing unit 20 calculates the motion vector MV_EL(1)_a to be used according to Formula (2).

MV_EL(1)_a = MV_ITM(1)_a × tde_a / tb_a × scaling   Formula (2)
Here, scaling denotes EL resolution / BL resolution, and tde_a is the time interval corresponding to a distance of one frame from the target frame B1.
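
The conversion from the intermediate motion vector to the enhancement layer motion vector might be sketched as below. The function name and all values are hypothetical. The text gives the scaling only as EL resolution / BL resolution; applying the width ratio to the horizontal component and the height ratio to the vertical component is an assumption of this sketch.

```python
from fractions import Fraction
from typing import Tuple


def convert_mv(mv_itm: Tuple[int, int], tde: int, tb: int,
               el_size: Tuple[int, int],
               bl_size: Tuple[int, int]) -> Tuple[Fraction, Fraction]:
    """MV_EL = MV_ITM x tde / tb x (EL resolution / BL resolution)."""
    if tb == 0:
        raise ValueError("tb must be non-zero")
    temporal = Fraction(tde, tb)
    # Assumption: per-axis spatial ratio (width for x, height for y).
    sx = Fraction(el_size[0], bl_size[0])
    sy = Fraction(el_size[1], bl_size[1])
    return mv_itm[0] * temporal * sx, mv_itm[1] * temporal * sy


# Hypothetical values: intermediate vector (4, -2), tde = 2, tb = 1,
# and an enhancement layer with twice the base layer resolution.
mv_el = convert_mv((4, -2), tde=2, tb=1,
                   el_size=(1920, 1080), bl_size=(960, 540))
print(mv_el)
```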

This embodiment assumes that the base layer and the enhancement layer have the same frame rate (field rate). If the frame rates differ and one is a multiple of the other, MV_EL may be obtained by multiplying MV_ITM by (enhancement layer frame rate / base layer frame rate) to align the scale in the time direction. In the case of MPEG-2, the base layer frame rate can be obtained from the sequence header.

Although the decoding process has been described above, the same processing is also performed in the encoding process of the video encoding device 2.

(Motion vector derivation method 1-2)
Next, the derivation of the motion vector used in the enhancement layer when the base layer performs frame prediction in the frame structure is described with reference to FIG. 8.

Here, the decoding target is assumed to be b2. First, the inter prediction unit 19 identifies the base layer frame BB1 corresponding to the target b2. It then causes the motion information storage unit 21 to supply to the motion information conversion processing unit 20 the intermediate motion vector MV_ITM(1)_b, obtained by normalizing the motion vector MV_BL(1)_b that was used when processing the corresponding macroblock of frame BB1.

Here, the intermediate motion vector MV_ITM(1)_b stored in the motion information storage unit 21 is assumed to have already been normalized and stored by the motion information normalization unit 22 by the time the prediction block PU is decoded. The motion information normalization unit 22 calculates the intermediate motion vector MV_ITM(1)_b from the motion vector MV_BL(1)_b using Formula (3).

MV_ITM(1)_b = MV_BL(1)_b × tb_b / td_b   Formula (3)
Next, the motion information conversion processing unit 20 calculates the motion vector MV_EL(1)_b to be used according to Formula (4).

MV_EL(1)_b = MV_ITM(1)_b × tde_b / tb_b × scaling   Formula (4)

(Motion vector derivation method 1-3)
Next, another example of deriving the motion vector used in the enhancement layer when the base layer performs field prediction in the frame structure is described with reference to FIG. 9.

Here, the decoding target is assumed to be b9. The inter prediction unit 19 identifies the base layer frame BB4 corresponding to b9. Because no motion vector information in the same prediction direction is stored in the motion information storage unit 21 for the corresponding macroblock of frame BB4, it refers to the motion vector information of the corresponding macroblock of the base layer frame BP5, which corresponds to P10, the nearest reference frame in coding order from the decoding target frame b9. In frame BP5, that is, fields P51 and P52, the motion vectors in the same prediction direction among those used when processing the anchor macroblock at the corresponding position are MV_BL(1-1)_c and MV_BL(1-2)_c. The inter prediction unit 19 causes the motion information storage unit 21 to supply one of the intermediate motion vectors MV_ITM(1-1)_c and MV_ITM(1-2)_c calculated from them to the motion information conversion processing unit 20. Here, the intermediate motion vector MV_ITM(1-1)_c is supplied. Both MV_ITM(1-1)_c and MV_ITM(1-2)_c may be supplied instead.

Here, the intermediate motion vector MV_ITM(1)_c is assumed to have already been normalized and stored by the motion information normalization unit 22 by the time the prediction block PU is decoded. The motion information normalization unit 22 calculates the intermediate motion vector MV_ITM(1)_c from the motion vector MV_BL(1-1)_c using Formula (5).

MV_ITM(1)_c = MV_BL(1-1)_c × tb_c / td_c   Formula (5)
Next, the motion information conversion processing unit 20 calculates the motion vector MV_EL(1)_c to be used according to Formula (6).

MV_EL(1)_c = MV_ITM(1)_c × tde_c / tb_c × scaling   Formula (6)

(Motion vector derivation method 1-4)
Next, another example of deriving the motion vector used in the enhancement layer when the base layer performs frame prediction in the frame structure is described with reference to FIG. 10.

Here, the decoding target is assumed to be b9. The inter prediction unit 19 identifies the base layer frame B42 corresponding to b9. Because no motion vector information in the same prediction direction is stored in the motion information storage unit 21 for the corresponding macroblock of frame B42, it uses the base layer frame BP5, that is, fields P51 and P52, which corresponds to P10, the nearest reference frame in coding order from the decoding target frame b9. Of the motion vectors used when processing the anchor macroblock at the corresponding position in BP5, the one in the same prediction direction is MV_BL(1)_d, and the inter prediction unit 19 causes the motion information storage unit 21 to supply the intermediate motion vector MV_ITM(1)_d calculated from it to the motion information conversion processing unit 20.

Here, the intermediate motion vector MV_ITM(1)_d is assumed to have already been normalized and stored by the motion information normalization unit 22 by the time the prediction block PU is decoded. The motion information normalization unit 22 calculates the intermediate motion vector MV_ITM(1)_d from the motion vector MV_BL(1-1)_d using Formula (7).

MV_ITM(1)_d = MV_BL(1-1)_d × tb_d / td_d   Formula (7)
Next, the motion information conversion processing unit 20 calculates the motion vector MV_EL(1)_d to be used according to Formula (8).

MV_EL(1)_d = MV_ITM(1)_d × tde_d / tb_d × scaling   Formula (8)
In the examples above, when the base layer does not contain a motion vector in the prediction direction required by the enhancement layer, the intermediate motion vector is derived from a motion vector of the reference frame nearest to that frame in coding order. Alternatively, when the base layer contains a motion vector in a prediction direction different from the one required by the enhancement layer, the intermediate motion vector may be derived from that motion vector.
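
The fallback described in derivation methods 1-3 and 1-4 can be illustrated with a small lookup sketch. Everything here is hypothetical: the dictionary keyed by (picture, macroblock index, direction), the function name, and the `use_opposite` switch. The order in which the two fallbacks are tried is an assumption, and the sketch does not adjust the vector when the opposite direction is used, since the text does not say how that would be done.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, int, str]      # (picture, macroblock index, direction)
Vector = Tuple[float, float]


def select_intermediate_mv(store: Dict[Key, Vector], picture: str,
                           anchor_picture: str, mb: int, direction: str,
                           use_opposite: bool = False
                           ) -> Optional[Tuple[Vector, str]]:
    """Return (MV_ITM, where it came from), or None if nothing is stored."""
    mv = store.get((picture, mb, direction))
    if mv is not None:
        return mv, "same picture"
    if use_opposite:
        other = "backward" if direction == "forward" else "forward"
        mv = store.get((picture, mb, other))
        if mv is not None:
            return mv, "opposite direction"
    # Fall back to the nearest reference frame in coding order.
    mv = store.get((anchor_picture, mb, direction))
    if mv is not None:
        return mv, "nearest reference frame"
    return None


# Hypothetical contents: BB4 has only a backward vector for macroblock 3,
# while the anchor picture BP5 has a forward vector.
store = {("BP5", 3, "forward"): (1.0, 0.5),
         ("BB4", 3, "backward"): (-2.0, 0.0)}
choice = select_intermediate_mv(store, "BB4", "BP5", 3, "forward")
print(choice)
```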

(Processing flow in the video decoding device 1)
Next, the flow of processing in the video decoding device 1 is described with reference to FIGS. 12 to 16.

First, the flow of the enhancement layer decoding process is described with reference to FIG. 12. The enhancement layer decoding unit 10 constructs the reference image list used for decoding the target frame (S101). The reference image list contains the position in the frame buffer of the decoded image of each reference image and the output order of each reference image.

Next, the base layer decoding unit 11 decodes the base layer picture (frame or field) needed to decode the target frame of the enhancement layer (S102). The motion information normalization unit 22 then executes the intermediate motion information derivation process, which calculates intermediate motion vector information from the motion vectors used to decode the corresponding base layer picture (intermediate motion information derivation process: S103).

Next, the coding unit CU to be processed is set (S104), and the variable length decoding unit 13 decodes the side information of the target CU (S105). The side information includes the CU prediction type (information identifying intra prediction mode or inter prediction mode), a skip flag, and PU partition information.

The variable length decoding unit 13 further decodes the transform coefficients of each TU in the target CU, and the inverse orthogonal transform / inverse quantization unit 14 performs inverse quantization and inverse orthogonal transform to decode the prediction residual (S106).

If the target CU is an intra CU, whose prediction units PU are predicted by intra prediction (YES in S107), the intra prediction unit 16 generates the predicted image of each prediction unit PU in the target CU by intra prediction (S108).

If, on the other hand, the target CU is an inter CU, whose prediction units PU are predicted by inter prediction (NO in S107), the inter prediction unit 19 generates the predicted image of each PU in the target CU by inter prediction (inter prediction process: S109).

A decoded image is then generated for the target CU (S110). When decoded images have been generated for all CUs (YES in S111), a deblocking filter, a sample adaptive offset filter (SAO: Sample Adaptive Offset), and an adaptive loop filter (ALF: Adaptive Loop Filter) are applied to the decoded image (S112), and the process ends.

Next, the flow of the inter prediction process is described with reference to FIG. 13.

In the inter prediction process, the inter prediction unit 19 first sets the target PU (S201) and decodes the PU side information (S202). The PU side information includes a base layer prediction flag (base_mode_flag), a merge flag (merge_flag), and a merge index (merge_idx).

If the target PU is not a merge PU (NO in S203), the motion compensation parameters of that PU are decoded (S205). If the target PU is a merge PU (YES in S203) and is in base mode, which refers to the motion vector of the base layer (YES in S204), the enhancement layer decoding unit 10 performs the inter-layer motion estimation process (inter-layer motion estimation process: S207). If the target PU is not in base mode (NO in S204), the inter prediction unit 19 derives merge candidates (S206).

The inter prediction unit 19 then performs motion compensation using the motion compensation parameters derived in step S205, step S206, or step S207, and generates a predicted image (S208). When all PUs have been processed (YES in S209), the inter prediction process ends.
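
The branching in steps S203 to S207 can be summarized in a small sketch. The function name and the returned labels are hypothetical, and the sketch only mirrors the order of checks given in the text; it does not model how the flags are parsed from the bitstream.

```python
def select_derivation(merge_flag: bool, base_mode_flag: bool) -> str:
    """Pick how the motion compensation parameters of a PU are obtained."""
    if not merge_flag:
        return "decode_parameters"        # S203 NO  -> S205
    if base_mode_flag:
        return "inter_layer_estimation"   # S204 YES -> S207
    return "merge_candidates"             # S204 NO  -> S206


for flags in [(False, False), (True, True), (True, False)]:
    print(flags, select_derivation(*flags))
```

Whichever branch is taken, the resulting parameters feed the same motion compensation step (S208).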

FIG. 14 shows a table for deriving the motion compensation parameters. The motion compensation parameter derivation table in FIG. 14 associates the CU side information and PU side information with the method of deriving the motion compensation parameters for the target PU. In FIG. 14, "0" and "1" indicate the value of the corresponding syntax element, and "-" indicates that the syntax element does not need to be decoded.

Next, the flow of the intermediate motion information derivation process will be described with reference to FIG. 15.

First, the motion information normalization unit 22 sets the base layer picture used to derive the intermediate motion information (S301). Next, the motion information normalization unit 22 divides the target frame into basic processing units (MPEG-2 macroblocks in this embodiment) (S302), identifies the position in the base layer picture that corresponds to the target basic processing unit, and reads out the corresponding motion vector (S303). It then derives the intermediate motion vector MV_ITM from the motion vector that was read out (S304). When all basic processing units have been processed (YES in S305), the intermediate motion information derivation process ends.

Next, the flow of the inter-layer motion estimation process will be described with reference to FIG. 16. As shown in FIG. 16, the enhancement layer decoding unit 10 first estimates the use flags of the reference image lists. When the intermediate motion information contains two motion vectors, both L0 and L1 are marked as available; when it contains only one, only L0 is marked as available.

次に、拡張レイヤ復号部１０は、対象ＰＵの復号に用いる動きベクトルを参照するための参照画像リストを設定する（Ｓ４０２）。参照画像リストは、Ｌ０→Ｌ１の順に選択する。 Next, the enhancement layer decoding unit 10 sets a reference image list for referring to a motion vector used for decoding the target PU (S402). The reference image list is selected in the order of L0 → L1.

The enhancement layer decoding unit 10 then sets a reference image from the selected reference image list (S403, refIdxLX = 0). After that, the enhancement layer decoding unit 10 reads a motion vector from the intermediate motion information of the reference image and scales it based on the display interval between the target frame and the reference image to derive the motion vector (S404). When all reference image lists have been processed (S405), the inter-layer motion estimation process ends.
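A minimal sketch of the inter-layer motion estimation flow is given below. It rests on assumptions that the description does not state: each intermediate vector is taken to be normalized to one unit of display interval, the scaling is taken to be a plain multiplication by the display distance, and the function name and argument layout are hypothetical.

```python
def estimate_inter_layer_mvs(itm_mvs, cur_order, ref_order):
    """Derive enhancement layer motion vectors from intermediate vectors.

    itm_mvs   : list holding one or two intermediate vectors (x, y)
    cur_order : display order of the target frame
    ref_order : display order of the refIdxLX = 0 picture per list,
                e.g. {"L0": 1, "L1": 4}
    """
    if len(itm_mvs) not in (1, 2):
        raise ValueError("expected one or two intermediate motion vectors")

    # Use flags: two vectors enable L0 and L1, one vector enables L0 only.
    lists = ("L0", "L1")[:len(itm_mvs)]

    result = {}
    for lx, (mvx, mvy) in zip(lists, itm_mvs):   # lists visited as L0 -> L1
        interval = abs(cur_order - ref_order[lx])  # display interval (assumed unit)
        result[lx] = (mvx * interval, mvy * interval)
    return result
```

For example, a single intermediate vector (2, -1) with a display interval of 2 yields (4, -2) for L0 and leaves L1 unused.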

(Moving image encoding device 2)
Next, the moving image encoding device 2, which generates encoded data #1 (a, b) by encoding an encoding target image, will be described with reference to FIGS. 21 and 22. The moving image encoding device 2 includes, in part, methods adopted in MPEG-2, methods adopted in H.264/MPEG-4 AVC, methods adopted in the KTA software, which is a codec for joint development in VCEG (Video Coding Experts Group), methods adopted in the TMuC (Test Model under Consideration) software, which is its successor codec, and techniques adopted in the HM (HEVC Test Model) software.

FIG. 21 is a block diagram showing the configuration of the moving image encoding device 2 according to the present embodiment. As shown in FIG. 21, the moving image encoding device 2 includes an enhancement layer encoding unit (HEVC) 81, a base layer encoding unit (MPEG-2) 82, and a reduction/interlacing processing unit 83. The enhancement layer encoding unit 81 includes an image rearrangement buffer 51, a motion information conversion processing unit 52, a motion information storage unit 53, a motion information normalization unit 54, a frame memory 55, a motion prediction/compensation unit 56, an enlargement/IP conversion unit 57, an in-loop filter 58, an intra prediction unit 59, a selection unit 60, an inverse orthogonal transform/inverse quantization unit 61, an orthogonal transform/quantization unit 62, a variable length encoding unit 63, a subtractor 64, and an adder 65.

The base layer encoding unit 82 includes an image rearrangement buffer 71, a motion estimation/motion compensation unit 72, a frame memory 73, an inverse orthogonal transform/inverse quantization unit 74, an orthogonal transform/quantization unit 75, a variable length encoding unit 76, an adder 77, and a subtractor 78.

The moving image encoding device 2 is a device that generates encoded data #1 (a, b) including a base layer and an enhancement layer by applying scalable video coding (SVC) to moving image #10 (the encoding target image).

画像並べ替えバッファ５１、画像並べ替えバッファ７１は、入力された画像を符号化順に並べ替えるためのバッファである。 The image rearrangement buffer 51 and the image rearrangement buffer 71 are buffers for rearranging input images in the encoding order.

The orthogonal transform/quantization unit 62 (1) applies a DCT (Discrete Cosine Transform) to each block of the prediction residual D obtained by subtracting the predicted image Pred from the encoding target image, (2) quantizes the DCT coefficients obtained by the transform, and (3) supplies the quantized prediction residual QD obtained by the quantization to the variable length encoding unit 63 and the inverse orthogonal transform/inverse quantization unit 61. In addition, the orthogonal transform/quantization unit 62 (1) selects, for each tree block, the quantization step QP used for quantization, (2) supplies the quantization parameter difference Δqp, which indicates the size of the selected quantization step QP, to the variable length encoding unit 63, and (3) supplies the selected quantization step QP to the inverse orthogonal transform/inverse quantization unit 61. Here, the quantization parameter difference Δqp is the value obtained by subtracting the quantization parameter qp' of the tree block that was DCT-transformed and quantized immediately before from the quantization parameter qp (QP = 2^(qp/6)) of the tree block to be DCT-transformed and quantized.
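The relation QP = 2^(qp/6) and the definition of Δqp can be illustrated with the short sketch below. The function names are hypothetical, and the predecessor value used for the first tree block (`first_prev_qp`) is an assumption, because the description does not say what the first tree block is differenced against.

```python
def quant_step(qp: int) -> float:
    """Quantization step QP for quantization parameter qp: QP = 2^(qp/6)."""
    return 2.0 ** (qp / 6)


def qp_deltas(qps, first_prev_qp):
    """Return the delta-qp value of each tree block in coding order.

    Each delta is qp of the current tree block minus qp' of the tree block
    processed immediately before; first_prev_qp is an assumed starting value.
    """
    deltas = []
    prev = first_prev_qp
    for qp in qps:
        deltas.append(qp - prev)
        prev = qp
    return deltas
```

Under this relation the quantization step doubles each time qp increases by 6, so qp = 12 gives a step of 4.0.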

The variable length encoding unit 63 generates encoded data #1a by variable-length encoding (1) the quantized prediction residual QD and Δqp supplied from the orthogonal transform/quantization unit 62, (2) the intra prediction information supplied from the intra prediction unit 59, and (3) the motion information supplied from the motion prediction/compensation unit 56.

The motion prediction/compensation unit 56 detects a motion vector mv for each partition and generates a motion compensated image mc using the detected motion vector mv and a reference image index RI that specifies the filtered decoded image used as the reference image. Details of the motion prediction/compensation unit 56 will be described later.

The motion estimation/motion compensation unit 72 detects a motion vector mv for each partition and generates a motion compensated image mc using the detected motion vector mv and a reference image index RI that specifies the filtered decoded image used as the reference image.

直交変換・量子化部７５は、（１）符号化対象画像から予測画像Ｐｒｅｄを減算した予測残差Ｄをブロック毎にＤＣＴ変換（Discrete Cosine Transform）し、（２）ＤＣＴ変換により得られたＤＣＴ係数を量子化し、（３）量子化により得られた量子化予測残差ＱＤを可変長符号化部７６及び逆直交変換・逆量子化部７４に供給する。 The orthogonal transform / quantization unit 75 (1) performs DCT transform (Discrete Cosine Transform) on the prediction residual D obtained by subtracting the predicted image Pred from the encoding target image, and (2) DCT obtained by DCT transform. The coefficient is quantized, and (3) the quantized prediction residual QD obtained by the quantization is supplied to the variable length coding unit 76 and the inverse orthogonal transform / inverse quantization unit 74.

Note that the motion information conversion processing unit 52, motion information storage unit 53, motion information normalization unit 54, frame memory 55, enlargement/IP conversion unit 57, in-loop filter 58, inverse orthogonal transform/inverse quantization unit 61, adder 65, frame memory 73, and inverse orthogonal transform/inverse quantization unit 74 have the same functions as the motion information conversion processing unit 20, motion information storage unit 21, motion information normalization unit 22, frame memory 18, enlargement/IP conversion unit 17, in-loop filter 15, inverse orthogonal transform/inverse quantization unit 14, adder 24, frame memory 33, and inverse orthogonal transform/inverse quantization unit 32 described above, respectively, so their description is omitted.

(Motion prediction/compensation unit 56)
Next, the configuration of the motion prediction/compensation unit 56 will be described with reference to FIG. 22. As shown in FIG. 22, the motion prediction/compensation unit 56 includes a motion search unit 911, a cost function calculation unit 912, a mode determination unit 913, and a motion compensation unit 914.

The motion search unit 911 derives motion information from the input image information supplied from the image rearrangement buffer 51 and the reference image information supplied from the frame memory 55, and supplies it to the cost function calculation unit 912 as enhancement layer motion vector information.

コスト関数算出部９１２は、動きベクトル情報とモード判定とについて、コスト関数を定義する。 The cost function calculation unit 912 defines a cost function for motion vector information and mode determination.

The mode determination unit 913 uses the cost function defined by the cost function calculation unit 912 to determine the prediction mode, that is, whether to perform merging, to refer to the motion vector of the base layer, or to perform normal motion compensation processing.

The motion compensation unit 914 selects, as the optimum parameters, the inter prediction process that minimizes the cost function, generates a predicted image and supplies it to the selection unit 60, and supplies the motion compensation parameters and the motion vector information to the variable length encoding unit 63.
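The selection of the minimum-cost prediction mode can be sketched as follows. The description does not specify the cost function, so the Lagrangian form J = D + λR used here is an assumption chosen only for illustration, and the mode labels and function names are hypothetical.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Assumed cost function: distortion plus lambda times the bit count."""
    return distortion + lam * rate_bits


def choose_mode(candidates: dict, lam: float) -> str:
    """Return the prediction mode whose cost is smallest.

    candidates maps a mode label to a (distortion, rate_bits) pair.
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))


# Illustrative numbers only, not measured data.
modes = {
    "merge": (100.0, 2.0),       # reuse a merge candidate
    "base_mode": (90.0, 1.0),    # refer to the base layer motion vector
    "normal": (80.0, 20.0),      # normal motion compensation
}
best = choose_mode(modes, lam=1.0)
```

With these numbers and λ = 1.0 the costs are 102, 91, and 100, so the base layer reference mode is selected.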

In the embodiment described above, the calculated intermediate motion vector MV_ITM is converted and used as the enhancement layer motion vector MV_EL. Alternatively, the intermediate motion vector MV_ITM may be used as one of the motion vector candidates for motion compensation in the enhancement layer motion-compensated prediction process, or as a prediction vector in motion vector prediction. In this embodiment, the pictures used for motion compensation in the base layer have a frame structure, but processing may instead be performed in units of fields. Frame prediction and field prediction have been described as examples of the prediction method; however, even in dual-prime prediction, in which the average of predictions from different fields is obtained for a macroblock, an intermediate motion vector can be calculated from the base layer motion vector based on the picture reference relationships, as described above.

縮小／インタレース化処理部８３は、入力画像をダウンスケーリングするとともに、インタレース化するものである。 The reduction / interlacing processing unit 83 downscales the input image and interlaces it.

また、上述した実施の形態では、基本レイヤとしてインタレース化した映像データを用いた例で説明したが、プログレッシブの映像データを用いた場合でも、基本レイヤの動きベクトルから算出した中間動きベクトルを用いて、拡張レイヤの動きベクトルを算出することが可能である。この場合、縮小／インタレース化処理部８３は入力画像をダウンスケーリングする処理のみを行う。 In the above-described embodiment, the example using interlaced video data as the base layer has been described. However, even when progressive video data is used, an intermediate motion vector calculated from the motion vector of the base layer is used. Thus, the motion vector of the enhancement layer can be calculated. In this case, the reduction / interlacing processing unit 83 performs only the process of downscaling the input image.

[Embodiment 2]
Another embodiment of the present invention will be described below with reference to FIGS. 17 to 20. For convenience of explanation, members having the same functions as those shown in Embodiment 1 are given the same reference numerals, and their description is omitted.

The present embodiment differs from Embodiment 1 in that the enhancement layer is decoded directly from the motion information of the base layer. In other words, in Embodiment 1 the base layer motion information is converted into intermediate motion information before being used in the enhancement layer decoding process, whereas in the present embodiment it is used in the enhancement layer decoding process without being converted into intermediate motion information.

(Moving image decoding device 1')
FIG. 17 shows the configuration of the moving image decoding device 1' according to the present embodiment. The moving image decoding device 1' differs from the moving image decoding device 1 in that it does not include the motion information normalization unit 22, and in that it includes a motion information conversion processing unit 20' and a motion information storage unit 21' in place of the motion information conversion processing unit 20 and the motion information storage unit 21.

動き情報蓄積部２１´は、基本レイヤの動き情報ＭＶ_ＢＬをマクロブロック単位で蓄積し、インター予測部１９からの指示に従い、動き情報ＭＶ_ＢＬを動き情報変換処理部２０´に供給する。動き情報蓄積部２１´に蓄積される情報例について、図２０を参照して説明する。図２０に示すように、動き情報蓄積部２１´には、ピクチャ単位でピクチャ番号差分値とフィールドレートとが、マクロブロック単位で動きベクトルの方向（前方予測動きベクトル、後方予測動きベクトル）と、水平成分であるか、垂直成分であるかを示す情報が、マクロブロックと対応付けられて動き情報として蓄積されている。 The motion information accumulating unit 21 ′ accumulates the base layer motion information MV_BL in units of macroblocks, and supplies the motion information MV_BL to the motion information conversion processing unit 20 ′ in accordance with an instruction from the inter prediction unit 19. An example of information stored in the motion information storage unit 21 ′ will be described with reference to FIG. As shown in FIG. 20, in the motion information storage unit 21 ′, the picture number difference value and the field rate in units of pictures, the direction of motion vectors (forward prediction motion vector, backward prediction motion vector) in units of macroblocks, Information indicating whether the component is a horizontal component or a vertical component is stored as motion information in association with the macroblock.

ピクチャ番号差分値は、例えば、基本レイヤがＭＰＥＧ−２の場合、ピクチャヘッダから取得した現在のピクチャ番号と参照ピクチャのピクチャ番号の差分を取ることで算出できる。また、基本レイヤと拡張レイヤとで共通となるピクチャ番号が定義されている場合、スライスヘッダから取得した現在のピクチャ番号との差分を取ることで算出できる。すなわち、ピクチャ番号差分値は、動きベクトルの参照元と参照先のピクチャの時間間隔を示すものであればよい。 For example, when the base layer is MPEG-2, the picture number difference value can be calculated by calculating the difference between the current picture number acquired from the picture header and the picture number of the reference picture. Further, when a common picture number is defined in the base layer and the enhancement layer, it can be calculated by taking a difference from the current picture number acquired from the slice header. That is, the picture number difference value only needs to indicate the time interval between the motion vector reference source and the reference destination picture.

フレームレートは、基本レイヤと拡張レイヤとの表示レートが異なる場合に用いる。フレームレートは、シーケンスヘッダから取得することができる。本実施の形態では、基本レイヤがインタレースなのでフィールドレートに変換している。 The frame rate is used when the display rates of the base layer and the enhancement layer are different. The frame rate can be obtained from the sequence header. In this embodiment, since the base layer is interlaced, it is converted to a field rate.

Although this embodiment has been described on the premise that the base layer and the enhancement layer have the same frame rate (field rate), when the frame rates differ and one is an integer multiple of the other, MV_EL can be obtained by multiplying MV_BL by "enhancement layer frame rate / base layer frame rate" to align the scale in the time direction.

動き情報変換処理部２０´は、拡張レイヤでのピクチャ参照関係に基づき、動き情報蓄積部２１´に蓄積されている動きベクトルＭＶ_ＢＬをスケーリングして、インター予測部１９が用いる動きベクトルＭＶ_ＥＬを導出する。 The motion information conversion processing unit 20 ′ scales the motion vector MV_BL stored in the motion information storage unit 21 ′ based on the picture reference relationship in the enhancement layer, and derives a motion vector MV_EL used by the inter prediction unit 19. .

(Motion vector derivation method 2-1)
Next, a method for deriving the motion vector used by the inter prediction unit 19 will be described with reference to FIG. 18. Described here is the derivation of the motion vector used in the enhancement layer when the base layer performs field prediction in a frame structure.

まず、処理対象はフレームｂ３とする。そして、フレームｂ３は、前方参照ピクチャとしてＢ１を、後方参照ピクチャとしてＩ４を、階層間参照ピクチャとして基本レイヤのＢＢ１を用いることとする。これらの参照ピクチャは、処理対象フレームｂ３が復号される時点で、すでに復号処理が完了している。 First, the processing target is a frame b3. Frame b3 uses B1 as the forward reference picture, I4 as the backward reference picture, and BB1 of the base layer as the inter-layer reference picture. These reference pictures have already been decoded when the processing target frame b3 is decoded.

Consider the case where the motion vector MV_EL(1)_e used in a prediction block PU of frame b3 is derived from the base layer. In this case, the motion vector that was used when processing macroblock a, the macroblock of base layer frame BB1 (the frame corresponding to frame b3) that corresponds to the prediction block PU, is used. In this example, macroblock a is the base layer macroblock located at the coordinates corresponding to the center point of the prediction block PU of frame b3.
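Locating macroblock a from the center point of the prediction block can be sketched as below. The sketch assumes 16x16 luma macroblocks, maps coordinates by the plain ratio of base layer to enhancement layer picture size, and ignores the field structure of an interlaced base layer; the function name and argument layout are hypothetical.

```python
def colocated_macroblock(pu_x, pu_y, pu_w, pu_h, el_size, bl_size, mb_size=16):
    """Return (column, row) of the base layer macroblock covering the PU center.

    pu_x, pu_y : top-left luma position of the PU in the enhancement layer
    pu_w, pu_h : PU width and height in luma samples
    el_size    : (width, height) of the enhancement layer picture
    bl_size    : (width, height) of the base layer picture
    """
    # Center point of the prediction block in enhancement layer coordinates.
    cx = pu_x + pu_w // 2
    cy = pu_y + pu_h // 2

    # Map the center into base layer coordinates by the resolution ratio.
    bx = cx * bl_size[0] // el_size[0]
    by = cy * bl_size[1] // el_size[1]

    # Index of the macroblock that contains the mapped point.
    return bx // mb_size, by // mb_size
```

For a 32x32 PU at (64, 32) in a 1920x1080 enhancement layer over a 960x540 base layer, the center (80, 48) maps to (40, 24), which lies in macroblock column 2, row 1.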

As described above, frame BB1 has a frame structure and is processed by field prediction. Here, it is assumed that frame BB1 refers to fields I21 and I22 as backward reference pictures, and that macroblock a is processed with motion vectors MV_BL(1-1)_e and MV_BL(1-2)_e, each referring to one of these reference pictures.

Because the motion vectors of macroblock a are calculated by field prediction, the reference source fields of MV_BL(1-1)_e and MV_BL(1-2)_e are B01 and B02, respectively. Since the base layer field corresponding to the frame b3 to be processed is B12, the inter prediction unit 19 performs motion compensation using the motion vector MV_BL(1-2)_e stored in the motion information storage unit 21'.

Specifically, the inter prediction unit 19 first identifies the base layer frame BB1 corresponding to the processing target b3. Then, of the motion vector information (MV_BL(1-1)_e, MV_BL(1-2)_e) used when processing the corresponding macroblock of frame BB1, it causes the motion vector MV_BL(1-2)_e, which was used when processing the base layer field B12 corresponding to the processing target b3, to be supplied from the motion information storage unit 21' to the motion information conversion processing unit 20'.

動き情報変換処理部２０´は、式（９）に従い、動きベクトルＭＶ_ＢＬ（１−２）_ｅから動きベクトルＭＶ_ＥＬ（１）_ｅを算出する。 The motion information conversion processing unit 20 ′ calculates a motion vector MV_EL (1) _e from the motion vector MV_BL (1-2) _e according to the equation (9).

MV_EL(1)_e = MV_BL(1-2)_e × (tb_e / td_e) × (scaling factor)   ... Equation (9)
Here, the scaling factor is the ratio of the EL resolution to the BL resolution, and tb/td is the ratio of the temporal distances between the pictures.
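Equation (9) can be evaluated as in the sketch below. Applying the resolution ratio separately to the horizontal and vertical components, and returning exact fractions instead of rounding, are assumptions made for illustration, since the description states neither; the function name is hypothetical.

```python
from fractions import Fraction


def derive_mv_el(mv_bl, tb, td, el_size, bl_size):
    """Scale a base layer vector to the enhancement layer per Equation (9).

    mv_bl   : (x, y) base layer motion vector
    tb, td  : temporal distances whose ratio tb/td scales the vector in time
    el_size : (width, height) of the enhancement layer picture
    bl_size : (width, height) of the base layer picture
    """
    temporal = Fraction(tb, td)
    scale_x = Fraction(el_size[0], bl_size[0])   # EL resolution / BL resolution
    scale_y = Fraction(el_size[1], bl_size[1])
    return (mv_bl[0] * temporal * scale_x, mv_bl[1] * temporal * scale_y)
```

With MV_BL = (6, -4), tb/td = 1/3, and an enhancement layer of twice the base layer resolution, the result is (4, -8/3); how the fractional component is rounded is left to the caller.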

(Motion vector derivation method 2-2)
Next, the derivation of the motion vector used in the enhancement layer when the base layer performs frame prediction in a frame structure will be described with reference to FIG. 19.

ここでは、復号対象は、ｂ３であるとする。まず、インター予測部１９は、処理対象のｂ３に対応する基本レイヤのフレームＢ１を特定する。そして、フレームＢ１の対応するマクロブロックを処理する際に用いた動きベクトルＭＶ_ＢＬ（１）_ｆを動き情報蓄積部２１´から動き情報変換処理部２０´に供給させる。 Here, it is assumed that the decoding target is b3. First, the inter prediction unit 19 specifies the frame B1 of the base layer corresponding to b3 to be processed. Then, the motion vector MV_BL (1) _f used when processing the corresponding macroblock of the frame B1 is supplied from the motion information storage unit 21 ′ to the motion information conversion processing unit 20 ′.

動き情報変換処理部２０´は、動きベクトルＭＶ_ＢＬ（１）_ｆから式（１０）を用いて動きベクトルＭＶ_ＥＬ（１）_ｆを算出する。 The motion information conversion processing unit 20 ′ calculates a motion vector MV_EL (1) _f from the motion vector MV_BL (1) _f using the equation (10).

MV_EL(1)_f = MV_BL(1)_f × (tb_f / td_f) × (scaling factor)   ... Equation (10)

(Moving image encoding device 2')
Next, the moving image encoding device 2' will be described with reference to FIG. 23. The moving image encoding device 2' differs from the moving image encoding device 2 in that it does not include the motion information normalization unit 54, and in that it includes a motion information conversion processing unit 52' and a motion information storage unit 53' in place of the motion information conversion processing unit 52 and the motion information storage unit 53.

なお、動き情報変換処理部５２´、動き情報蓄積部５３´は、動き情報変換処理部２０´、動き情報蓄積部２１´と同様の機能を有するものである。 The motion information conversion processing unit 52 ′ and the motion information storage unit 53 ′ have the same functions as the motion information conversion processing unit 20 ′ and the motion information storage unit 21 ′.

In the present embodiment, motion-compensated prediction in the enhancement layer is performed using the motion vector calculated from the motion information of the base layer. Alternatively, the calculated motion vector may be used as one of the motion vector candidates for motion compensation in the enhancement layer motion-compensated prediction process, or as a prediction vector in motion vector prediction. For example, the motion vector calculated from the base layer motion information may be used as one of the motion vector candidates in merge mode or median prediction.

In this embodiment, the pictures used for motion compensation in the base layer have a frame structure, but processing may instead be performed in units of fields. Frame prediction and field prediction have been described as examples of the prediction method; however, even in dual-prime prediction, in which the average of predictions from different fields is obtained for a macroblock, a motion vector from the base layer to the enhancement layer can be calculated based on the picture reference relationships in each layer, as described above.

The embodiment above has been described using interlaced video data as the base layer, but the enhancement layer motion vector can also be calculated from the base layer motion vector when progressive video data is used.

(Supplementary note 1)
The moving image encoding device 2 and the moving image decoding device 1 described above can be mounted on and used in various devices that transmit, receive, record, and reproduce moving images. The moving image may be a natural moving image captured by a camera or the like, or an artificial moving image (including CG and GUI) generated by a computer or the like.

まず、上述した動画像符号化装置２及び動画像復号装置１を、動画像の送信及び受信に利用できることを、図２０を参照して説明する。 First, it will be described with reference to FIG. 20 that the moving picture encoding apparatus 2 and the moving picture decoding apparatus 1 described above can be used for transmission and reception of moving pictures.

図２０の（ａ）は、動画像符号化装置２を搭載した送信装置ＰＲＯＤ＿Ａの構成を示したブロック図である。図２０の（ａ）に示すように、送信装置ＰＲＯＤ＿Ａは、動画像を符号化することによって符号化データを得る符号化部ＰＲＯＤ＿Ａ１と、符号化部ＰＲＯＤ＿Ａ１が得た符号化データで搬送波を変調することによって変調信号を得る変調部ＰＲＯＤ＿Ａ２と、変調部ＰＲＯＤ＿Ａ２が得た変調信号を送信する送信部ＰＲＯＤ＿Ａ３と、を備えている。上述した動画像符号化装置２は、この符号化部ＰＲＯＤ＿Ａ１として利用される。 FIG. 20A is a block diagram illustrating a configuration of a transmission device PROD_A in which the moving image encoding device 2 is mounted. As illustrated in FIG. 20A, the transmission device PROD_A modulates a carrier wave with an encoding unit PROD_A1 that obtains encoded data by encoding a moving image, and the encoded data obtained by the encoding unit PROD_A1. Thus, a modulation unit PROD_A2 that obtains a modulation signal and a transmission unit PROD_A3 that transmits the modulation signal obtained by the modulation unit PROD_A2 are provided. The moving image encoding apparatus 2 described above is used as the encoding unit PROD_A1.

As sources of the moving image input to the encoding unit PROD_A1, the transmission device PROD_A may further include a camera PROD_A4 that captures moving images, a recording medium PROD_A5 on which moving images are recorded, an input terminal PROD_A6 for inputting moving images from outside, and an image processing unit PROD_A7 that generates or processes images. FIG. 20(a) illustrates a configuration in which the transmission device PROD_A includes all of these, but some of them may be omitted.

なお、記録媒体ＰＲＯＤ＿Ａ５は、符号化されていない動画像を記録したものであってもよいし、伝送用の符号化方式とは異なる記録用の符号化方式で符号化された動画像を記録したものであってもよい。後者の場合、記録媒体ＰＲＯＤ＿Ａ５と符号化部ＰＲＯＤ＿Ａ１との間に、記録媒体ＰＲＯＤ＿Ａ５から読み出した符号化データを記録用の符号化方式に従って復号する復号部（不図示）を介在させるとよい。 The recording medium PROD_A5 may be a recording of a non-encoded moving image, or a recording of a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. It may be a thing. In the latter case, a decoding unit (not shown) for decoding the encoded data read from the recording medium PROD_A5 according to the recording encoding method may be interposed between the recording medium PROD_A5 and the encoding unit PROD_A1.

図２０の（ｂ）は、動画像復号装置１を搭載した受信装置ＰＲＯＤ＿Ｂの構成を示したブロック図である。図２０の（ｂ）に示すように、受信装置ＰＲＯＤ＿Ｂは、変調信号を受信する受信部ＰＲＯＤ＿Ｂ１と、受信部ＰＲＯＤ＿Ｂ１が受信した変調信号を復調することによって符号化データを得る復調部ＰＲＯＤ＿Ｂ２と、復調部ＰＲＯＤ＿Ｂ２が得た符号化データを復号することによって動画像を得る復号部ＰＲＯＤ＿Ｂ３と、を備えている。上述した動画像復号装置１は、この復号部ＰＲＯＤ＿Ｂ３として利用される。 FIG. 20B is a block diagram illustrating a configuration of a receiving device PROD_B in which the video decoding device 1 is mounted. As illustrated in FIG. 20B, the receiving device PROD_B includes a receiving unit PROD_B1 that receives a modulated signal, a demodulating unit PROD_B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit PROD_B1, and a demodulator. A decoding unit PROD_B3 that obtains a moving image by decoding the encoded data obtained by the unit PROD_B2. The moving picture decoding apparatus 1 described above is used as the decoding unit PROD_B3.

受信装置ＰＲＯＤ＿Ｂは、復号部ＰＲＯＤ＿Ｂ３が出力する動画像の供給先として、動画像を表示するディスプレイＰＲＯＤ＿Ｂ４、動画像を記録するための記録媒体ＰＲＯＤ＿Ｂ５、及び、動画像を外部に出力するための出力端子ＰＲＯＤ＿Ｂ６を更に備えていてもよい。図２０の（ｂ）においては、これら全てを受信装置ＰＲＯＤ＿Ｂが備えた構成を例示しているが、一部を省略しても構わない。 The receiving device PROD_B has a display PROD_B4 for displaying a moving image, a recording medium PROD_B5 for recording the moving image, and an output terminal for outputting the moving image to the outside as a supply destination of the moving image output by the decoding unit PROD_B3. PROD_B6 may be further provided. FIG. 20B illustrates a configuration in which the reception apparatus PROD_B includes all of these, but a part may be omitted.

Note that the recording medium PROD_B5 may be for recording an unencoded moving image, or for recording a moving image encoded with a recording encoding scheme different from the transmission encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image obtained from the decoding unit PROD_B3 according to the recording encoding scheme may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.
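The signal path of the receiving device PROD_B described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the `Receiver` class, its field names, and the callables passed to it are hypothetical stand-ins for the units PROD_B2 to PROD_B6, and a frame is represented simply as a `bytes` object.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Frame = bytes  # assumption: one decoded picture is modeled as raw bytes


@dataclass
class Receiver:
    """Hypothetical model of the receiving device PROD_B."""
    demodulate: Callable[[bytes], bytes]                 # demodulating unit PROD_B2
    decode: Callable[[bytes], List[Frame]]               # decoding unit PROD_B3
    display: Optional[Callable[[List[Frame]], None]] = None   # display PROD_B4
    record: Optional[Callable[[bytes], None]] = None          # recording medium PROD_B5
    # Optional encoding unit (not shown in the figure) placed between
    # PROD_B3 and PROD_B5 when a recording encoding scheme is used.
    recording_encoder: Optional[Callable[[List[Frame]], bytes]] = None
    output: Optional[Callable[[List[Frame]], None]] = None    # output terminal PROD_B6

    def receive(self, modulated: bytes) -> List[Frame]:
        coded = self.demodulate(modulated)   # modulated signal -> encoded data
        frames = self.decode(coded)          # encoded data -> moving image
        if self.display is not None:
            self.display(frames)
        if self.record is not None:
            if self.recording_encoder is not None:
                # Re-encode with the recording scheme before writing.
                self.record(self.recording_encoder(frames))
            else:
                # Record the unencoded moving image as-is.
                self.record(b"".join(frames))
        if self.output is not None:
            self.output(frames)
        return frames
```

Each destination is optional, mirroring the statement that some of PROD_B4 to PROD_B6 may be omitted; the recording encoder is only consulted when a recording medium is actually attached.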

The transmission medium that carries the modulated signal may be wireless or wired. The transmission mode may be broadcasting (here, a mode in which the destination is not specified in advance) or communication (here, a mode in which the destination is specified in advance). That is, the modulated signal may be transmitted by wireless broadcasting, wired broadcasting, wireless communication, or wired communication.

For example, a broadcasting station (broadcasting equipment, etc.) and a receiving station (television receiver, etc.) for terrestrial digital broadcasting are an example of a transmitting device PROD_A and a receiving device PROD_B that exchange modulated signals by wireless broadcasting. Likewise, a broadcasting station (broadcasting equipment, etc.) and a receiving station (television receiver, etc.) for cable television broadcasting are an example of a transmitting device PROD_A and a receiving device PROD_B that exchange modulated signals by wired broadcasting.

A server (workstation, etc.) and a client (television receiver, personal computer, smartphone, etc.) for a VOD (Video On Demand) service or a video sharing service using the Internet are an example of a transmitting device PROD_A and a receiving device PROD_B that exchange modulated signals by communication (normally, either a wireless or a wired transmission medium is used in a LAN, and a wired transmission medium is used in a WAN). Here, personal computers include desktop PCs, laptop PCs, and tablet PCs, and smartphones include multi-function mobile phone terminals.

Note that a client of a video sharing service has, in addition to the function of decoding encoded data downloaded from the server and showing it on a display, the function of encoding a moving image captured by a camera and uploading it to the server. That is, a client of a video sharing service functions as both the transmitting device PROD_A and the receiving device PROD_B.

Next, the use of the moving image encoding device 2 and the moving image decoding device 1 described above for recording and playback of moving images will be described with reference to FIG. 21.

FIG. 21(a) is a block diagram showing the configuration of a recording device PROD_C equipped with the moving image encoding device 2 described above. As shown in FIG. 21(a), the recording device PROD_C includes an encoding unit PROD_C1 that obtains encoded data by encoding a moving image, and a writing unit PROD_C2 that writes the encoded data obtained by the encoding unit PROD_C1 to a recording medium PROD_M. The moving image encoding device 2 described above is used as this encoding unit PROD_C1.

Note that the recording medium PROD_M may be (1) of a type built into the recording device PROD_C, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), (2) of a type connected to the recording device PROD_C, such as an SD memory card or USB (Universal Serial Bus) flash memory, or (3) of a type loaded into a drive device (not shown) built into the recording device PROD_C, such as a DVD (Digital Versatile Disc) or BD (Blu-ray Disc: registered trademark).

The recording device PROD_C may further include, as sources of the moving image input to the encoding unit PROD_C1, a camera PROD_C3 that captures a moving image, an input terminal PROD_C4 for inputting a moving image from the outside, a receiving unit PROD_C5 for receiving a moving image, and an image processing unit C6 that generates or processes an image. FIG. 21(a) illustrates a configuration in which the recording device PROD_C includes all of these, but some of them may be omitted.

Note that the receiving unit PROD_C5 may receive an unencoded moving image, or may receive encoded data encoded with a transmission encoding scheme different from the recording encoding scheme. In the latter case, a transmission decoding unit (not shown) that decodes the encoded data encoded with the transmission encoding scheme may be interposed between the receiving unit PROD_C5 and the encoding unit PROD_C1.

Examples of such a recording device PROD_C include a DVD recorder, a BD recorder, and an HDD (Hard Disk Drive) recorder (in these cases, the input terminal PROD_C4 or the receiving unit PROD_C5 is the main source of moving images). A camcorder (in this case, the camera PROD_C3 is the main source of moving images), a personal computer (in this case, the receiving unit PROD_C5 or the image processing unit C6 is the main source of moving images), and a smartphone (in this case, the camera PROD_C3 or the receiving unit PROD_C5 is the main source of moving images) are also examples of such a recording device PROD_C.

FIG. 21(b) is a block diagram showing the configuration of a playback device PROD_D equipped with the moving image decoding device 1 described above. As shown in FIG. 21(b), the playback device PROD_D includes a reading unit PROD_D1 that reads encoded data written on the recording medium PROD_M, and a decoding unit PROD_D2 that obtains a moving image by decoding the encoded data read by the reading unit PROD_D1. The moving image decoding device 1 described above is used as this decoding unit PROD_D2.

Note that the recording medium PROD_M may be (1) of a type built into the playback device PROD_D, such as an HDD or SSD, (2) of a type connected to the playback device PROD_D, such as an SD memory card or USB flash memory, or (3) of a type loaded into a drive device (not shown) built into the playback device PROD_D, such as a DVD or BD.

The playback device PROD_D may further include, as destinations for the moving image output by the decoding unit PROD_D2, a display PROD_D3 that displays the moving image, an output terminal PROD_D4 for outputting the moving image to the outside, and a transmitting unit PROD_D5 that transmits the moving image. FIG. 21(b) illustrates a configuration in which the playback device PROD_D includes all of these, but some of them may be omitted.

Note that the transmitting unit PROD_D5 may transmit an unencoded moving image, or may transmit encoded data encoded with a transmission encoding scheme different from the recording encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image with the transmission encoding scheme may be interposed between the decoding unit PROD_D2 and the transmitting unit PROD_D5.

Examples of such a playback device PROD_D include a DVD player, a BD player, and an HDD player (in these cases, the output terminal PROD_D4, to which a television receiver or the like is connected, is the main destination of moving images). A television receiver (in this case, the display PROD_D3 is the main destination of moving images), digital signage (also called an electronic signboard or electronic bulletin board; the display PROD_D3 or the transmitting unit PROD_D5 is the main destination of moving images), a desktop PC (in this case, the output terminal PROD_D4 or the transmitting unit PROD_D5 is the main destination of moving images), a laptop or tablet PC (in this case, the display PROD_D3 or the transmitting unit PROD_D5 is the main destination of moving images), and a smartphone (in this case, the display PROD_D3 or the transmitting unit PROD_D5 is the main destination of moving images) are also examples of such a playback device PROD_D.

(Appendix 2)
Finally, each block of the moving image decoding device 1 (1′) and the moving image encoding device 2 (2′) may be implemented in hardware by a logic circuit formed on an integrated circuit (IC chip), or in software using a CPU (central processing unit).

In the latter case, the moving image decoding device 1 (1′) and the moving image encoding device 2 (2′) each include a CPU that executes the instructions of a control program implementing each function, a ROM (read only memory) that stores the program, a RAM (random access memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data. The object of the present invention can also be achieved by supplying the moving image decoding device 1 (1′) and the moving image encoding device 2 (2′) with a recording medium on which the program code (executable program, intermediate code program, or source program) of their control program, which is the software implementing the functions described above, is recorded in a computer-readable form, and by having the computer (or a CPU or MPU (micro processing unit)) read and execute the program code recorded on the recording medium.

Examples of the recording medium include: tapes such as magnetic tapes and cassette tapes; disks, including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical discs such as CD-ROM (compact disc read-only memory), MO (magneto-optical), MD (Mini Disc), DVD (digital versatile disc), and CD-R (CD Recordable); cards such as IC cards (including memory cards) and optical cards; semiconductor memories such as mask ROM, EPROM (erasable programmable read-only memory), EEPROM (electrically erasable and programmable read-only memory), and flash ROM; and logic circuits such as PLDs (programmable logic devices) and FPGAs (field programmable gate arrays).

The moving image decoding device 1 (1′) and the moving image encoding device 2 (2′) may also be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited as long as it can transmit the program code. For example, the Internet, an intranet, an extranet, a LAN (local area network), ISDN (integrated services digital network), a VAN (value-added network), a CATV (community antenna television) communication network, a virtual private network, a telephone network, a mobile communication network, or a satellite communication network can be used. The transmission medium constituting the communication network is likewise not limited to a specific configuration or type, as long as it can transmit the program code. For example, wired media such as IEEE (Institute of Electrical and Electronics Engineers) 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (asymmetric digital subscriber line) lines can be used, as can wireless media such as infrared (as in IrDA (Infrared Data Association) or remote controls), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (high data rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance), mobile phone networks, satellite links, and terrestrial digital networks. The present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

The present invention is suitable for use in image encoding devices that perform scalable encoding and in image decoding devices that decode the resulting data.

1, 1′ Moving image decoding device
31 Variable-length decoding unit (first layer motion vector decoding means)
20, 20′ Motion information conversion processing unit (second layer motion vector deriving means)
22 Motion information normalization unit (intermediate motion vector deriving means)
2, 2′ Moving image encoding device
52, 52′ Motion information conversion processing unit (second layer motion vector deriving means)
54 Motion information normalization unit (intermediate motion vector deriving means)

Claims

A moving image decoding device that decodes encoded data composed of a plurality of layers encoded with different encoding schemes, the device comprising:
first layer motion vector decoding means for decoding a motion vector used for decoding a first layer of the plurality of layers, with reference to motion vector information included in the first layer;
intermediate motion vector deriving means for deriving an intermediate motion vector based on the motion vector decoded by the first layer motion vector decoding means; and
second layer motion vector deriving means for deriving a motion vector used for decoding a second layer of the plurality of layers, with reference to the intermediate motion vector derived by the intermediate motion vector deriving means.

The moving image decoding device according to claim 1, wherein the intermediate motion vector deriving means obtains the intermediate motion vector by converting the motion vector decoded by the first layer motion vector decoding means into a value corresponding to a predetermined frame distance.
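By way of non-limiting illustration only, the conversion recited above can be sketched in Python as follows. The sketch assumes that the conversion is a linear scaling by temporal distance measured in frames, which the claim text does not state; `MotionVector`, `normalize`, `derive_second_layer`, and the `unit_distance` parameter are hypothetical names introduced for the example.

```python
from fractions import Fraction
from typing import NamedTuple


class MotionVector(NamedTuple):
    x: Fraction
    y: Fraction


def normalize(mv: MotionVector, ref_distance: int, unit_distance: int = 1) -> MotionVector:
    """Scale a first-layer vector spanning ref_distance frames to the
    predetermined distance unit_distance (assumed linear motion)."""
    if ref_distance == 0:
        raise ValueError("reference distance must be non-zero")
    scale = Fraction(unit_distance, ref_distance)
    return MotionVector(mv.x * scale, mv.y * scale)


def derive_second_layer(intermediate: MotionVector, target_distance: int,
                        unit_distance: int = 1) -> MotionVector:
    """Rescale the intermediate vector to the frame distance of the
    second-layer reference frame (again assuming linear motion)."""
    if unit_distance == 0:
        raise ValueError("unit distance must be non-zero")
    scale = Fraction(target_distance, unit_distance)
    return MotionVector(intermediate.x * scale, intermediate.y * scale)
```

Under these assumptions, a first-layer vector (8, -4) that spans two frames normalizes to (4, -2) per frame, and a second-layer reference frame three frames away would then receive (12, -6). Exact fractions are used so that the illustration does not depend on any particular rounding rule.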

The moving image decoding device according to claim 1, wherein the first layer includes motion vectors derived using a plurality of fields for one frame, and
the intermediate motion vector deriving means derives the intermediate motion vector for the motion vector used in the decoding target frame of the second layer based on a motion vector derived using, among the plurality of fields in the first-layer frame corresponding to the processing target frame, the field corresponding to the processing target frame.

The moving image decoding device according to claim 1, wherein the intermediate motion vector deriving means derives the intermediate motion vector for the motion vector used in the decoding target frame of the second layer based on a motion vector in the first-layer frame corresponding to the processing target frame.

The moving image decoding device according to claim 1, wherein, when the first-layer frame corresponding to the decoding target frame of the second layer does not include a motion vector in the prediction direction required by the second layer, the intermediate motion vector deriving means derives the intermediate motion vector based on a motion vector included in the reference frame nearest to that first-layer frame in conversion order.

A moving image decoding device that decodes encoded data composed of a plurality of layers encoded with different encoding schemes, the device comprising:
first layer motion vector decoding means for decoding a motion vector used for decoding a first layer of the plurality of layers, with reference to motion vector information included in the first layer; and
second layer motion vector deriving means for deriving a motion vector used for decoding a second layer of the plurality of layers, based on the motion vector decoded by the first layer motion vector decoding means and on the reference relationship of the reference frames of the second layer.

The moving image decoding device according to claim 6, wherein the first layer includes motion vectors derived using a plurality of fields for one frame, and
the second layer motion vector deriving means derives the motion vector used in the decoding target frame of the second layer based on a motion vector derived using, among the plurality of fields in the first-layer frame corresponding to the processing target frame, the field corresponding to the processing target frame.

The moving image decoding device according to claim 6, wherein the second layer motion vector deriving means derives the motion vector used in the decoding target frame of the second layer based on a motion vector in the first-layer frame corresponding to the processing target frame.

A moving image encoding device that generates encoded data composed of a plurality of layers encoded with different encoding schemes, each layer including a prediction residual that is the difference between an original image and a predicted image, the device comprising:
intermediate motion vector deriving means for deriving an intermediate motion vector based on a motion vector used for decoding a first layer; and
second layer motion vector deriving means for deriving, with reference to the intermediate motion vector derived by the intermediate motion vector deriving means, a motion vector used for generating the predicted image from which the encoded data of a second layer is generated.

A moving image encoding device that generates encoded data composed of a plurality of layers encoded with different encoding schemes, each layer including a prediction residual that is the difference between an original image and a predicted image, the device comprising:
second layer motion vector deriving means for deriving a motion vector used for generating the predicted image from which the encoded data of a second layer is generated, based on a motion vector used for decoding a first layer and on the reference relationship of the reference frames of the second layer.