JP2016127372A

JP2016127372A - Video encoder, video decoder, video processing system, video encoding method, video decoding method, and program

Info

Publication number: JP2016127372A
Application number: JP2014265509A
Authority: JP
Inventors: Kei Kawamura (河村 圭); Hitoshi Naito (内藤 整)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2014-12-26
Filing date: 2014-12-26
Publication date: 2016-07-11

Abstract

PROBLEM TO BE SOLVED: To improve the coding efficiency of video from a plurality of viewpoints with different perspectives.
SOLUTION: A video encoder 1 includes a first encoder 10, a second encoder 20, and an inter-view processing unit 30. The first encoder 10 encodes the original image SIG1 of the extended viewpoint and outputs a bitstream SIG6 of the extended viewpoint. The second encoder 20 encodes the original image SIG2 of the reference viewpoint and outputs a bitstream SIG7 of the reference viewpoint. The inter-view processing unit 30 applies projective transformation and illumination compensation to a local decoded image SIG3 of the reference viewpoint according to the relationship between the reference viewpoint and the extended viewpoint, and generates a transformed local decoded image SIG4 of the reference viewpoint, which the first encoder 10 can use as a reference image when performing inter prediction.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a moving image encoding device, a moving image decoding device, a moving image processing system, a moving image encoding method, a moving image decoding method, and a program.

Conventionally, a video presentation method called free navigation has been proposed, which enables video to be viewed from various viewpoints. Free navigation is already in common use in video games, and in recent years techniques for realizing free navigation with live-action video as the source video have been studied.

One scenario for realizing free navigation with live-action source video is the distribution of video of a soccer match to viewers. In this case, a plurality of cameras are fixed in place around the soccer field, the entire field is captured by these cameras, and the captured video is distributed to viewers as source video. When a viewer then selects the viewpoint he or she wishes to watch from, video from that viewpoint can be created from the plurality of videos distributed as source video and presented to the viewer.

As described above, realizing free navigation with live-action source video requires distributing to viewers, as source video, a plurality of videos captured from a plurality of viewpoints with different perspectives (multi-view video). It is therefore desirable to compress the source video efficiently.

For this purpose, a multi-view extension called MV-HEVC has been standardized as an extension of HEVC (see, for example, Non-Patent Document 1). In MV-HEVC, videos from a plurality of viewpoints with different perspectives can be encoded together while referring to one another.

The case where video of two viewpoints is encoded and decoded with MV-HEVC is described below. In the following, one of the two viewpoints is called the reference viewpoint and the other is called the extended viewpoint. The reference viewpoint video is encoded with HEVC in the conventional manner, and the encoded reference viewpoint video is decoded with HEVC in the conventional manner. The extended viewpoint video, on the other hand, is encoded using the local decoded image of the reference viewpoint as a reference image for inter prediction, and the encoded extended viewpoint video is decoded using the decoded image of the reference viewpoint as a reference image for inter prediction. Therefore, when the correlation between the reference viewpoint video and the extended viewpoint video is high, performing inter prediction with reference to the reference viewpoint image achieves more efficient coding than performing intra-frame prediction.

FIG. 10 is a diagram showing the images used as reference images for inter prediction when reference viewpoint video and extended viewpoint video are encoded and decoded with MV-HEVC. Images P1, P2, P3, and P4 are the frame-by-frame images of the extended viewpoint, and images P5, P6, P7, and P8 are the frame-by-frame images of the reference viewpoint. For example, when the reference viewpoint image P6 is obtained by inter prediction, the reference viewpoint image P5 is used as the reference image for inter prediction. When the extended viewpoint image P3 is obtained by inter prediction, not only the extended viewpoint image P2 but also the reference viewpoint image P7 is used as a reference image for inter prediction.
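
The reference relationships described for FIG. 10 can be sketched as a small lookup. This is only an illustration: the function name, the 0-based frame index, and the simplification that each picture refers only to the immediately preceding picture of its own viewpoint are assumptions, not part of the specification. It also assumes that pictures at the same position in the two rows (for example P3 and P7) are captured at the same time.

```python
def reference_candidates(view, t):
    """Return (view, frame index) pairs usable as inter-prediction references.

    view: "reference" or "extended" (hypothetical labels).
    t:    0-based frame index within the viewpoint.
    """
    refs = []
    if t > 0:
        # Temporal reference: the previous picture of the same viewpoint.
        refs.append((view, t - 1))
    if view == "extended":
        # Inter-view reference: the reference-viewpoint picture at the same time.
        refs.append(("reference", t))
    return refs
```

With this indexing, P6 is ("reference", 1) and refers to P5, while P3 is ("extended", 2) and refers to P2 and P7.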

Non-Patent Document 1: ITU-T H.265 High Efficiency Video Coding.

Patent Document 1: JP 2012-80151 A

Multi-view video, however, is characterized by low correlation between the moving images of the individual viewpoints. Even so, in the MV-HEVC of Non-Patent Document 1, as described above, the extended viewpoint video is encoded using the local decoded image of the reference viewpoint as a reference image for inter prediction, and the encoded extended viewpoint video is decoded using the decoded image of the reference viewpoint as a reference image for inter prediction. As a result, the MV-HEVC of Non-Patent Document 1 ends up using a poorly correlated image as the reference image, and coding efficiency may decrease.

One conceivable approach is to apply a geometric transformation such as that disclosed in Patent Document 1 to the local decoded image or the decoded image of the reference viewpoint, and to use the result as the reference image for inter prediction.

However, the geometric transformation disclosed in Patent Document 1 is performed block by block. Applying it to the MV-HEVC of Non-Patent Document 1 would therefore require changes at and below the video coding layer and the video decoding layer. In addition, because the memory needed for the geometric transformation may vary from block to block, the required memory bandwidth, taking memory access granularity into account, may also increase. Consequently, applying the geometric transformation of Patent Document 1 to the MV-HEVC of Non-Patent Document 1 would increase the implementation cost.

The present invention has been made in view of the above problems, and its object is to improve the coding efficiency of video from a plurality of viewpoints with different perspectives while keeping the implementation cost down.

The present invention proposes the following in order to solve the above problems.
(1) The present invention proposes a moving image encoding device (for example, corresponding to the moving image encoding device 1 in FIG. 1) that encodes moving images of a plurality of viewpoints with different perspectives to generate encoded data, the device comprising: single-viewpoint encoding means (for example, corresponding to the first encoding unit 10 and the second encoding unit 20 in FIG. 2) for encoding each of the moving images of the plurality of viewpoints, viewpoint by viewpoint; inter-viewpoint processing means (for example, corresponding to the inter-viewpoint processing unit 30 in FIG. 2) for, with one of the plurality of viewpoints taken as a reference viewpoint and one of the remaining viewpoints taken as an extended viewpoint, geometrically transforming the local decoded image of the reference viewpoint obtained when the moving image of the reference viewpoint was encoded by the single-viewpoint encoding means (for example, corresponding to the local decoded image SIG3 of the reference viewpoint in FIG. 2) according to the relationship between the reference viewpoint and the extended viewpoint, to generate a transformed local decoded image of the reference viewpoint (for example, corresponding to the transformed local decoded image SIG4 of the reference viewpoint in FIG. 2); and reference image list adding means (for example, corresponding to the inter-viewpoint processing unit 30 in FIG. 2) for making the transformed local decoded image of the reference viewpoint generated by the inter-viewpoint processing means available as a reference image when the moving image of the extended viewpoint is encoded by the single-viewpoint encoding means.

According to this invention, the single-viewpoint encoding means encodes each of the moving images of the plurality of viewpoints, viewpoint by viewpoint. The inter-viewpoint processing means geometrically transforms the local decoded image of the reference viewpoint, obtained when the moving image of the reference viewpoint was encoded by the single-viewpoint encoding means, according to the relationship between the reference viewpoint and the extended viewpoint, and thereby generates a transformed local decoded image of the reference viewpoint. The reference image list adding means makes the transformed local decoded image of the reference viewpoint generated by the inter-viewpoint processing means available as a reference image when the moving image of the extended viewpoint is encoded by the single-viewpoint encoding means.

A frame of another layer is thus added to the reference image list, and the moving images can be encoded using the same framework as the MV-HEVC of Non-Patent Document 1. No changes at or below the video coding layer are required, and the existing HEVC codec design can be reused to the greatest possible extent. The implementation cost can therefore be kept down.

In addition, the local decoded image of the reference viewpoint is geometrically transformed according to the relationship between the reference viewpoint and the extended viewpoint to generate a transformed local decoded image of the reference viewpoint, and the moving image of the extended viewpoint can be encoded using this transformed local decoded image. Multi-view video capturing the same subject has the property that the moving images of the individual viewpoints can be converted into mutually similar moving images by geometric transformation. A geometric transformation that follows the relationship between the reference viewpoint and the extended viewpoint therefore allows a similar image to be used as the reference image, which improves coding performance.
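
As an illustration of a picture-level geometric transformation, the following minimal sketch warps a whole picture with a 3x3 projective matrix. It is not the method defined in this specification: the function name, the convention that the matrix maps output coordinates back to source coordinates, nearest-neighbour sampling, and the fill value for uncovered pixels are all assumptions made for the example.

```python
import math


def warp_projective(image, inv_matrix, fill=0):
    """Warp a picture (list of rows of samples) with a 3x3 projective matrix.

    inv_matrix maps output pixel coordinates (x, y) to source coordinates
    (assumed convention), so every output pixel is filled exactly once.
    """
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Homogeneous coordinate; skip points mapped to infinity.
            w = inv_matrix[2][0] * x + inv_matrix[2][1] * y + inv_matrix[2][2]
            if w == 0:
                continue
            sx = (inv_matrix[0][0] * x + inv_matrix[0][1] * y + inv_matrix[0][2]) / w
            sy = (inv_matrix[1][0] * x + inv_matrix[1][1] * y + inv_matrix[1][2]) / w
            # Nearest-neighbour sampling (a simplification).
            ix = math.floor(sx + 0.5)
            iy = math.floor(sy + 0.5)
            if 0 <= ix < width and 0 <= iy < height:
                out[y][x] = image[iy][ix]
    return out
```

Because the transformation is applied once to the whole picture rather than block by block, the warped picture can simply be handed to an unmodified single-viewpoint encoder as an additional reference picture.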

(2) The present invention proposes the moving image encoding device of (1), wherein the inter-viewpoint processing means transmits the parameters used for the geometric transformation (for example, corresponding to the elements of the projective transformation matrix s described later) to a moving image decoding device that decodes the encoded data.

According to this invention, in the moving image encoding device of (1), the inter-viewpoint processing means transmits the parameters used for the geometric transformation to the moving image decoding device. When decoding the encoded data, the moving image decoding device can therefore perform the geometric transformation using the parameters that were used in the moving image encoding device.

(3) The present invention proposes the moving image encoding device of (1), wherein the inter-viewpoint processing means applies geometric transformation and illumination compensation to the local decoded image of the reference viewpoint, obtained when the moving image of the reference viewpoint was encoded by the single-viewpoint encoding means, according to the relationship between the reference viewpoint and the extended viewpoint, to generate the transformed local decoded image of the reference viewpoint.

According to this invention, in the moving image encoding device of (1), the inter-viewpoint processing means applies geometric transformation and illumination compensation to the local decoded image of the reference viewpoint, obtained when the moving image of the reference viewpoint was encoded by the single-viewpoint encoding means, according to the relationship between the reference viewpoint and the extended viewpoint, and thereby generates the transformed local decoded image of the reference viewpoint. Even when the illuminance of the reference viewpoint moving image differs from that of the extended viewpoint moving image because the shooting environment, such as the lighting conditions, differs between the two viewpoints, the difference in illuminance can be compensated. Coding performance can therefore be improved by illumination compensation that follows the relationship between the reference viewpoint and the extended viewpoint, even when the two moving images differ in illuminance.
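
A minimal sketch of illumination compensation follows. This section only names two illumination compensation parameters, a and b; the linear gain-and-offset model a * p + b, the rounding, the clipping to the sample range, and the function name are assumptions made for illustration.

```python
def compensate_illumination(image, a, b, bit_depth=8):
    """Apply an assumed linear model a * p + b to every sample p.

    Results are rounded and clipped to [0, 2**bit_depth - 1].
    """
    max_value = (1 << bit_depth) - 1
    return [
        [min(max_value, max(0, int(round(a * p + b)))) for p in row]
        for row in image
    ]
```

In a pipeline of this kind the compensation would typically be applied to the geometrically transformed picture, so that the reference picture matches the extended viewpoint in both geometry and brightness.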

(4) The present invention proposes the moving image encoding device of (3), wherein the inter-viewpoint processing means transmits the parameters used for the geometric transformation (for example, corresponding to the elements of the projective transformation matrix s described later) and the parameters used for the illumination compensation (for example, corresponding to the illumination compensation parameters a and b described later) to a moving image decoding device that decodes the encoded data.

According to this invention, in the moving image encoding device of (3), the inter-viewpoint processing means transmits the parameters used for the geometric transformation and the parameters used for the illumination compensation to the moving image decoding device. When decoding the encoded data, the moving image decoding device can therefore perform geometric transformation and illumination compensation using the geometric transformation parameters and the illumination compensation parameters that were used in the moving image encoding device.
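
This section does not say how the parameters are carried to the decoder. Purely as an illustration of the requirement that both sides use identical values, the sketch below packs the nine matrix elements and the two illumination parameters into a byte string and recovers them. The layout (eleven little-endian doubles) and the function names are hypothetical and are not a bitstream syntax from this specification or from HEVC.

```python
import struct

# Hypothetical layout: 9 matrix elements followed by a and b, as doubles.
_LAYOUT = "<9d2d"


def pack_parameters(matrix, a, b):
    """Serialize a 3x3 matrix and illumination parameters a, b."""
    flat = [value for row in matrix for value in row]
    if len(flat) != 9:
        raise ValueError("expected a 3x3 matrix")
    return struct.pack(_LAYOUT, *flat, a, b)


def unpack_parameters(payload):
    """Recover (matrix, a, b) from bytes produced by pack_parameters."""
    values = struct.unpack(_LAYOUT, payload)
    matrix = [list(values[0:3]), list(values[3:6]), list(values[6:9])]
    return matrix, values[9], values[10]
```

Whatever the real carriage mechanism is, the decoder must reproduce exactly the reference picture the encoder used, so the values have to survive the round trip without loss.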

(5) The present invention proposes the moving image encoding device of any one of (1) to (4), wherein the inter-viewpoint processing means performs, as the geometric transformation, either projective transformation or affine transformation based on triangular patch division.

According to this invention, in the moving image encoding device of any one of (1) to (4), the inter-viewpoint processing means can perform, as the geometric transformation, either projective transformation or affine transformation based on triangular patch division.
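
For the triangular-patch alternative, each triangle has its own affine transformation, which is fully determined by where its three vertices map. The sketch below solves for the six affine coefficients of one patch with Cramer's rule. The function names and the vertex-correspondence input are assumptions for illustration; how the picture is divided into triangles is not covered here.

```python
def affine_from_triangle(src, dst):
    """Return ((a, b, c), (d, e, f)) with x' = a*x + b*y + c, y' = d*x + e*y + f.

    src, dst: three (x, y) vertices each; src must not be degenerate.
    """
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)
    if det == 0:
        raise ValueError("source triangle is degenerate")

    def solve(u1, u2, u3):
        # Cramer's rule for one output coordinate.
        a = (u1 * (y2 - y3) + u2 * (y3 - y1) + u3 * (y1 - y2)) / det
        b = (u1 * (x3 - x2) + u2 * (x1 - x3) + u3 * (x2 - x1)) / det
        c = (u1 * (x2 * y3 - x3 * y2) + u2 * (x3 * y1 - x1 * y3)
             + u3 * (x1 * y2 - x2 * y1)) / det
        return a, b, c

    row_x = solve(dst[0][0], dst[1][0], dst[2][0])
    row_y = solve(dst[0][1], dst[1][1], dst[2][1])
    return row_x, row_y


def apply_affine(coeffs, point):
    """Map a point through the affine coefficients of one patch."""
    (a, b, c), (d, e, f) = coeffs
    x, y = point
    return a * x + b * y + c, d * x + e * y + f
```

Samples inside a given triangle would be mapped with that triangle's coefficients, so a mesh of patches can approximate a scene that a single projective matrix cannot.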

(6) The present invention proposes the moving image encoding device of any one of (1) to (5), wherein the reference image list adding means makes the transformed local decoded image of the reference viewpoint generated by the inter-viewpoint processing means available as a long-term reference frame when the moving image of the extended viewpoint is encoded by the single-viewpoint encoding means.

According to this invention, in the moving image encoding device of any one of (1) to (5), the reference image list adding means makes the transformed local decoded image of the reference viewpoint generated by the inter-viewpoint processing means available as a long-term reference frame when the moving image of the extended viewpoint is encoded by the single-viewpoint encoding means. Recalculation of motion vectors at the reference viewpoint thus becomes unnecessary, and the amount of encoding processing can be reduced.
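
The idea of adding the transformed picture to the reference image list as a long-term reference can be sketched as below. The data structure and function names are hypothetical. The sketch also reflects one plausible reading of the passage: in HEVC, motion vectors that point to long-term reference pictures are not scaled by picture-order-count distance, so marking the inter-view picture as long-term avoids that rescaling step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefPicture:
    label: str          # e.g. "extended t-1" or "transformed reference t"
    poc: int            # picture order count
    is_long_term: bool  # long-term pictures are exempt from MV scaling


def add_inter_view_reference(ref_list, label, poc):
    """Return a new list with the transformed picture appended as long-term."""
    return list(ref_list) + [RefPicture(label, poc, is_long_term=True)]


def needs_mv_scaling(ref):
    """Temporal motion-vector scaling applies only to short-term references."""
    return not ref.is_long_term
```

Under this sketch the extended-viewpoint encoder sees one extra entry in its list and otherwise runs unchanged.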

(7) The present invention proposes a moving image decoding device (for example, corresponding to the moving image decoding device 100 in FIG. 1) that decodes encoded data obtained by encoding moving images of a plurality of viewpoints with different perspectives, the device comprising: single-viewpoint decoding means (for example, corresponding to the first decoding unit 110 and the second decoding unit 120 in FIG. 5) for decoding the encoded data to generate a decoded image for each of the plurality of viewpoints (for example, corresponding to the decoded image SIG103 of the extended viewpoint and the decoded images SIG101 and SIG104 of the reference viewpoint in FIG. 5) and for deriving the parameters that were used for geometric transformation when the encoded data was generated (for example, corresponding to the elements of the projective transformation matrix s described later); inter-viewpoint processing means (for example, corresponding to the inter-viewpoint processing unit 130 in FIG. 5) for, with the viewpoint that was the reference viewpoint when the encoded data was generated taken as a decoding-side reference viewpoint and the viewpoint that was the extended viewpoint when the encoded data was generated taken as a decoding-side extended viewpoint, geometrically transforming the decoded image of the decoding-side reference viewpoint generated by the single-viewpoint decoding means (for example, corresponding to the decoded image SIG101 of the reference viewpoint in FIG. 5) using the parameters derived by the single-viewpoint decoding means, to generate a transformed decoded image of the decoding-side reference viewpoint (for example, corresponding to the transformed decoded image SIG102 of the reference viewpoint in FIG. 5); and reference image list adding means (for example, corresponding to the inter-viewpoint processing unit 130 in FIG. 5) for making the transformed decoded image of the decoding-side reference viewpoint generated by the inter-viewpoint processing means available as a reference image when the encoded data of the decoding-side extended viewpoint is decoded by the single-viewpoint decoding means.

According to this invention, the single-viewpoint decoding means decodes the encoded data to generate a decoded image for each of the plurality of viewpoints and derives the parameters that were used for geometric transformation when the encoded data was generated. The inter-viewpoint processing means geometrically transforms the decoded image of the decoding-side reference viewpoint generated by the single-viewpoint decoding means, using the parameters derived by the single-viewpoint decoding means, and thereby generates a transformed decoded image of the decoding-side reference viewpoint. The reference image list adding means makes the transformed decoded image of the decoding-side reference viewpoint generated by the inter-viewpoint processing means available as a reference image when the encoded data of the decoding-side extended viewpoint is decoded by the single-viewpoint decoding means.

A frame of another layer is thus added to the reference image list, and the encoded data can be decoded using the same framework as the MV-HEVC of Non-Patent Document 1. No changes at or below the video coding layer are required, and the existing HEVC codec design can be reused to the greatest possible extent. The implementation cost can therefore be kept down.

In addition, the decoded image of the decoding-side reference viewpoint is geometrically transformed according to the relationship between the decoding-side reference viewpoint and the decoding-side extended viewpoint to generate a transformed decoded image of the decoding-side reference viewpoint, and the encoded data of the decoding-side extended viewpoint can be decoded using this transformed decoded image. Multi-view video capturing the same subject has the property that the moving images of the individual viewpoints can be converted into mutually similar moving images by geometric transformation. A geometric transformation that follows the relationship between the decoding-side reference viewpoint and the decoding-side extended viewpoint therefore allows a similar image to be used as the reference image, which improves coding performance.

(8) The present invention proposes the moving image decoding device of (7), wherein the single-viewpoint decoding means decodes the encoded data and also derives the parameters that were used for illumination compensation when the encoded data was generated (for example, corresponding to the illumination compensation parameters a and b described later), and the inter-viewpoint processing means applies geometric transformation and illumination compensation to the decoded image of the decoding-side reference viewpoint, obtained when the moving image of the decoding-side reference viewpoint was decoded by the single-viewpoint decoding means, using the parameters derived by the single-viewpoint decoding means, to generate the transformed decoded image of the decoding-side reference viewpoint.

According to this invention, in the moving image decoding device of (7), the single-viewpoint decoding means decodes the encoded data and also derives the parameters that were used for illumination compensation when the encoded data was generated. The inter-viewpoint processing means applies geometric transformation and illumination compensation to the decoded image of the decoding-side reference viewpoint, obtained when the moving image of the decoding-side reference viewpoint was decoded by the single-viewpoint decoding means, using the parameters derived by the single-viewpoint decoding means, and thereby generates the transformed decoded image of the decoding-side reference viewpoint. When the encoded data is decoded, the geometric transformation can therefore be performed using the geometric transformation parameters and the illumination compensation parameters that were used in the moving image encoding device.

(9) The present invention proposes the moving image decoding device of (7) or (8), wherein the inter-viewpoint processing means performs, as the geometric transformation, either projective transformation or affine transformation based on triangular patch division.

According to this invention, in the moving image decoding device of (7) or (8), the inter-viewpoint processing means can perform, as the geometric transformation, either projective transformation or affine transformation based on triangular patch division.

(10) The present invention proposes the moving image decoding device of any one of (7) to (9), wherein the reference image list adding means makes the transformed decoded image of the decoding-side reference viewpoint generated by the inter-viewpoint processing means available as a long-term reference frame when the moving image of the decoding-side extended viewpoint is decoded by the single-viewpoint decoding means.

According to this invention, in the moving image decoding device of any one of (7) to (9), the reference image list adding means makes the transformed decoded image of the decoding-side reference viewpoint generated by the inter-viewpoint processing means available as a long-term reference frame when the moving image of the decoding-side extended viewpoint is decoded by the single-viewpoint decoding means. Recalculation of motion vectors at the reference viewpoint thus becomes unnecessary, and the amount of decoding processing can be reduced.

（１１）本発明は、パースの異なる複数の視点の動画像を符号化して符号化データを生成する動画像符号化装置（例えば、図１の動画像符号化装置１に相当）と、当該動画像符号化装置により生成された符号化データを復号する動画像復号装置（例えば、図１の動画像復号装置１００に相当）と、を備える動画像処理システム（例えば、図１の動画像処理システムＡＡに相当）であって、前記動画像符号化装置は、前記複数の視点の動画像のそれぞれを、視点ごとに符号化する単視点符号化手段（例えば、図２の第１の符号化部１０および第２の符号化部２０に相当）と、前記複数の視点のうちの１つを基準視点とし、当該複数の視点から当該基準視点を除いたもののうちの１つを拡張視点として、当該基準視点の動画像を前記単視点符号化手段により符号化した際に得られた当該基準視点のローカル復号画像（例えば、図２の基準視点のローカル復号画像ＳＩＧ３に相当）を、当該基準視点と当該拡張視点との関係に応じて幾何変換して、当該基準視点の変換後のローカル復号画像（例えば、図２の基準視点の変換後のローカル復号画像ＳＩＧ４に相当）を生成する符号化側視点間処理手段（例えば、図２の視点間処理部３０に相当）と、前記符号化側視点間処理手段により生成された前記基準視点の変換後のローカル復号画像を、前記拡張視点の動画像を前記単視点符号化手段により符号化する際の参照画像として利用可能にする符号化側参照画像リスト追加手段（例えば、図２の視点間処理部３０に相当）と、を備え、前記動画像復号装置は、前記符号化データを復号して、前記複数の視点のそれぞれの復号済み画像（例えば、図５の拡張視点の復号済み画像ＳＩＧ１０３、および基準視点の復号済み画像ＳＩＧ１０１、ＳＩＧ１０４に相当）を生成するとともに、前記動画像符号化装置において幾何変換の際に用いられたパラメータ（例えば、後述の射影変換行列ｓの各要素に相当）を導出する単視点復号手段（例えば、図５の第１の復号部１１０および第２の復号部１２０に相当）と、前記単視点復号手段により生成された前記基準視点の復号画像（例えば、図５の基準視点の復号済み画像ＳＩＧ１０１に相当）を、前記単視点復号手段により導出されたパラメータを用いて幾何変換して、当該基準視点の変換後の復号画像（例えば、図５の基準視点の変換後の復号済み画像ＳＩＧ１０２に相当）を生成する復号側視点間処理手段（例えば、図５の視点間処理部１３０に相当）と、前記復号側視点間処理手段により生成された前記基準視点の変換後の復号画像を、前記拡張視点の動画像を前記単視点復号手段により復号する際の参照画像として利用可能にする復号側参照画像リスト追加手段（例えば、図５の視点間処理部１３０に相当）と、を備えることを特徴とする動画像処理システムを提案している。 (11) The present invention proposes a moving image processing system (for example, the moving image processing system AA in FIG. 1) comprising a moving image encoding device (for example, the moving image encoding device 1 in FIG. 1) that encodes moving images of a plurality of viewpoints with different perspectives to generate encoded data, and a moving image decoding device (for example, the moving image decoding device 100 in FIG. 1) that decodes the encoded data generated by the moving image encoding device. The moving image encoding device comprises: single-viewpoint encoding means (for example, the first encoding unit 10 and the second encoding unit 20 in FIG. 2) that encodes the moving images of the plurality of viewpoints viewpoint by viewpoint; encoding-side inter-viewpoint processing means (for example, the inter-viewpoint processing unit 30 in FIG. 2) that, with one of the plurality of viewpoints taken as a reference viewpoint and one of the remaining viewpoints taken as an extended viewpoint, geometrically transforms the local decoded image of the reference viewpoint (for example, the local decoded image SIG3 of the reference viewpoint in FIG. 2), obtained when the single-viewpoint encoding means encoded the moving image of the reference viewpoint, according to the relationship between the reference viewpoint and the extended viewpoint, thereby generating a converted local decoded image of the reference viewpoint (for example, the converted local decoded image SIG4 of the reference viewpoint in FIG. 2); and encoding-side reference image list adding means (for example, the inter-viewpoint processing unit 30 in FIG. 2) that makes the converted local decoded image of the reference viewpoint generated by the encoding-side inter-viewpoint processing means available as a reference image when the single-viewpoint encoding means encodes the moving image of the extended viewpoint. The moving image decoding device comprises: single-viewpoint decoding means (for example, the first decoding unit 110 and the second decoding unit 120 in FIG. 5) that decodes the encoded data to generate a decoded image of each of the plurality of viewpoints (for example, the decoded image SIG103 of the extended viewpoint and the decoded images SIG101 and SIG104 of the reference viewpoint in FIG. 5) and derives the parameters used for the geometric transformation in the moving image encoding device (for example, the elements of the projective transformation matrix s described later); decoding-side inter-viewpoint processing means (for example, the inter-viewpoint processing unit 130 in FIG. 5) that geometrically transforms the decoded image of the reference viewpoint generated by the single-viewpoint decoding means (for example, the decoded image SIG101 of the reference viewpoint in FIG. 5) using the parameters derived by the single-viewpoint decoding means, thereby generating a converted decoded image of the reference viewpoint (for example, the converted decoded image SIG102 of the reference viewpoint in FIG. 5); and decoding-side reference image list adding means (for example, the inter-viewpoint processing unit 130 in FIG. 5) that makes the converted decoded image of the reference viewpoint generated by the decoding-side inter-viewpoint processing means available as a reference image when the single-viewpoint decoding means decodes the moving image of the extended viewpoint.

この発明によれば、上述した効果と同様の効果を奏することができる。 According to the present invention, the same effects as described above can be obtained.

（１２）本発明は、単視点符号化手段（例えば、図２の第１の符号化部１０および第２の符号化部２０に相当）、視点間処理手段（例えば、図２の視点間処理部３０に相当）、および参照画像リスト追加手段（例えば、図２の視点間処理部３０に相当）を備え、パースの異なる複数の視点の動画像を符号化して符号化データを生成する動画像符号化装置（例えば、図１の動画像符号化装置１に相当）における動画像復号方法であって、前記単視点符号化手段が、前記複数の視点のうちの１つを基準視点とし、当該複数の視点から当該基準視点を除いたもののうちの１つを拡張視点として、当該基準視点の動画像を符号化する第１のステップと、前記視点間処理手段が、前記基準視点の動画像を前記第１のステップにより符号化した際に得られた当該基準視点のローカル復号画像（例えば、図２の基準視点のローカル復号画像ＳＩＧ３に相当）を、当該基準視点と前記拡張視点との関係に応じて幾何変換して、当該基準視点の変換後のローカル復号画像（例えば、図２の基準視点の変換後のローカル復号画像ＳＩＧ４に相当）を生成する第２のステップと、前記参照画像リスト追加手段が、前記第２のステップにより生成された前記基準視点の変換後のローカル復号画像を、参照画像として利用可能にする第３のステップと、前記単視点符号化手段が、前記第３のステップにより利用可能になった参照画像を用いて、前記拡張視点の動画像を符号化する第４のステップと、を備えることを特徴とする動画像符号化方法を提案している。 (12) The present invention proposes a moving image encoding method in a moving image encoding device (for example, the moving image encoding device 1 in FIG. 1) that comprises single-viewpoint encoding means (for example, the first encoding unit 10 and the second encoding unit 20 in FIG. 2), inter-viewpoint processing means (for example, the inter-viewpoint processing unit 30 in FIG. 2), and reference image list adding means (for example, the inter-viewpoint processing unit 30 in FIG. 2), and that encodes moving images of a plurality of viewpoints with different perspectives to generate encoded data. The method comprises: a first step in which the single-viewpoint encoding means, with one of the plurality of viewpoints taken as a reference viewpoint and one of the remaining viewpoints taken as an extended viewpoint, encodes the moving image of the reference viewpoint; a second step in which the inter-viewpoint processing means geometrically transforms the local decoded image of the reference viewpoint (for example, the local decoded image SIG3 of the reference viewpoint in FIG. 2), obtained when the moving image of the reference viewpoint was encoded in the first step, according to the relationship between the reference viewpoint and the extended viewpoint, thereby generating a converted local decoded image of the reference viewpoint (for example, the converted local decoded image SIG4 of the reference viewpoint in FIG. 2); a third step in which the reference image list adding means makes the converted local decoded image of the reference viewpoint generated in the second step available as a reference image; and a fourth step in which the single-viewpoint encoding means encodes the moving image of the extended viewpoint using the reference image made available in the third step.

（１３）本発明は、単視点復号手段（例えば、図５の第１の復号部１１０および第２の復号部１２０に相当）、視点間処理手段（例えば、図５の視点間処理部１３０に相当）、および参照画像リスト追加手段（例えば、図５の視点間処理部１３０に相当）を備え、パースの異なる複数の視点の動画像を符号化して得られた符号化データを、復号する動画像復号装置（例えば、図１の動画像復号装置１００に相当）における動画像復号方法であって、前記単視点復号手段が、前記複数の視点のうち、前記符号化データの生成時に基準視点であった視点を復号側基準視点とし、前記符号化データの生成時に拡張視点であった視点を復号側拡張視点とすると、前記符号化データを復号して、当該復号側基準視点の復号済み画像（例えば、図５の基準視点の復号済み画像ＳＩＧ１０１、ＳＩＧ１０４に相当）を生成するとともに、前記符号化データの生成時において幾何変換の際に用いられたパラメータ（例えば、後述の射影変換行列ｓの各要素に相当）を導出する第１のステップと、前記視点間処理手段が、前記第１のステップにより生成された前記復号側基準視点の復号画像（例えば、図５の基準視点の復号済み画像ＳＩＧ１０１に相当）を、前記第１のステップにより導出されたパラメータを用いて幾何変換して、当該復号側基準視点の変換後の復号画像（例えば、図５の基準視点の変換後の復号済み画像ＳＩＧ１０２に相当）を生成する第２のステップと、前記参照画像リスト追加手段が、前記第２のステップにより生成された前記復号側基準視点の変換後の復号画像を、参照画像として利用可能にする第３のステップと、前記単視点復号手段が、前記第３のステップにより利用可能になった参照画像を用いて、前記復号側拡張視点の符号化データを復号する第４のステップと、を備えることを特徴とする動画像復号方法を提案している。 (13) The present invention proposes a moving image decoding method in a moving image decoding device (for example, the moving image decoding device 100 in FIG. 1) that comprises single-viewpoint decoding means (for example, the first decoding unit 110 and the second decoding unit 120 in FIG. 5), inter-viewpoint processing means (for example, the inter-viewpoint processing unit 130 in FIG. 5), and reference image list adding means (for example, the inter-viewpoint processing unit 130 in FIG. 5), and that decodes encoded data obtained by encoding moving images of a plurality of viewpoints with different perspectives. Here, the viewpoint that was the reference viewpoint when the encoded data was generated is called the decoding-side reference viewpoint, and the viewpoint that was the extended viewpoint when the encoded data was generated is called the decoding-side extended viewpoint. The method comprises: a first step in which the single-viewpoint decoding means decodes the encoded data to generate a decoded image of the decoding-side reference viewpoint (for example, the decoded images SIG101 and SIG104 of the reference viewpoint in FIG. 5) and derives the parameters used for the geometric transformation when the encoded data was generated (for example, the elements of the projective transformation matrix s described later); a second step in which the inter-viewpoint processing means geometrically transforms the decoded image of the decoding-side reference viewpoint generated in the first step (for example, the decoded image SIG101 of the reference viewpoint in FIG. 5) using the parameters derived in the first step, thereby generating a converted decoded image of the decoding-side reference viewpoint (for example, the converted decoded image SIG102 of the reference viewpoint in FIG. 5); a third step in which the reference image list adding means makes the converted decoded image of the decoding-side reference viewpoint generated in the second step available as a reference image; and a fourth step in which the single-viewpoint decoding means decodes the encoded data of the decoding-side extended viewpoint using the reference image made available in the third step.

（１４）本発明は、単視点符号化手段（例えば、図２の第１の符号化部１０および第２の符号化部２０に相当）、視点間処理手段（例えば、図２の視点間処理部３０に相当）、および参照画像リスト追加手段（例えば、図２の視点間処理部３０に相当）を備え、パースの異なる複数の視点の動画像を符号化して符号化データを生成する動画像符号化装置（例えば、図１の動画像符号化装置１に相当）における動画像復号方法を、コンピュータに実行させるためのプログラムであって、前記単視点符号化手段が、前記複数の視点のうちの１つを基準視点とし、当該複数の視点から当該基準視点を除いたもののうちの１つを拡張視点として、当該基準視点の動画像を符号化する第１のステップと、前記視点間処理手段が、前記基準視点の動画像を前記第１のステップにより符号化した際に得られた当該基準視点のローカル復号画像（例えば、図２の基準視点のローカル復号画像ＳＩＧ３に相当）を、当該基準視点と前記拡張視点との関係に応じて幾何変換して、当該基準視点の変換後のローカル復号画像（例えば、図２の基準視点の変換後のローカル復号画像ＳＩＧ４に相当）を生成する第２のステップと、前記参照画像リスト追加手段が、前記第２のステップにより生成された前記基準視点の変換後のローカル復号画像を、参照画像として利用可能にする第３のステップと、前記単視点符号化手段が、前記第３のステップにより利用可能になった参照画像を用いて、前記拡張視点の動画像を符号化する第４のステップと、をコンピュータに実行させるためのプログラムを提案している。 (14) The present invention proposes a program for causing a computer to execute a moving image encoding method in a moving image encoding device (for example, the moving image encoding device 1 in FIG. 1) that comprises single-viewpoint encoding means (for example, the first encoding unit 10 and the second encoding unit 20 in FIG. 2), inter-viewpoint processing means (for example, the inter-viewpoint processing unit 30 in FIG. 2), and reference image list adding means (for example, the inter-viewpoint processing unit 30 in FIG. 2), and that encodes moving images of a plurality of viewpoints with different perspectives to generate encoded data. The program causes the computer to execute: a first step in which the single-viewpoint encoding means, with one of the plurality of viewpoints taken as a reference viewpoint and one of the remaining viewpoints taken as an extended viewpoint, encodes the moving image of the reference viewpoint; a second step in which the inter-viewpoint processing means geometrically transforms the local decoded image of the reference viewpoint (for example, the local decoded image SIG3 of the reference viewpoint in FIG. 2), obtained when the moving image of the reference viewpoint was encoded in the first step, according to the relationship between the reference viewpoint and the extended viewpoint, thereby generating a converted local decoded image of the reference viewpoint (for example, the converted local decoded image SIG4 of the reference viewpoint in FIG. 2); a third step in which the reference image list adding means makes the converted local decoded image of the reference viewpoint generated in the second step available as a reference image; and a fourth step in which the single-viewpoint encoding means encodes the moving image of the extended viewpoint using the reference image made available in the third step.

（１５）本発明は、単視点復号手段（例えば、図５の第１の復号部１１０および第２の復号部１２０に相当）、視点間処理手段（例えば、図５の視点間処理部１３０に相当）、および参照画像リスト追加手段（例えば、図５の視点間処理部１３０に相当）を備え、パースの異なる複数の視点の動画像を符号化して得られた符号化データを、復号する動画像復号装置（例えば、図１の動画像復号装置１００に相当）における動画像復号方法を、コンピュータに実行させるためのプログラムであって、前記単視点復号手段が、前記複数の視点のうち、前記符号化データの生成時に基準視点であった視点を復号側基準視点とし、前記符号化データの生成時に拡張視点であった視点を復号側拡張視点とすると、前記符号化データを復号して、当該復号側基準視点の復号済み画像（例えば、図５の基準視点の復号済み画像ＳＩＧ１０１、ＳＩＧ１０４に相当）を生成するとともに、前記符号化データの生成時において幾何変換の際に用いられたパラメータ（例えば、後述の射影変換行列ｓの各要素に相当）を導出する第１のステップと、前記視点間処理手段が、前記第１のステップにより生成された前記復号側基準視点の復号画像（例えば、図５の基準視点の復号済み画像ＳＩＧ１０１に相当）を、前記第１のステップにより導出されたパラメータを用いて幾何変換して、当該復号側基準視点の変換後の復号画像（例えば、図５の基準視点の変換後の復号済み画像ＳＩＧ１０２に相当）を生成する第２のステップと、前記参照画像リスト追加手段が、前記第２のステップにより生成された前記復号側基準視点の変換後の復号画像を、参照画像として利用可能にする第３のステップと、前記単視点復号手段が、前記第３のステップにより利用可能になった参照画像を用いて、前記復号側拡張視点の符号化データを復号する第４のステップと、をコンピュータに実行させるためのプログラムを提案している。 (15) The present invention proposes a program for causing a computer to execute a moving image decoding method in a moving image decoding device (for example, the moving image decoding device 100 in FIG. 1) that comprises single-viewpoint decoding means (for example, the first decoding unit 110 and the second decoding unit 120 in FIG. 5), inter-viewpoint processing means (for example, the inter-viewpoint processing unit 130 in FIG. 5), and reference image list adding means (for example, the inter-viewpoint processing unit 130 in FIG. 5), and that decodes encoded data obtained by encoding moving images of a plurality of viewpoints with different perspectives. Here, the viewpoint that was the reference viewpoint when the encoded data was generated is called the decoding-side reference viewpoint, and the viewpoint that was the extended viewpoint when the encoded data was generated is called the decoding-side extended viewpoint. The program causes the computer to execute: a first step in which the single-viewpoint decoding means decodes the encoded data to generate a decoded image of the decoding-side reference viewpoint (for example, the decoded images SIG101 and SIG104 of the reference viewpoint in FIG. 5) and derives the parameters used for the geometric transformation when the encoded data was generated (for example, the elements of the projective transformation matrix s described later); a second step in which the inter-viewpoint processing means geometrically transforms the decoded image of the decoding-side reference viewpoint generated in the first step (for example, the decoded image SIG101 of the reference viewpoint in FIG. 5) using the parameters derived in the first step, thereby generating a converted decoded image of the decoding-side reference viewpoint (for example, the converted decoded image SIG102 of the reference viewpoint in FIG. 5); a third step in which the reference image list adding means makes the converted decoded image of the decoding-side reference viewpoint generated in the second step available as a reference image; and a fourth step in which the single-viewpoint decoding means decodes the encoded data of the decoding-side extended viewpoint using the reference image made available in the third step.

本発明によれば、実装コストを抑制しつつ、パースの異なる複数の視点の映像の符号化効率を向上させることができる。 According to the present invention, the encoding efficiency of videos from a plurality of viewpoints with different perspectives can be improved while keeping the implementation cost low.

本発明の一実施形態に係る動画像処理システムのブロック図である。 Block diagram of a moving image processing system according to an embodiment of the present invention.

本発明の一実施形態に係る動画像符号化装置のブロック図である。 Block diagram of a moving image encoding device according to an embodiment of the present invention.

本発明の一実施形態に係る動画像符号化装置が備える第１の符号化部のブロック図である。 Block diagram of the first encoding unit included in the moving image encoding device according to an embodiment of the present invention.

本発明の一実施形態に係る動画像符号化装置が備える第２の符号化部のブロック図である。 Block diagram of the second encoding unit included in the moving image encoding device according to an embodiment of the present invention.

本発明の一実施形態に係る動画像復号装置のブロック図である。 Block diagram of a moving image decoding device according to an embodiment of the present invention.

本発明の一実施形態に係る動画像復号装置が備える第１の復号部のブロック図である。 Block diagram of the first decoding unit included in the moving image decoding device according to an embodiment of the present invention.

本発明の一実施形態に係る動画像復号装置が備える第２の復号部のブロック図である。 Block diagram of the second decoding unit included in the moving image decoding device according to an embodiment of the present invention.

基準視点と拡張視点との組み合わせを示す図である。 Diagram showing a combination of a reference viewpoint and an extended viewpoint.

基準視点と拡張視点との組み合わせを示す図である。 Diagram showing a combination of a reference viewpoint and an extended viewpoint.

ＭＶ−ＨＥＶＣにおける参照画像を説明するための図である。 Diagram for explaining reference images in MV-HEVC.

以下、本発明の実施の形態について図面を参照しながら説明する。なお、以下の実施形態における構成要素は適宜、既存の構成要素などとの置き換えが可能であり、また、他の既存の構成要素との組み合わせを含む様々なバリエーションが可能である。したがって、以下の実施形態の記載をもって、特許請求の範囲に記載された発明の内容を限定するものではない。 Hereinafter, embodiments of the present invention will be described with reference to the drawings. Note that the constituent elements in the following embodiments can be appropriately replaced with existing constituent elements, and various variations including combinations with other existing constituent elements are possible. Accordingly, the description of the following embodiments does not limit the contents of the invention described in the claims.

［動画像処理システムＡＡの構成および動作］ [Configuration and Operation of Moving Image Processing System AA]
図１は、本発明の一実施形態に係る動画像処理システムＡＡのブロック図である。動画像処理システムＡＡは、パースの異なる複数の視点の動画像を符号化して符号化データ（図２の拡張視点のビットストリームＳＩＧ６および基準視点のビットストリームＳＩＧ７を参照）を生成する動画像符号化装置１と、動画像符号化装置１により生成された符号化データを復号する動画像復号装置１００と、を備える。これら動画像符号化装置１と動画像復号装置１００とは、上述の符号化データを、例えば伝送路を介して送受信する。 FIG. 1 is a block diagram of a moving image processing system AA according to an embodiment of the present invention. The moving image processing system AA includes a moving image encoding device 1, which encodes moving images of a plurality of viewpoints with different perspectives to generate encoded data (see the extended viewpoint bitstream SIG6 and the reference viewpoint bitstream SIG7 in FIG. 2), and a moving image decoding device 100, which decodes the encoded data generated by the moving image encoding device 1. The moving image encoding device 1 and the moving image decoding device 100 exchange this encoded data, for example via a transmission path.

なお、本実施形態では、上述のパースの異なる複数の視点として、基準視点および拡張視点の２つの視点が存在しており、基準視点をベースレイヤとし、拡張視点をエンハンスメントレイヤとするものとする。 In the present embodiment, there are two viewpoints, that is, a reference viewpoint and an extended viewpoint, as a plurality of viewpoints with different perspectives, and the reference viewpoint is a base layer and the extended viewpoint is an enhancement layer.

［動画像符号化装置１の構成および動作］ [Configuration and Operation of Moving Image Encoding Device 1]
図２は、動画像符号化装置１のブロック図である。動画像符号化装置１は、第１の符号化部１０、第２の符号化部２０、および視点間処理部３０を備える。第１の符号化部１０は、拡張視点の原画像ＳＩＧ１を符号化し、拡張視点のビットストリームＳＩＧ６として出力する。第２の符号化部２０は、基準視点の原画像ＳＩＧ２を符号化し、基準視点のビットストリームＳＩＧ７として出力する。視点間処理部３０は、基準視点と拡張視点との関係に応じて基準視点のローカル復号画像ＳＩＧ３を射影変換および照度補償して基準視点の変換後のローカル復号画像ＳＩＧ４を生成し、この基準視点の変換後のローカル復号画像ＳＩＧ４を、第１の符号化部１０がインター予測を行う際に参照画像として用いることができるようにする。第１の符号化部１０、第２の符号化部２０、および視点間処理部３０のそれぞれの動作について、以下に詳述する。 FIG. 2 is a block diagram of the moving image encoding device 1. The moving image encoding device 1 includes a first encoding unit 10, a second encoding unit 20, and an inter-viewpoint processing unit 30. The first encoding unit 10 encodes the extended viewpoint original image SIG1 and outputs it as the extended viewpoint bitstream SIG6. The second encoding unit 20 encodes the reference viewpoint original image SIG2 and outputs it as the reference viewpoint bitstream SIG7. The inter-viewpoint processing unit 30 applies projective transformation and illumination compensation to the reference viewpoint local decoded image SIG3 according to the relationship between the reference viewpoint and the extended viewpoint, thereby generating the converted local decoded image SIG4 of the reference viewpoint, and makes this converted local decoded image SIG4 available as a reference image when the first encoding unit 10 performs inter prediction. The operations of the first encoding unit 10, the second encoding unit 20, and the inter-viewpoint processing unit 30 are described in detail below.

図３は、第１の符号化部１０のブロック図である。第１の符号化部１０は、インター予測部１１、イントラ予測部１２、変換・量子化部１３、エントロピー符号化部１４、逆量子化・逆変換部１５、インループフィルタ部１６、およびバッファ部１７を備える。 FIG. 3 is a block diagram of the first encoding unit 10. The first encoding unit 10 includes an inter prediction unit 11, an intra prediction unit 12, a transform / quantization unit 13, an entropy encoding unit 14, an inverse quantization / inverse transform unit 15, an in-loop filter unit 16, and a buffer unit 17.

インター予測部１１は、拡張視点の原画像ＳＩＧ１と、バッファ部１７から供給される後述のフィルタ後局所復号画像ＳＩＧ１８と、を入力とする。このインター予測部１１は、拡張視点の原画像ＳＩＧ１およびフィルタ後局所復号画像ＳＩＧ１８を用いてインター予測を行ってインター予測画像ＳＩＧ１１を生成し、出力する。 The inter prediction unit 11 receives an extended viewpoint original image SIG1 and a filtered local decoded image SIG18 (described later) supplied from the buffer unit 17. The inter prediction unit 11 performs inter prediction using the extended viewpoint original image SIG1 and the filtered local decoded image SIG18 to generate and output an inter predicted image SIG11.

イントラ予測部１２は、拡張視点の原画像ＳＩＧ１と、後述のフィルタ前局所復号画像ＳＩＧ１６と、を入力とする。このイントラ予測部１２は、拡張視点の原画像ＳＩＧ１およびフィルタ前局所復号画像ＳＩＧ１６を用いてイントラ予測を行ってイントラ予測画像ＳＩＧ１２を生成し、出力する。 The intra prediction unit 12 receives the extended viewpoint original image SIG1 and a pre-filter local decoded image SIG16 (described later). The intra prediction unit 12 performs intra prediction using the extended viewpoint original image SIG1 and the pre-filter local decoded image SIG16 to generate and output an intra predicted image SIG12.

変換・量子化部１３は、拡張視点の原画像ＳＩＧ１と、インター予測画像ＳＩＧ１１またはイントラ予測画像ＳＩＧ１２と、の誤差（残差）信号ＳＩＧ１３を入力とする。この変換・量子化部１３は、入力された残差信号ＳＩＧ１３を変換および量子化して量子化係数ＳＩＧ１４を生成し、出力する。 The transform / quantization unit 13 receives an error (residual) signal SIG13 between the extended viewpoint original image SIG1 and the inter predicted image SIG11 or the intra predicted image SIG12. The transform / quantization unit 13 transforms and quantizes the input residual signal SIG13 to generate a quantized coefficient SIG14 and outputs it.

エントロピー符号化部１４は、量子化係数ＳＩＧ１４と、後述の変換パラメータＳＩＧ５と、図示しないサイド情報（画素値の再構成に必要な予測モードや動きベクトルなどの関連情報）と、を入力とする。このエントロピー符号化部１４は、入力された信号をエントロピー符号化し、拡張視点のビットストリームＳＩＧ６として出力する。 The entropy encoding unit 14 receives a quantization coefficient SIG14, a transformation parameter SIG5 (described later), and side information (not shown) (related information such as a prediction mode and a motion vector necessary for pixel value reconstruction) as inputs. The entropy encoding unit 14 performs entropy encoding on the input signal and outputs it as an extended viewpoint bit stream SIG6.

逆量子化・逆変換部１５は、量子化係数ＳＩＧ１４を入力とする。この逆量子化・逆変換部１５は、量子化係数ＳＩＧ１４を逆量子化および逆変換して、逆変換された残差信号ＳＩＧ１５を生成し、出力する。 The inverse quantization / inverse transform unit 15 receives the quantization coefficient SIG14. The inverse quantization / inverse transform unit 15 inversely quantizes and inversely transforms the quantization coefficient SIG14 to generate and output an inversely transformed residual signal SIG15.

インループフィルタ部１６は、フィルタ前局所復号画像ＳＩＧ１６を入力とする。フィルタ前局所復号画像ＳＩＧ１６とは、インター予測画像ＳＩＧ１１またはイントラ予測画像ＳＩＧ１２と、逆変換された残差信号ＳＩＧ１５と、を合算した信号のことである。インループフィルタ部１６は、フィルタ前局所復号画像ＳＩＧ１６に対してデブロックフィルタといったインループフィルタを適用して、フィルタ後局所復号画像ＳＩＧ１７を生成し、出力する。 The in-loop filter unit 16 receives the pre-filter local decoded image SIG16. The pre-filter local decoded image SIG16 is a signal obtained by adding up the inter prediction image SIG11 or the intra prediction image SIG12 and the inversely converted residual signal SIG15. The in-loop filter unit 16 applies an in-loop filter such as a deblocking filter to the pre-filter local decoded image SIG16 to generate and output a post-filter local decoded image SIG17.
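
The residual, quantization, and reconstruction path described for units 13, 15, and 16 can be illustrated with a minimal Python sketch. It is not the encoder of this embodiment: the transform is omitted (treated as the identity), the in-loop filter is not modeled, and `step`, `quantize`, `dequantize`, and `encode_block` are hypothetical names introduced only for illustration.

```python
def quantize(residual, step):
    """Map each residual sample to an integer level (stands in for SIG14)."""
    return [[int(round(r / step)) for r in row] for row in residual]


def dequantize(levels, step):
    """Rebuild an approximate residual from the levels (stands in for SIG15)."""
    return [[lv * step for lv in row] for row in levels]


def encode_block(original, prediction, step):
    """Return (levels, pre-filter local decoded block) for one block.

    Assumption: the transform is the identity, so only quantization is lossy.
    """
    # Residual between the original and the predicted block (SIG13).
    residual = [[o - p for o, p in zip(o_row, p_row)]
                for o_row, p_row in zip(original, prediction)]
    levels = quantize(residual, step)
    recon_residual = dequantize(levels, step)
    # Prediction plus reconstructed residual gives the pre-filter image (SIG16).
    local_decoded = [[p + r for p, r in zip(p_row, r_row)]
                     for p_row, r_row in zip(prediction, recon_residual)]
    return levels, local_decoded
```

Because the encoder reconstructs from the quantized levels rather than from the original samples, its local decoded image is the same picture a decoder can rebuild from the bitstream, which is why prediction is driven by SIG16 and SIG18 rather than by SIG1.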

バッファ部１７は、フィルタ後局所復号画像ＳＩＧ１７と、基準視点の変換後のローカル復号画像ＳＩＧ４と、を蓄積し、適宜、フィルタ後局所復号画像ＳＩＧ１８としてインター予測部１１に供給する。 The buffer unit 17 stores the post-filter local decoded image SIG17 and the converted local decoded image SIG4 of the reference viewpoint, and supplies them to the inter prediction unit 11 as the post-filter local decoded image SIG18 as needed.
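
As a rough illustration of the role of the buffer unit 17, the following Python sketch keeps ordinary filtered frames in a sliding window and keeps converted reference viewpoint frames separately as long-term entries (the description of the conversion process states that SIG4 is stored as a long-term reference frame). The class name `ReferenceBuffer`, the window size, and the ordering of the returned list are assumptions made for this sketch, not details taken from the embodiment.

```python
from collections import deque


class ReferenceBuffer:
    """Hypothetical stand-in for buffer unit 17."""

    def __init__(self, max_short_term=4):
        # Short-term entries: filtered local decoded frames (SIG17).
        self._short_term = deque(maxlen=max_short_term)
        # Long-term entries: converted reference viewpoint frames (SIG4).
        self._long_term = {}

    def store_filtered(self, frame_id, frame):
        """Store a filtered frame; the oldest one drops out when the window is full."""
        self._short_term.append((frame_id, frame))

    def store_long_term(self, frame_id, frame):
        """Store a converted reference viewpoint frame as a long-term reference."""
        self._long_term[frame_id] = frame

    def reference_list(self):
        """Frames offered to inter prediction (SIG18): newest short-term first,
        then long-term frames in frame_id order (ordering is an assumption)."""
        recent_first = [frame for _, frame in reversed(self._short_term)]
        long_term = [self._long_term[k] for k in sorted(self._long_term)]
        return recent_first + long_term
```

Keeping the converted frame in the same list as the ordinary references is what lets an unmodified single-viewpoint inter predictor select it like any other reference picture.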

図４は、第２の符号化部２０のブロック図である。第２の符号化部２０は、インター予測部２１、イントラ予測部２２、変換・量子化部２３、エントロピー符号化部２４、逆量子化・逆変換部２５、インループフィルタ部２６、およびバッファ部２７を備える。 FIG. 4 is a block diagram of the second encoding unit 20. The second encoding unit 20 includes an inter prediction unit 21, an intra prediction unit 22, a transform / quantization unit 23, an entropy encoding unit 24, an inverse quantization / inverse transform unit 25, an in-loop filter unit 26, and a buffer unit 27.

インター予測部２１、イントラ予測部２２、変換・量子化部２３、エントロピー符号化部２４、逆量子化・逆変換部２５、およびインループフィルタ部２６は、それぞれ、図３のインター予測部１１、イントラ予測部１２、変換・量子化部１３、エントロピー符号化部１４、逆量子化・逆変換部１５、およびインループフィルタ部１６と同様に動作する。一方、バッファ部２７は、図３のバッファ部１７とは異なる動作を行う。 The inter prediction unit 21, the intra prediction unit 22, the transform / quantization unit 23, the entropy encoding unit 24, the inverse quantization / inverse transform unit 25, and the in-loop filter unit 26 operate in the same manner as the inter prediction unit 11, the intra prediction unit 12, the transform / quantization unit 13, the entropy encoding unit 14, the inverse quantization / inverse transform unit 15, and the in-loop filter unit 16 of FIG. 3, respectively. The buffer unit 27, on the other hand, operates differently from the buffer unit 17 of FIG. 3.

バッファ部２７は、インループフィルタ部２６から出力されたフィルタ後局所復号画像ＳＩＧ２７を蓄積し、適宜、フィルタ後局所復号画像ＳＩＧ２８としてインター予測部２１に供給するとともに、基準視点のローカル復号画像ＳＩＧ３として出力する。 The buffer unit 27 stores the post-filter local decoded image SIG27 output from the in-loop filter unit 26, supplies it to the inter prediction unit 21 as the post-filter local decoded image SIG28 as needed, and also outputs it as the reference viewpoint local decoded image SIG3.

図２に戻って、視点間処理部３０は、拡張視点の原画像ＳＩＧ１と、基準視点の原画像ＳＩＧ２と、基準視点のローカル復号画像ＳＩＧ３と、を入力とする。この視点間処理部３０は、射影変換行列導出処理、照度補償パラメータ導出処理、および変換処理を行う。 Returning to FIG. 2, the inter-viewpoint processing unit 30 receives the extended viewpoint original image SIG1, the reference viewpoint original image SIG2, and the reference viewpoint local decoded image SIG3. The inter-viewpoint processing unit 30 performs projective transformation matrix derivation processing, illumination compensation parameter derivation processing, and conversion processing.

まず、射影変換行列導出処理について説明する。射影変換行列導出処理では、視点間処理部３０は、まず、ＳＩＦＴ（Scale-Invariant Feature Transform）アルゴリズムといった特徴点検出アルゴリズムにより、拡張視点の原画像ＳＩＧ１と、基準視点の原画像ＳＩＧ２と、から特徴点を抽出し、拡張視点の原画像ＳＩＧ１におけるそれぞれの特徴点と、基準視点の原画像ＳＩＧ２におけるそれぞれの特徴点と、の一致度を計算する。次に、一致度の最も高いものから順番に４組の特徴点を、拡張視点の原画像ＳＩＧ１と、基準視点の原画像ＳＩＧ２と、の間で対応する特徴点として特定する。次に、射影変換行列の要素を変数（未知数は８）とする８次元連立方程式を立て、射影変換行列ｓを導出する（例えば、＜Jan Erik Solem著、相川愛三訳、「実践コンピュータビジョン」株式会社オライリー・ジャパン、２０１３年０３月＞を参照）。 First, the projective transformation matrix derivation process will be described. In this process, the inter-viewpoint processing unit 30 first extracts feature points from the extended viewpoint original image SIG1 and the reference viewpoint original image SIG2 using a feature point detection algorithm such as the SIFT (Scale-Invariant Feature Transform) algorithm, and calculates the degree of coincidence between each feature point in SIG1 and each feature point in SIG2. Next, it takes the four pairs of feature points with the highest degree of coincidence as the corresponding feature points between SIG1 and SIG2. It then sets up a system of eight simultaneous equations whose variables are the elements of the projective transformation matrix (eight unknowns) and solves it to derive the projective transformation matrix s (see, for example, Jan Erik Solem, "Practical Computer Vision", translated by Aizo Aikawa, O'Reilly Japan, Inc., March 2013).
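
The last step, solving for eight unknowns from four point pairs, can be sketched in plain Python as follows. Feature detection and matching (for example SIFT) are not shown; the four correspondences are simply taken as input. The sketch assumes that the element fixed to 1 is the bottom-right element of the matrix and that the matrix maps reference viewpoint coordinates to extended viewpoint coordinates; both are assumptions of this illustration, and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_four_points(src, dst):
    """Return a 3x3 matrix (bottom-right element 1) mapping each src point to its dst point.

    src, dst: lists of four (x, y) pairs. Each pair contributes two equations,
    giving eight equations for the eight unknown elements.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h, x, y):
    """Apply the projective transformation h to the point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

With exactly four pairs the system has a unique solution only when no three of the points are collinear; `solve_linear` raises an error in the degenerate case rather than returning an arbitrary matrix.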

次に、照度補償パラメータ導出処理について説明する。照度補償パラメータ導出処理では、視点間処理部３０は、まず、以下の数式（１）で表される線形予測の数式を仮定する。次に、数式（１）のベクトルｙに、拡張視点の原画像ＳＩＧ１の輝度値を代入するとともに、数式（１）のベクトルｘに、基準視点の原画像ＳＩＧ２の輝度値を代入し、数式（１）を満たすスカラーａ、ｂを最小二乗法により導出して、照度補償パラメータａ、ｂとする。 Next, the illumination compensation parameter derivation process will be described. In this process, the inter-viewpoint processing unit 30 first assumes the linear prediction model expressed by the following formula (1). It then substitutes the luminance values of the extended viewpoint original image SIG1 into the vector y of formula (1) and the luminance values of the reference viewpoint original image SIG2 into the vector x of formula (1), derives the scalars a and b that satisfy formula (1) by the least squares method, and uses them as the illumination compensation parameters a and b.

なお、ベクトルｙに代入する拡張視点の原画像ＳＩＧ１の輝度値は、拡張視点の原画像ＳＩＧ１を構成する全ての画素の輝度値であってもよいし、拡張視点の原画像ＳＩＧ１を構成する全ての画素のうち予め定められた領域（例えば中央部など）の画素の輝度値であってもよいし、拡張視点の原画像ＳＩＧ１を構成する全ての画素のうち予め定められた画素（例えば１つおきの画素など）を間引いた後の画素の輝度値であってもよい。また、ベクトルｘに代入する基準視点の原画像ＳＩＧ２の輝度値は、基準視点の原画像ＳＩＧ２を構成する全ての画素の輝度値であってもよいし、基準視点の原画像ＳＩＧ２を構成する全ての画素のうち予め定められた領域（例えば中央部など）の画素の輝度値であってもよいし、基準視点の原画像ＳＩＧ２を構成する全ての画素のうち予め定められた画素（例えば１つおきの画素など）を間引いたものの輝度値であってもよい。また、数式（１）のベクトルｘには、基準視点の原画像ＳＩＧ２の輝度値の代わりに、基準視点のローカル復号画像ＳＩＧ３の輝度値を代入してもよい。 Note that the luminance values of the extended viewpoint original image SIG1 substituted into the vector y may be those of all the pixels of SIG1, those of the pixels in a predetermined region of SIG1 (for example, the central portion), or those of the pixels that remain after predetermined pixels of SIG1 (for example, every other pixel) have been thinned out. Likewise, the luminance values of the reference viewpoint original image SIG2 substituted into the vector x may be those of all the pixels of SIG2, those of the pixels in a predetermined region of SIG2 (for example, the central portion), or those of the pixels that remain after predetermined pixels of SIG2 (for example, every other pixel) have been thinned out. In addition, the luminance values of the reference viewpoint local decoded image SIG3 may be substituted into the vector x of formula (1) in place of those of the reference viewpoint original image SIG2.
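
The least squares derivation of a and b can be sketched as below. Formula (1) itself is not reproduced in this text, so the sketch assumes the usual linear form y = a·x + b, applied sample by sample; that form, the function name `fit_illumination`, and the `step` parameter (which models keeping every `step`-th sample) are assumptions of this illustration. How samples of the two images are paired is also not specified here, so the sketch simply pairs entries at the same index.

```python
def fit_illumination(x_luma, y_luma, step=1):
    """Least squares fit of y = a * x + b (assumed form of formula (1)).

    x_luma: luminance samples of the reference viewpoint image (SIG2 or SIG3).
    y_luma: luminance samples of the extended viewpoint image (SIG1).
    step:   keep every step-th sample, e.g. step=2 keeps every other sample.
    """
    if len(x_luma) != len(y_luma):
        raise ValueError("x and y must contain the same number of samples")
    xs, ys = x_luma[::step], y_luma[::step]
    if not xs:
        raise ValueError("no samples to fit")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Flat reference image: the gain is undetermined, so fall back to
        # unit gain and a pure offset (a choice made for this sketch).
        return 1.0, mean_y - mean_x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b
```

Thinning the samples reduces the cost of the fit roughly in proportion to `step`, at the price of a noisier estimate when the images are small.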

射影変換行列導出処理により得られた射影変換行列ｓの各要素と、照度補償パラメータ導出処理により得られた照度補償パラメータａ、ｂとは、上述の変換パラメータＳＩＧ５として、第１の符号化部１０のエントロピー符号化部１４に送信される。 The elements of the projective transformation matrix s obtained by the projective transformation matrix derivation process and the illumination compensation parameters a and b obtained by the illumination compensation parameter derivation process are transmitted, as the conversion parameter SIG5 mentioned above, to the entropy encoding unit 14 of the first encoding unit 10.

ここで、射影変換行列ｓは、３行３列の行列であり、９つの要素のうちの１つは常に「１」である。このため、上述の射影変換行列ｓの各要素とは、射影変換行列ｓの９つの要素のうち「１」を除く８つの要素のことであり、これら８つの要素がスカラーとして第１の符号化部１０のエントロピー符号化部１４に送信されることになる。 Here, the projective transformation matrix s is a 3 × 3 matrix, and one of its nine elements is always "1". The elements of the projective transformation matrix s referred to above are therefore the eight elements other than that "1", and these eight elements are transmitted as scalars to the entropy encoding unit 14 of the first encoding unit 10.
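
A small Python sketch of this eight-scalar representation follows. The text only says that one of the nine elements is always 1; the sketch assumes it is the bottom-right element and that the scalars are sent in row-major order. Both choices, and the names `pack_homography` and `unpack_homography`, are assumptions for illustration.

```python
def pack_homography(h):
    """Return the eight scalars to transmit for a 3x3 matrix h.

    The matrix is first scaled so that its bottom-right element is 1
    (assumed position of the fixed element), which does not change the
    projective transformation it represents.
    """
    scale = h[2][2]
    if abs(scale) < 1e-12:
        raise ValueError("matrix cannot be normalized: h[2][2] is zero")
    flat = [value / scale for row in h for value in row]
    return flat[:8]


def unpack_homography(params):
    """Rebuild the 3x3 matrix from the eight transmitted scalars."""
    if len(params) != 8:
        raise ValueError("expected exactly eight scalars")
    flat = list(params) + [1.0]
    return [flat[0:3], flat[3:6], flat[6:9]]
```

A receiver would call `unpack_homography` on the decoded scalars to obtain the same matrix the sender used, up to the precision with which the scalars were coded.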

また、射影変換行列ｓを、基準視点における画像の角に対応する、拡張視点における画像中の４点の座標として表現することもできる。ここで、射影変換行列ｓの導出は、動画像符号化装置１および動画像復号装置１００の双方で行われることになるが、動画像復号装置１００は、ＳＩＦＴアルゴリズムといった特徴点検出アルゴリズムを行うことができない。このため、上述の対応する４点の座標を、上述の８つの要素の代わりに第１の符号化部１０のエントロピー符号化部１４に送信することとしてもよい。 The projective transformation matrix s can also be expressed as the coordinates of the four points in the extended viewpoint image that correspond to the corners of the reference viewpoint image. The projective transformation matrix s has to be derived in both the moving image encoding device 1 and the moving image decoding device 100, but the moving image decoding device 100 cannot run a feature point detection algorithm such as the SIFT algorithm. For this reason, the coordinates of these four corresponding points may be transmitted to the entropy encoding unit 14 of the first encoding unit 10 instead of the eight elements described above.
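
The four-point representation can be illustrated by pushing the corners of the reference viewpoint image through s, as in the Python sketch below. The corner convention (pixel centers at 0 and width − 1 or height − 1), the corner order, and the function name `corner_points` are assumptions of this sketch. A receiver holding these four points and the known image corners could recover s by solving a four-point system of eight equations.

```python
def corner_points(h, width, height):
    """Map the four corners of a width x height image through the 3x3 matrix h.

    Returns the corners in the order top-left, top-right, bottom-right,
    bottom-left (order chosen for this sketch).
    """
    corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    mapped = []
    for x, y in corners:
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        if abs(w) < 1e-12:
            raise ValueError("corner maps to a point at infinity")
        mapped.append(((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
                       (h[1][0] * x + h[1][1] * y + h[1][2]) / w))
    return mapped
```

Four points carry eight coordinates, so this form transmits the same number of scalars as the eight matrix elements while expressing them in pixel units.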

Next, the conversion process will be described. In the conversion process, the inter-viewpoint processing unit 30 applies the projective transformation matrix s to the reference-viewpoint local decoded image SIG3 to perform projective transformation, then applies the illuminance compensation parameters a and b to perform illuminance compensation, thereby generating the local decoded image SIG4 after conversion of the reference viewpoint, and stores SIG4 in the buffer unit 17 of the first encoding unit 10 as a long-term reference frame.
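The two-stage conversion (projective transformation followed by illuminance compensation) might look like the sketch below for a single luminance plane. The interpolation method, the handling of positions that fall outside the source picture, the clipping range, and the linear form a·p + b of the compensation are not specified in this passage; nearest-neighbour sampling by inverse mapping, a fill value of 0, 8-bit clipping, and the function names are all assumptions.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def convert_reference(picture, s, a, b, max_value=255, fill=0):
    """Warp `picture` with homography s, then apply gain a and offset b.

    Each output pixel (u, v) is traced back through the inverse of s to a
    source position; positions outside the picture keep the `fill` value.
    """
    h, w = len(picture), len(picture[0])
    inv = invert_3x3(s)
    out = [[fill] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            d = inv[2][0] * u + inv[2][1] * v + inv[2][2]
            if d == 0:
                continue
            x = round((inv[0][0] * u + inv[0][1] * v + inv[0][2]) / d)
            y = round((inv[1][0] * u + inv[1][1] * v + inv[1][2]) / d)
            if 0 <= x < w and 0 <= y < h:
                compensated = a * picture[y][x] + b   # illuminance compensation
                out[v][u] = min(max_value, max(0, round(compensated)))
    return out


picture = [[10, 20, 30], [40, 50, 60]]
shift_right = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]   # pure one-pixel translation
converted = convert_reference(picture, shift_right, a=2, b=1)
```

Inverse mapping is used here so that every output pixel receives exactly one value; a forward mapping would leave holes wherever the transform stretches the picture.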

[Configuration and Operation of Moving Image Decoding Device 100]
FIG. 5 is a block diagram of the moving image decoding device 100. The moving image decoding device 100 includes a first decoding unit 110, a second decoding unit 120, and an inter-viewpoint processing unit 130. The first decoding unit 110 decodes the extended-viewpoint bitstream SIG6 and outputs the result as the extended-viewpoint decoded image SIG103. The second decoding unit 120 decodes the reference-viewpoint bitstream SIG7 and outputs the result as the reference-viewpoint decoded image SIG104. The inter-viewpoint processing unit 130 uses the projective transformation matrix s and the illuminance compensation parameters a and b derived by the moving image encoding device 1 to apply projective transformation and illuminance compensation to the reference-viewpoint decoded image SIG101, thereby generating the decoded image SIG102 after conversion of the reference viewpoint, so that the first decoding unit 110 can use SIG102 as a reference image when performing inter prediction. The operations of the first decoding unit 110, the second decoding unit 120, and the inter-viewpoint processing unit 130 are described in detail below.

FIG. 6 is a block diagram of the first decoding unit 110. The first decoding unit 110 includes an entropy decoding unit 111, an inverse transform/inverse quantization unit 112, an inter prediction unit 113, an intra prediction unit 114, an in-loop filter unit 115, and a buffer unit 116.

The entropy decoding unit 111 receives the extended-viewpoint bitstream SIG6 as input. It entropy-decodes SIG6 and derives and outputs the quantized coefficient level SIG111, the transformation parameter SIG5 generated by the moving image encoding device 1, and side information (related information needed to reconstruct pixel values, such as the prediction mode and motion vectors).

The inverse transform/inverse quantization unit 112 receives the quantized coefficient level SIG111 as input. It performs inverse transform and inverse quantization on SIG111 to generate and output the inversely transformed residual signal SIG112.

The inter prediction unit 113 receives as input the filtered local decoded image SIG117, described later, which is supplied from the buffer unit 116. It performs inter prediction using SIG117 to generate and output the inter predicted image SIG113.

The intra prediction unit 114 receives the pre-filter local decoded image SIG115 as input. The pre-filter local decoded image SIG115 is the signal obtained by adding the inversely transformed residual signal SIG112 to either the inter predicted image SIG113 or the intra predicted image SIG114. The intra prediction unit 114 performs intra prediction using SIG115 to generate and output the intra predicted image SIG114.
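The summation that yields the pre-filter local decoded image can be written in a few lines. The clipping to the valid sample range is an assumption added here (the passage mentions only the addition), as are the function name and the 8-bit default.

```python
def reconstruct_block(residual, prediction, bit_depth=8):
    """Add the residual (cf. SIG112) to the chosen prediction (cf. SIG113 or SIG114).

    Clipping to [0, 2**bit_depth - 1] is an assumed detail, not stated in the text.
    """
    high = (1 << bit_depth) - 1
    return [[min(high, max(0, r + p)) for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, prediction)]


residual = [[5, -20], [300, 0]]
prediction = [[10, 10], [10, 255]]
pre_filter = reconstruct_block(residual, prediction)
```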

The in-loop filter unit 115 receives the pre-filter local decoded image SIG115 as input. It applies an in-loop filter such as a deblocking filter to SIG115 to generate and output the filtered local decoded image SIG116.

The buffer unit 116 stores the decoded image SIG102 after conversion of the reference viewpoint and the filtered local decoded image SIG116, supplies them as appropriate to the inter prediction unit 113 as the filtered local decoded image SIG117, and outputs the extended-viewpoint decoded image SIG103.

FIG. 7 is a block diagram of the second decoding unit 120. The second decoding unit 120 includes an entropy decoding unit 121, an inverse transform/inverse quantization unit 122, an inter prediction unit 123, an intra prediction unit 124, an in-loop filter unit 125, and a buffer unit 126.

The entropy decoding unit 121, the inverse transform/inverse quantization unit 122, the inter prediction unit 123, the intra prediction unit 124, and the in-loop filter unit 125 operate in the same manner as the entropy decoding unit 111, the inverse transform/inverse quantization unit 112, the inter prediction unit 113, the intra prediction unit 114, and the in-loop filter unit 115 of FIG. 6, respectively. The buffer unit 126, on the other hand, operates differently from the buffer unit 116 of FIG. 6.

The buffer unit 126 stores the filtered local decoded image SIG126 output from the in-loop filter unit 125, supplies it as appropriate to the inter prediction unit 123 as the filtered local decoded image SIG127, and outputs it as the reference-viewpoint decoded images SIG101 and SIG104.

Returning to FIG. 5, the inter-viewpoint processing unit 130 receives as input the transformation parameter SIG5 generated by the moving image encoding device 1 and the reference-viewpoint decoded image SIG101. It applies the projective transformation matrix s contained in SIG5 to SIG101 to perform projective transformation, then applies the illuminance compensation parameters a and b contained in SIG5 to perform illuminance compensation, thereby generating the decoded image SIG102 after conversion of the reference viewpoint, and stores SIG102 in the buffer unit 116 of the first decoding unit 110 as a long-term reference frame.
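How a buffer such as the buffer unit 116 might keep the converted reference-viewpoint picture alongside ordinary decoded pictures can be sketched as follows. This is a deliberately simplified model, not the HEVC reference picture set mechanism: the class name, the fixed-size window for short-term pictures, the keying of long-term entries by picture order count, and the ordering of the returned list are all assumptions.

```python
from collections import deque


class ReferenceBuffer:
    """Toy picture buffer with short-term and long-term entries."""

    def __init__(self, max_short_term=4):
        # Short-term pictures are dropped oldest-first (assumed policy).
        self.short_term = deque(maxlen=max_short_term)
        # Long-term pictures stay until explicitly removed.
        self.long_term = {}

    def store_decoded(self, poc, picture):
        """Store a filtered decoded picture of the extended view (cf. SIG116)."""
        self.short_term.append((poc, picture))

    def store_inter_view(self, poc, converted_picture):
        """Store a converted reference-view picture (cf. SIG102) as long-term."""
        self.long_term[poc] = converted_picture

    def reference_list(self, poc):
        """Candidate references for the picture at `poc`, newest short-term first."""
        refs = [("short", p, pic) for p, pic in reversed(self.short_term)]
        if poc in self.long_term:
            refs.append(("long", poc, self.long_term[poc]))
        return refs


buf = ReferenceBuffer(max_short_term=2)
for poc, pic in [(0, "ext0"), (1, "ext1"), (2, "ext2")]:
    buf.store_decoded(poc, pic)
buf.store_inter_view(3, "converted_ref3")
refs = buf.reference_list(3)
```

In this model the inter-view picture for the same time instant is simply appended to the list that inter prediction already consults, which mirrors the idea that no new prediction tool is needed to use it.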

The moving image encoding device 1 described above provides the following effects.

The moving image encoding device 1 adds a frame of the reference viewpoint to the reference image list of the extended viewpoint. Moving images can therefore be encoded with the same framework as MV-HEVC described in Non-Patent Document 1, so no changes are needed at or below the video coding layer, and the existing HEVC codec design can be reused to the greatest possible extent. Implementation cost can thus be kept down.

In addition, the moving image encoding device 1 generates the local decoded image SIG4 after conversion of the reference viewpoint by applying projective transformation and illuminance compensation to the reference-viewpoint local decoded image SIG3 according to the relationship between the reference viewpoint and the extended viewpoint, and can encode the extended-viewpoint original image SIG1 using SIG4. Multi-view video of the same subject has the property that the moving image of each viewpoint can be converted by projective transformation into moving images that resemble one another. Projective transformation according to the relationship between the reference viewpoint and the extended viewpoint therefore makes a similar image available as a reference image, which improves coding performance.

Also, as described above, the moving image encoding device 1 generates the local decoded image SIG4 after conversion of the reference viewpoint by applying projective transformation and illuminance compensation to the reference-viewpoint local decoded image SIG3 according to the relationship between the reference viewpoint and the extended viewpoint, and can encode the extended-viewpoint original image SIG1 using SIG4. Consequently, even when the illuminance differs between the reference-viewpoint original image SIG2 and the extended-viewpoint original image SIG1 because the shooting environment, such as the lighting conditions, differs between the two viewpoints, the difference in illuminance can be compensated for. Illuminance compensation according to the relationship between the reference viewpoint and the extended viewpoint therefore improves coding performance even in that case.

In addition, the moving image encoding device 1 transmits each element of the projective transformation matrix s and the illuminance compensation parameters a and b to the moving image decoding device 100. The moving image decoding device 100 can therefore use the projective transformation matrix s and the illuminance compensation parameters a and b to perform the same projective transformation and illuminance compensation as the moving image encoding device 1.

In addition, in the moving image encoding device 1, the inter-viewpoint processing unit 30 stores the local decoded image SIG4 after conversion of the reference viewpoint in the buffer unit 17 as a long-term reference frame. This eliminates the need to recalculate motion vectors at the reference viewpoint, so the amount of encoding processing can be reduced.

The moving image decoding device 100 described above provides the following effects.

The moving image decoding device 100 adds a frame of the reference viewpoint to the reference image list of the extended viewpoint. Encoded data can therefore be decoded with the same framework as MV-HEVC described in Non-Patent Document 1, so no changes are needed at or below the video coding layer, and the existing HEVC codec design can be reused to the greatest possible extent. Implementation cost can thus be kept down.

In addition, the moving image decoding device 100 generates the decoded image SIG102 after conversion of the reference viewpoint by applying projective transformation and illuminance compensation to the reference-viewpoint decoded image SIG101 according to the relationship between the reference viewpoint and the extended viewpoint, and can decode the extended-viewpoint bitstream SIG6 using SIG102. Multi-view video of the same subject has the property that the moving image of each viewpoint can be converted by projective transformation into moving images that resemble one another. Projective transformation according to the relationship between the reference viewpoint and the extended viewpoint therefore makes a similar image available as a reference image, which improves coding performance.

In addition, the moving image decoding device 100 stores the decoded image SIG102 after conversion of the reference viewpoint in the buffer unit 116 as a long-term reference frame. This eliminates the need to recalculate motion vectors at the reference viewpoint, so the amount of decoding processing can be reduced.

Note that the present invention can be realized by recording the processing of the moving image encoding device 1 or the moving image decoding device 100 of the present invention as a program on a computer-readable non-transitory recording medium, and by having the moving image encoding device 1 or the moving image decoding device 100 read and execute the program recorded on that medium.

Here, the recording medium may be, for example, a nonvolatile memory such as an EPROM or a flash memory, a magnetic disk such as a hard disk, or a CD-ROM. The program recorded on the recording medium is read and executed by a processor provided in the moving image encoding device 1 or the moving image decoding device 100.

The above-described program may also be transmitted from the moving image encoding device 1 or the moving image decoding device 100, which holds the program in a storage device or the like, to another computer system via a transmission medium or by a transmission wave in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line.

The above-described program may also implement only part of the functions described above. It may further be a so-called difference file (difference program), which implements those functions in combination with a program already recorded in the moving image encoding device 1 or the moving image decoding device 100.

An embodiment of the present invention has been described in detail above with reference to the drawings. The specific configuration is not limited to this embodiment, however, and also covers designs and the like that do not depart from the gist of the present invention.

For example, in the above-described embodiment, the inter-viewpoint processing unit 30 performs the conversion process by first applying the projective transformation matrix s to the reference-viewpoint local decoded image SIG3 to perform projective transformation and then applying the illuminance compensation parameters a and b to perform illuminance compensation. However, the order is not limited to this: illuminance compensation with the parameters a and b may be applied to SIG3 first, followed by projective transformation with the matrix s. Likewise, the inter-viewpoint processing unit 130 may apply illuminance compensation with the parameters a and b to the reference-viewpoint decoded image SIG101 first and then perform projective transformation with the matrix s.

In the above-described embodiment, projective transformation is performed as the geometric transformation. However, the geometric transformation is not limited to this and may be, for example, affine transformation based on triangular patch division.
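For the triangular-patch alternative, each triangle gets its own affine map determined by its three vertex correspondences. The sketch below derives the six affine coefficients for one triangle with Cramer's rule; how the picture is divided into triangles and how vertex correspondences are obtained are outside this passage, and the function names are hypothetical.

```python
def affine_from_triangle(src, dst):
    """Return ((a, b, c), (d, e, f)) such that
    u = a*x + b*y + c and v = d*x + e*y + f map each src vertex to its dst vertex.
    """
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if det == 0:
        raise ValueError("degenerate (collinear) source triangle")

    def solve(t0, t1, t2):
        # Cramer's rule for the 3x3 system [x y 1] * [p q r]^T = t
        p = (t0 * (y1 - y2) - y0 * (t1 - t2) + (t1 * y2 - t2 * y1)) / det
        q = (x0 * (t1 - t2) - t0 * (x1 - x2) + (x1 * t2 - x2 * t1)) / det
        r = (x0 * (y1 * t2 - y2 * t1) - y0 * (x1 * t2 - x2 * t1)
             + t0 * (x1 * y2 - x2 * y1)) / det
        return p, q, r

    return solve(*(u for u, _ in dst)), solve(*(v for _, v in dst))


def apply_affine(coeffs, x, y):
    (a, b, c), (d, e, f) = coeffs
    return a * x + b * y + c, d * x + e * y + f


src_triangle = [(0, 0), (4, 0), (0, 4)]
dst_triangle = [(1, 1), (5, 2), (0, 6)]
coeffs = affine_from_triangle(src_triangle, dst_triangle)
```

Applying a separate map of this kind inside every patch lets the overall warp follow scene geometry that a single 3 × 3 matrix cannot describe, at the price of sending more parameters.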

In the above-described embodiment, the inter-viewpoint processing unit 30 stores the local decoded image SIG4 after conversion of the reference viewpoint in the buffer unit 17 as a long-term reference frame, and the inter-viewpoint processing unit 130 stores the decoded image SIG102 after conversion of the reference viewpoint in the buffer unit 116 as a long-term reference frame. However, these images may be stored as short-term reference frames rather than as long-term reference frames.

In the above-described embodiment, two viewpoints exist: the reference viewpoint and the extended viewpoint. However, the number of viewpoints is not limited to two, and three or more viewpoints may exist. When there are three viewpoints, for example, viewpoint A may serve as the reference viewpoint both when viewpoint B is the extended viewpoint and when viewpoint C is the extended viewpoint, as shown in FIG. 8. Alternatively, as shown in FIG. 9, viewpoint A may serve as the reference viewpoint when viewpoint B is the extended viewpoint, and viewpoint B may serve as the reference viewpoint when viewpoint C is the extended viewpoint.
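The two three-viewpoint arrangements can be captured as a small dependency table, from which an order of processing follows: a viewpoint can only be coded after the viewpoint it refers to. The dictionary layout and the function name below are illustrative assumptions; the dependency relations themselves are those described for FIG. 8 and FIG. 9.

```python
# Each entry maps a viewpoint to the viewpoint it uses as its reference
# (None marks a viewpoint coded without inter-view reference).
FIG8_DEPENDENCIES = {"A": None, "B": "A", "C": "A"}   # B and C both refer to A
FIG9_DEPENDENCIES = {"A": None, "B": "A", "C": "B"}   # chain A -> B -> C


def coding_order(reference_of):
    """Return the viewpoints in an order where every reference comes first."""
    order, done = [], set()

    def visit(view, chain=()):
        if view in done:
            return
        if view in chain:
            raise ValueError("cyclic viewpoint dependency")
        base = reference_of[view]
        if base is not None:
            visit(base, chain + (view,))
        done.add(view)
        order.append(view)

    for view in reference_of:
        visit(view)
    return order


order_fig8 = coding_order(FIG8_DEPENDENCIES)
order_fig9 = coding_order(FIG9_DEPENDENCIES)
```

In the FIG. 8 arrangement B and C can be processed independently of each other once A is available, whereas the FIG. 9 chain forces C to wait for B.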

AA ... Moving image processing system
1 ... Moving image encoding device
10 ... First encoding unit
20 ... Second encoding unit
30 ... Inter-viewpoint processing unit
100 ... Moving image decoding device
110 ... First decoding unit
120 ... Second decoding unit
130 ... Inter-viewpoint processing unit

Claims

1. A moving image encoding device that generates encoded data by encoding moving images of a plurality of viewpoints with different perspectives, comprising:
single-viewpoint encoding means for encoding the moving image of each of the plurality of viewpoints, viewpoint by viewpoint;
inter-viewpoint processing means for, with one of the plurality of viewpoints set as a reference viewpoint and one of the plurality of viewpoints other than the reference viewpoint set as an extended viewpoint, geometrically transforming the local decoded image of the reference viewpoint, obtained when the moving image of the reference viewpoint is encoded by the single-viewpoint encoding means, according to the relationship between the reference viewpoint and the extended viewpoint, thereby generating a local decoded image after conversion of the reference viewpoint; and
reference image list adding means for making the local decoded image after conversion of the reference viewpoint, generated by the inter-viewpoint processing means, usable as a reference image when the moving image of the extended viewpoint is encoded by the single-viewpoint encoding means.

2. The moving picture coding apparatus according to claim 1, wherein the inter-viewpoint processing means transmits the parameters used in the geometric transformation to a moving picture decoding apparatus that decodes the coded data.

3. The moving image encoding device according to claim 1, wherein the inter-viewpoint processing means generates the local decoded image after conversion of the reference viewpoint by performing geometric transformation and illuminance compensation, according to the relationship between the reference viewpoint and the extended viewpoint, on the local decoded image of the reference viewpoint obtained when the moving image of the reference viewpoint is encoded by the single-viewpoint encoding means.

4. The inter-viewpoint processing means transmits a parameter used for geometric transformation and a parameter used for illuminance compensation to a moving picture decoding apparatus that decodes the encoded data. The moving image encoding apparatus described in 1.

5. The moving picture encoding apparatus according to claim 1, wherein the inter-viewpoint processing unit performs any one of projective transformation and affine transformation by triangular patch division as geometric transformation.

6. The moving image encoding device according to claim 1, wherein the reference image list adding means makes the local decoded image after conversion of the reference viewpoint, generated by the inter-viewpoint processing means, usable as a long-term reference frame when the moving image of the extended viewpoint is encoded by the single-viewpoint encoding means.

7. A moving image decoding device that decodes encoded data obtained by encoding moving images of a plurality of viewpoints with different perspectives, comprising:
single-viewpoint decoding means for decoding the encoded data to generate a decoded image for each of the plurality of viewpoints and to derive the parameters used in geometric transformation when the encoded data was generated;
inter-viewpoint processing means for, with the viewpoint that was the reference viewpoint when the encoded data was generated set as a decoding-side reference viewpoint and the viewpoint that was the extended viewpoint when the encoded data was generated set as a decoding-side extended viewpoint, geometrically transforming the decoded image of the decoding-side reference viewpoint generated by the single-viewpoint decoding means, using the parameters derived by the single-viewpoint decoding means, thereby generating a decoded image after conversion of the decoding-side reference viewpoint; and
reference image list adding means for making the decoded image after conversion of the decoding-side reference viewpoint, generated by the inter-viewpoint processing means, usable as a reference image when the encoded data of the decoding-side extended viewpoint is decoded by the single-viewpoint decoding means.

8. The moving image decoding device according to claim 7, wherein
the single-viewpoint decoding means decodes the encoded data to derive the parameters used in illuminance compensation when the encoded data was generated, and
the inter-viewpoint processing means generates the decoded image after conversion of the decoding-side reference viewpoint by performing geometric transformation and illuminance compensation, using the parameters derived by the single-viewpoint decoding means, on the decoded image of the decoding-side reference viewpoint obtained when the moving image of the decoding-side reference viewpoint is decoded by the single-viewpoint decoding means.

9. The moving image decoding device according to claim 7 or 8, wherein the inter-viewpoint processing means performs, as the geometric transformation, either projective transformation or affine transformation based on triangular patch division.

10. The moving image decoding device according to claim 7, wherein the reference image list adding means makes the decoded image of the decoding-side reference viewpoint generated by the inter-viewpoint processing means usable as a long-term reference frame when the moving image of the decoding-side extended viewpoint is decoded by the single-viewpoint decoding means.

11. A moving image processing system comprising a moving image encoding device that generates encoded data by encoding moving images of a plurality of viewpoints with different perspectives, and a moving image decoding device that decodes the encoded data generated by the moving image encoding device, wherein
the moving image encoding device comprises:
single-viewpoint encoding means for encoding the moving image of each of the plurality of viewpoints, viewpoint by viewpoint;
encoding-side inter-viewpoint processing means for, with one of the plurality of viewpoints set as a reference viewpoint and one of the plurality of viewpoints other than the reference viewpoint set as an extended viewpoint, geometrically transforming the local decoded image of the reference viewpoint, obtained when the moving image of the reference viewpoint is encoded by the single-viewpoint encoding means, according to the relationship between the reference viewpoint and the extended viewpoint, thereby generating a local decoded image after conversion of the reference viewpoint; and
encoding-side reference image list adding means for making the local decoded image after conversion of the reference viewpoint, generated by the encoding-side inter-viewpoint processing means, usable as a reference image when the moving image of the extended viewpoint is encoded by the single-viewpoint encoding means, and
the moving image decoding device comprises:
single-viewpoint decoding means for decoding the encoded data to generate a decoded image for each of the plurality of viewpoints and to derive the parameters used in geometric transformation in the moving image encoding device;
decoding-side inter-viewpoint processing means for geometrically transforming the decoded image of the reference viewpoint generated by the single-viewpoint decoding means, using the parameters derived by the single-viewpoint decoding means, thereby generating a decoded image after conversion of the reference viewpoint; and
decoding-side reference image list adding means for making the decoded image after conversion of the reference viewpoint, generated by the decoding-side inter-viewpoint processing means, usable as a reference image when the moving image of the extended viewpoint is decoded by the single-viewpoint decoding means.

12. A moving image encoding method in a moving image encoding device that comprises single-viewpoint encoding means, inter-viewpoint processing means, and reference image list adding means and that generates encoded data by encoding moving images of a plurality of viewpoints with different perspectives, the method comprising:
a first step in which the single-viewpoint encoding means, with one of the plurality of viewpoints set as a reference viewpoint and one of the plurality of viewpoints other than the reference viewpoint set as an extended viewpoint, encodes the moving image of the reference viewpoint;
a second step in which the inter-viewpoint processing means geometrically transforms the local decoded image of the reference viewpoint, obtained when the moving image of the reference viewpoint is encoded in the first step, according to the relationship between the reference viewpoint and the extended viewpoint, thereby generating a local decoded image after conversion of the reference viewpoint;
a third step in which the reference image list adding means makes the local decoded image after conversion of the reference viewpoint generated in the second step usable as a reference image; and
a fourth step in which the single-viewpoint encoding means encodes the moving image of the extended viewpoint using the reference image made usable in the third step.

A moving image decoding method in a moving image decoding apparatus that comprises single-viewpoint decoding means, inter-viewpoint processing means, and reference image list adding means, and that decodes encoded data obtained by encoding moving images of a plurality of viewpoints with different perspectives, the method being characterized by comprising:
A first step in which the single-viewpoint decoding means, taking as a decoding-side reference viewpoint the viewpoint among the plurality of viewpoints that was the reference viewpoint when the encoded data was generated, and as a decoding-side extended viewpoint the viewpoint that was the extended viewpoint when the encoded data was generated, decodes the encoded data to generate a decoded image of the decoding-side reference viewpoint and derives the parameters used for the geometric transformation when the encoded data was generated;
A second step in which the inter-viewpoint processing means geometrically transforms the decoded image of the decoding-side reference viewpoint generated in the first step, using the parameters derived in the first step, thereby generating a transformed decoded image of the decoding-side reference viewpoint;
A third step in which the reference image list adding means makes the transformed decoded image of the decoding-side reference viewpoint generated in the second step available as a reference image; and
A fourth step in which the single-viewpoint decoding means decodes the encoded data of the decoding-side extended viewpoint using the reference image made available in the third step.
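
For illustration only, the decoding-side steps above can be sketched as follows. This is a hedged sketch, not the claimed implementation: the encoded data is modelled as a plain dictionary whose keys (`ref_view`, `transform_params`, `ext_ref_index`, `ext_residual`) are hypothetical, the reference viewpoint is assumed to be stored already decoded, and the transformation parameters are assumed to be an inverse 3x3 projective matrix.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # grayscale picture as rows of pixel values


def warp_projective(src: Image, h_inv: List[List[float]], fill: int = 0) -> Image:
    """Inverse-map each output pixel through h_inv and sample src (nearest neighbour)."""
    height, width = len(src), len(src[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v, w = (row[0] * x + row[1] * y + row[2] for row in h_inv)
            if w == 0:
                continue  # point at infinity: keep the fill value
            sx, sy = round(u / w), round(v / w)
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = src[sy][sx]
    return out


def decode_two_views(
    encoded: Dict, ext_reference_list: Optional[List[Image]] = None
) -> Tuple[Image, Image]:
    # Step 1: decode the decoding-side reference viewpoint and derive the
    # transformation parameters (both read directly from the toy container).
    ref_decoded = encoded["ref_view"]
    h_inv = encoded["transform_params"]
    # Step 2: geometrically transform the decoded reference picture.
    transformed = warp_projective(ref_decoded, h_inv)
    # Step 3: make the transformed picture available as a reference image.
    reference_list = list(ext_reference_list or [])
    reference_list.append(transformed)
    # Step 4: reconstruct the extended viewpoint from the signalled reference
    # picture plus the residual.
    reference = reference_list[encoded["ext_ref_index"]]
    ext_decoded = [
        [r + d for r, d in zip(ref_row, res_row)]
        for ref_row, res_row in zip(reference, encoded["ext_residual"])
    ]
    return ref_decoded, ext_decoded
```

Because the decoder repeats the same transformation with the same parameters, its reference image list matches the encoder's, which is what lets the extended viewpoint be reconstructed from the residual alone.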

A program for causing a computer to execute a moving image encoding method in a moving image encoding apparatus that comprises single-viewpoint encoding means, inter-viewpoint processing means, and reference image list adding means, and that generates encoded data by encoding moving images of a plurality of viewpoints with different perspectives, the program causing the computer to execute:
A first step in which the single-viewpoint encoding means takes one of the plurality of viewpoints as a reference viewpoint and one of the remaining viewpoints as an extended viewpoint, and encodes the moving image of the reference viewpoint;
A second step in which the inter-viewpoint processing means applies, to the local decoded image of the reference viewpoint obtained when the moving image of the reference viewpoint is encoded in the first step, a geometric transformation according to the relationship between the reference viewpoint and the extended viewpoint, thereby generating a transformed local decoded image of the reference viewpoint;
A third step in which the reference image list adding means makes the transformed local decoded image of the reference viewpoint generated in the second step available as a reference image; and
A fourth step in which the single-viewpoint encoding means encodes the moving image of the extended viewpoint using the reference image made available in the third step.

A program for causing a computer to execute a moving image decoding method in a moving image decoding apparatus that comprises single-viewpoint decoding means, inter-viewpoint processing means, and reference image list adding means, and that decodes encoded data obtained by encoding moving images of a plurality of viewpoints with different perspectives, the program causing the computer to execute:
A first step in which the single-viewpoint decoding means, taking as a decoding-side reference viewpoint the viewpoint among the plurality of viewpoints that was the reference viewpoint when the encoded data was generated, and as a decoding-side extended viewpoint the viewpoint that was the extended viewpoint when the encoded data was generated, decodes the encoded data to generate a decoded image of the decoding-side reference viewpoint and derives the parameters used for the geometric transformation when the encoded data was generated;
A second step in which the inter-viewpoint processing means geometrically transforms the decoded image of the decoding-side reference viewpoint generated in the first step, using the parameters derived in the first step, thereby generating a transformed decoded image of the decoding-side reference viewpoint;
A third step in which the reference image list adding means makes the transformed decoded image of the decoding-side reference viewpoint generated in the second step available as a reference image; and
A fourth step in which the single-viewpoint decoding means decodes the encoded data of the decoding-side extended viewpoint using the reference image made available in the third step.