JP6139953B2

JP6139953B2 - Video encoding method, video decoding method, video encoding device, video decoding device, video encoding program, video decoding program, and recording medium

Info

Publication number: JP6139953B2
Application number: JP2013084198A
Authority: JP
Inventors: 志織杉本; 信哉志水; 英明木全; 明小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-04-12
Filing date: 2013-04-12
Publication date: 2017-05-31
Anticipated expiration: 2033-04-12
Also published as: JP2014207573A

Description

The present invention relates to a video encoding method, a video decoding method, a video encoding device, a video decoding device, a video encoding program, a video decoding program, and a recording medium, and in particular to a bidirectional predictive encoding method.

General video encoding exploits the spatial and temporal continuity of the subject: each frame of the video is divided into processing-unit blocks, the video signal of each block is predicted spatially or temporally, and the prediction information indicating the prediction method is encoded together with the prediction residual signal. This greatly improves encoding efficiency compared with encoding the video signal itself. General two-dimensional video encoding uses intra prediction, which predicts the encoding target signal with reference to already encoded blocks in the same frame, and inter-frame prediction, which predicts the encoding target signal based on motion compensation or the like with reference to other already encoded frames.

Multi-view video encoding is described here. Multi-view video encoding encodes a plurality of videos of the same scene, captured by a plurality of cameras, with high efficiency by exploiting the redundancy between the videos. Multi-view video encoding is described in detail in Non-Patent Document 1. In addition to the prediction methods used in general video encoding, multi-view video encoding uses methods such as inter-view prediction, which predicts the encoding target signal based on disparity compensation with reference to already encoded video of another viewpoint, and inter-view residual prediction, which predicts the encoding target signal by inter-frame prediction and then predicts its residual signal with reference to the residual signal produced when already encoded video of another viewpoint was encoded. In multi-view video encoding such as MVC, inter-view prediction is handled as inter prediction together with inter-frame prediction, and in B pictures it can also be used for bidirectional prediction, in which two or more predicted images are interpolated to form the predicted image.

Residual prediction is a method for reducing the code amount of the prediction residual. It exploits the fact that when two highly correlated images are each predictively encoded, their prediction residuals are also correlated with each other. Residual prediction is described in detail in Non-Patent Document 2. In the inter-view residual prediction used in multi-view video encoding, the prediction residual signal produced when encoding the region, in the video of a different viewpoint, that corresponds to the encoding target image is subtracted from the prediction residual signal of the encoding target. This reduces the energy of the residual signal and improves encoding efficiency. The correspondence between viewpoints is obtained, for example, as follows: when an already encoded neighboring block was encoded by disparity-compensated prediction, its disparity compensation vector is used to set the region of another viewpoint that corresponds to the encoding target block. When inter-frame prediction is used in a B picture, inter-view residual prediction is applied as a further processing step on the residual, separately from that prediction.

Non-Patent Document 1: M. Flierl and B. Girod, "Multiview video compression," Signal Processing Magazine, IEEE, November 2007, pp. 66-76, 2007.
Non-Patent Document 2: X. Wang and J. Ridge, "Improved video coding with residual prediction for extended spatial scalability," Communications, Control and Signal Processing, 2008. ISCCSP 2008. 3rd International Symposium on, March, pp. 1041-1046, 2008.

In multi-view video encoding, a picture for which both inter-frame prediction and inter-view prediction are available can use bidirectional prediction that combines inter-frame prediction and inter-view prediction.

However, motion-compensated prediction and disparity-compensated prediction have errors of different character, and depending on the properties of the sequence, the two predictions are less likely to cancel each other's errors than in bidirectional prediction composed only of inter-frame predictions. In motion-compensated prediction, such errors arise, for example, from deformation of the subject or from blur. In disparity-compensated prediction, they arise from differences in camera characteristics or from occlusion. In such cases the more accurate prediction method is selected disproportionately and bidirectional prediction is hardly used. As a result, in a type of B picture that allows forward prediction and inter-view prediction, for example, bidirectional prediction is structurally possible but in practice only unidirectional prediction is used, so a sufficient effect may not be obtained.

In addition, conventional inter-view residual prediction refers, for the original image to be encoded, to the corresponding region on the decoded image of another viewpoint that has already been encoded, and encodes what remains after subtracting that region's encoding-time prediction residual from the prediction residual of the encoding target region. This rests on the assumption that images that are highly correlated with each other also have highly correlated prediction residuals. In actual video encoding there are many cases in which this assumption does not hold: the two images may be predictively encoded by different methods, such as one by inter prediction and the other by intra prediction; the prediction directions may differ even when the prediction method is the same, such as one using forward prediction and the other backward prediction; the sizes of the prediction regions may differ; or one of the regions may fall on a boundary between prediction regions. Under such conditions the conventional inter-view residual prediction function does not work effectively.

The present invention has been made in view of such circumstances, and its object is to provide a video encoding method, a video decoding method, a video encoding device, a video decoding device, a video encoding program, a video decoding program, and a recording medium that can reduce the code amount required for encoding the prediction residual.

The present invention is a video encoding method performed by a video encoding device that uses bidirectional prediction, in which prediction is performed in both the temporal direction and the disparity direction, to predict the encoding target image as one prediction and to perform, as the other prediction, residual prediction that predicts the prediction residual, and that encodes the prediction residual of the residual prediction. The method comprises: a primary prediction step of predicting the encoding target image using an already decoded image in the temporal direction or the disparity direction as a reference picture and generating a primary predicted image; a primary prediction residual generation step of generating a primary prediction residual from the primary predicted image and the encoding target image; a residual prediction step of predicting the primary prediction residual using, as a reference picture, the encoding-time prediction residual of an already decoded image in a direction different from the direction referred to in the primary prediction step, and generating a predicted prediction residual; and a prediction residual generation step of generating the prediction residual from the primary prediction residual and the predicted prediction residual.
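As a minimal sketch of the two-stage residual generation described above, the following Python example computes the primary prediction residual and then the final prediction residual for one block. The function names and the flat-list block representation are assumptions made for illustration and do not appear in this specification; how the two predictions themselves are obtained is left outside the sketch.

```python
def subtract(a, b):
    """Element-wise difference a - b of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    return [x - y for x, y in zip(a, b)]


def encode_block_residual(target, primary_prediction, predicted_residual):
    """Return (primary_residual, final_residual) for one block.

    target             -- samples of the block being encoded
    primary_prediction -- prediction taken from a decoded picture in one
                          direction (temporal or disparity)
    predicted_residual -- prediction of the primary residual, taken from the
                          stored residual of a picture in the other direction
    """
    # Primary prediction residual: target minus primary predicted image.
    primary_residual = subtract(target, primary_prediction)
    # Final prediction residual: what remains after residual prediction.
    final_residual = subtract(primary_residual, predicted_residual)
    return primary_residual, final_residual


# Illustrative values only (not taken from the specification).
target = [100, 102, 98, 97]
primary_prediction = [96, 100, 99, 95]
predicted_residual = [3, 2, -1, 1]
primary, final = encode_block_residual(target, primary_prediction, predicted_residual)
print(primary)  # [4, 2, -1, 2]
print(final)    # [1, 0, 0, 1]
```

In this made-up example the final residual has smaller magnitudes than the primary residual, which is the situation the method aims for when the two residuals are well correlated.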

The present invention further includes a predicted image update step of updating a predicted image from the predicted prediction residual and the primary predicted image, and in the prediction residual generation step, the prediction residual is generated from the predicted image and the encoding target image.
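The relation between this updated-prediction form and the two-stage form can be sketched as follows. Assuming that no clipping or rounding is applied when the predicted image is updated (an assumption, not something stated in this specification), subtracting the updated predicted image from the target gives the same residual as subtracting the prediction of the primary residual from the primary prediction residual. All function names are hypothetical.

```python
def add(a, b):
    """Element-wise sum of two equally sized blocks."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Element-wise difference a - b of two equally sized blocks."""
    return [x - y for x, y in zip(a, b)]


def residual_two_stage(target, primary_prediction, predicted_residual):
    """(target - primary prediction) - prediction of the primary residual."""
    return subtract(subtract(target, primary_prediction), predicted_residual)


def residual_via_updated_prediction(target, primary_prediction, predicted_residual):
    """target - (primary prediction + prediction of the primary residual)."""
    updated_prediction = add(primary_prediction, predicted_residual)
    return subtract(target, updated_prediction)


# Illustrative values only.
target = [100, 102, 98, 97]
primary_prediction = [96, 100, 99, 95]
predicted_residual = [3, 2, -1, 1]
print(residual_two_stage(target, primary_prediction, predicted_residual))
print(residual_via_updated_prediction(target, primary_prediction, predicted_residual))
```

Both calls print the same list, since the two forms differ only in the order of the subtractions.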

The present invention further includes a residual prediction information generation step of generating residual prediction information, which is information that specifies the prediction reference destination in the residual prediction.

The present invention further includes a residual prediction information encoding step of encoding the residual prediction information.

The present invention further includes a reference picture list update step of including, in a reference picture list, the encoding-time prediction residual of an already decoded image, and the residual prediction information is a reference index that specifies a reference prediction residual picture in the reference picture list and a vector that specifies a region on that picture.

The present invention further includes a residual prediction determination step of executing residual prediction when the reference index in one of the predictions indicates the reference prediction residual picture.
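One way to picture a reference picture list that also holds residual pictures, together with the determination step, is the following sketch. The class names `RefEntry` and `PredictionInfo`, the `(view, frame)` identifiers, and the choice to append residual pictures after the decoded pictures are all assumptions for illustration; this specification does not fix a list layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefEntry:
    view: int          # viewpoint number of the stored picture
    frame: int         # frame number of the stored picture
    is_residual: bool  # True for a reference prediction residual picture


@dataclass(frozen=True)
class PredictionInfo:
    ref_idx: int   # reference index into the reference picture list
    vector: tuple  # (dx, dy) locating the referenced region


def build_reference_list(decoded_pictures, residual_pictures):
    """List decoded pictures first, then residual pictures (assumed order)."""
    entries = [RefEntry(v, f, False) for v, f in decoded_pictures]
    entries += [RefEntry(v, f, True) for v, f in residual_pictures]
    return entries


def uses_residual_prediction(ref_list, info):
    """Residual prediction runs when the index points at a residual picture."""
    return ref_list[info.ref_idx].is_residual


ref_list = build_reference_list(decoded_pictures=[(0, 3), (1, 4)],
                                residual_pictures=[(1, 4)])
print(uses_residual_prediction(ref_list, PredictionInfo(0, (2, 0))))   # False
print(uses_residual_prediction(ref_list, PredictionInfo(2, (-1, 0))))  # True
```

With this arrangement no separate flag is needed to signal residual prediction, because the reference index alone tells whether a residual picture is being referenced.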

In the present invention, the residual prediction information is an index that specifies the reference picture and a vector that specifies a region on that picture, and in the residual prediction step, the predicted prediction residual is generated with reference to the encoding-time prediction residual.

The present invention further includes a residual prediction information prediction step of predicting the residual prediction information and generating predicted residual prediction information, and in the residual prediction information encoding step, a residual prediction information difference, which is the difference between the residual prediction information and the predicted residual prediction information, is encoded.
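The differential encoding of residual prediction information can be sketched for the vector part as follows. This specification does not say how the residual prediction information is predicted; the component-wise median of neighboring vectors used here is an assumed predictor chosen only for illustration, and all function names are hypothetical.

```python
def median(values):
    """Middle element of the sorted values (odd-length input expected)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def predict_vector(neighbour_vectors):
    """Component-wise median of neighbouring vectors (assumed predictor)."""
    return (median([v[0] for v in neighbour_vectors]),
            median([v[1] for v in neighbour_vectors]))


def vector_difference(vector, predicted):
    """Difference that would be passed on for encoding instead of the vector."""
    return (vector[0] - predicted[0], vector[1] - predicted[1])


# Illustrative values only.
neighbours = [(4, 0), (6, -1), (5, 2)]
predicted = predict_vector(neighbours)
print(predicted)                             # (5, 0)
print(vector_difference((7, 1), predicted))  # (2, 1)
```

A decoder that forms the same prediction from the same neighbors can recover the vector by adding the decoded difference to it.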

The present invention is a video decoding method performed by a video decoding device that generates a decoded image from code data obtained by encoding the prediction residual of residual prediction, where bidirectional prediction, in which prediction is performed in both the temporal direction and the disparity direction, is used to predict the encoding target image as one prediction and to perform, as the other prediction, the residual prediction that predicts the prediction residual. The method comprises: a primary prediction step of predicting the decoding target image using an already decoded image in the temporal direction or the disparity direction as a reference picture and generating a primary predicted image; a residual prediction step of predicting the primary prediction residual, which is the difference between the primary predicted image and the decoded image, using as a reference picture the encoding-time prediction residual of an already decoded image in a direction different from the direction referred to in the primary prediction step, and generating a predicted prediction residual; a primary prediction residual generation step of generating the primary prediction residual from the prediction residual obtained by decoding the code data and the predicted prediction residual; and a decoded image generation step of generating the decoded image from the primary prediction residual and the primary predicted image.
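The decoder-side reconstruction described above can be sketched for one block as follows. The function name, the flat-list block representation, and the final clipping to the sample range (controlled by a hypothetical `bit_depth` parameter) are assumptions for illustration; this specification does not mention clipping.

```python
def reconstruct_block(decoded_residual, predicted_residual,
                      primary_prediction, bit_depth=8):
    """Rebuild one block from its decoded residual and the two predictions.

    decoded_residual   -- prediction residual obtained by decoding the code data
    predicted_residual -- prediction of the primary residual (from a stored
                          residual picture in the other direction)
    primary_prediction -- primary predicted image for the block
    bit_depth          -- sample bit depth used for clipping (assumption)
    """
    max_value = (1 << bit_depth) - 1
    # Primary prediction residual = decoded residual + its prediction.
    primary_residual = [d + p for d, p in zip(decoded_residual, predicted_residual)]
    # Decoded image = primary residual + primary predicted image, clipped.
    return [min(max(r + q, 0), max_value)
            for r, q in zip(primary_residual, primary_prediction)]


# Illustrative values only.
decoded = reconstruct_block(decoded_residual=[1, 0, 0, 1],
                            predicted_residual=[3, 2, -1, 1],
                            primary_prediction=[96, 100, 99, 95])
print(decoded)  # [100, 102, 98, 97]
```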

The present invention includes, in place of the primary prediction residual generation step, a predicted image generation step of generating a predicted image from the primary predicted image and the predicted prediction residual, and the decoded image generation step generates the decoded image from the prediction residual and the predicted image.

The present invention further includes a residual prediction information generation step of generating residual prediction information, which is information that designates the reference destination in the residual prediction, and in the residual prediction step, the predicted prediction residual is generated based on the residual prediction information.

The present invention further includes a residual prediction information decoding step of decoding residual prediction information, which is information that designates the reference destination in the residual prediction, and in the residual prediction step, the predicted prediction residual is generated based on the residual prediction information.

The present invention further includes a reference picture list update step of including the prediction residual of an already decoded image in a reference picture list, and the residual prediction information is a reference index that specifies a reference prediction residual picture in the reference picture list and a vector that specifies a region on that picture.

The present invention further includes a residual prediction determination step of executing residual prediction when the reference index in one of the predictions indicates a reference prediction residual picture.

In the present invention, the residual prediction information is an index that specifies a reference picture and a vector that specifies a region on that picture, and in the residual prediction step, the predicted prediction residual is generated with reference to the encoding-time prediction residual of that region.

The present invention further includes a residual prediction information prediction step of predicting the residual prediction information and generating predicted residual prediction information, and in the residual prediction information decoding step, a residual prediction information difference, which is the difference between the residual prediction information and the predicted residual prediction information, is decoded and combined with the predicted residual prediction information to generate the residual prediction information.

The present invention is a video encoding device that uses bidirectional prediction, in which prediction is performed in both the temporal direction and the disparity direction, to predict the encoding target image as one prediction and to perform, as the other prediction, residual prediction that predicts the prediction residual, and that encodes the prediction residual of the residual prediction. The device comprises: primary prediction means for predicting the encoding target image using an already decoded image in the temporal direction or the disparity direction as a reference picture and generating a primary predicted image; primary prediction residual generation means for generating a primary prediction residual from the primary predicted image and the encoding target image; residual prediction means for predicting the primary prediction residual using, as a reference picture, the encoding-time prediction residual of an already decoded image in a direction different from the direction referred to by the primary prediction means, and generating a predicted prediction residual; and prediction residual generation means for generating the prediction residual from the primary prediction residual and the predicted prediction residual.

The present invention is a video decoding device that generates a decoded image from code data obtained by encoding the prediction residual of residual prediction, where bidirectional prediction, in which prediction is performed in both the temporal direction and the disparity direction, is used to predict the encoding target image as one prediction and to perform, as the other prediction, the residual prediction that predicts the prediction residual. The device comprises: primary prediction means for predicting the decoding target image using an already decoded image in the temporal direction or the disparity direction as a reference picture and generating a primary predicted image; residual prediction means for predicting the primary prediction residual, which is the difference between the primary predicted image and the decoded image, using as a reference picture the encoding-time prediction residual of an already decoded image in a direction different from the direction referred to by the primary prediction means, and generating a predicted prediction residual; primary prediction residual generation means for generating the primary prediction residual from the prediction residual obtained by decoding the code data and the predicted prediction residual; and decoded image generation means for generating the decoded image from the primary prediction residual and the primary predicted image.

The present invention is a video encoding program for causing a computer to execute the video encoding method.

The present invention is a video decoding program for causing a computer to execute the video decoding method.

The present invention is a computer-readable recording medium on which the video encoding program is recorded.

The present invention is a computer-readable recording medium on which the video decoding program is recorded.

According to the present invention, the code amount required for encoding the prediction residual can be reduced, so encoding efficiency can be improved.

A block diagram showing the configuration of a video encoding device according to an embodiment of the present invention.
A flowchart showing the processing operation of the video encoding device 100 shown in FIG. 1.
A block diagram showing the configuration of a video decoding device according to an embodiment of the present invention.
A flowchart showing the processing operation of the video decoding device 200 shown in FIG. 3.
An explanatory diagram showing the operation of obtaining the prediction residual.
A hardware diagram for the case where the video encoding device 100 is configured by a computer and a software program.
A hardware diagram for the case where the video decoding device 200 is configured by a computer and a software program.

Hereinafter, a video encoding device and a video decoding device according to an embodiment of the present invention will be described with reference to the drawings. First, the video encoding device will be described. FIG. 1 is a block diagram showing the configuration of the video encoding device according to the embodiment. As shown in FIG. 1, the video encoding device 100 includes an encoding target video input unit 101, an input image memory 102, a reference picture memory 103, a reference prediction residual picture memory 104, a prediction unit 105, a predicted image generation unit 106, a subtraction unit 107, a prediction residual prediction unit 108, a predicted prediction residual generation unit 109, a subtraction unit 110, a transform/quantization unit 111, an inverse quantization/inverse transform unit 112, an addition unit 113, an addition unit 114, and an entropy encoding unit 115. In this specification, an image means a still image or the image of one frame constituting a moving image. A video has the same meaning as a moving image and is a set of a series of images.

The encoding target video input unit 101 inputs the video to be encoded. Hereinafter, the video to be encoded is referred to as the encoding target video, and the frame being processed in particular is referred to as the encoding target frame or the encoding target image. The input image memory 102 stores the input encoding target video. The reference picture memory 103 stores images that have been encoded and decoded so far. Hereinafter, such a stored frame is referred to as a reference frame or a reference picture. The reference prediction residual picture memory 104 stores the prediction residuals of images that have been encoded and decoded so far. Hereinafter, such a stored frame is referred to as a reference prediction residual frame or a reference prediction residual picture.

The prediction unit 105 performs prediction for the encoding target image on the stored reference pictures and generates prediction information. The predicted image generation unit 106 generates a primary predicted image based on the prediction information. The subtraction unit 107 takes the difference between the encoding target image and the primary predicted image and generates a primary prediction residual. The prediction residual prediction unit 108 performs prediction for the primary prediction residual on the stored reference prediction residual pictures and generates residual prediction information. The predicted prediction residual generation unit 109 generates a predicted prediction residual based on the residual prediction information. The subtraction unit 110 takes the difference between the primary prediction residual and the predicted prediction residual and generates a prediction residual.

The transform/quantization unit 111 transforms and quantizes the generated prediction residual and generates quantized data. The inverse quantization/inverse transform unit 112 inversely quantizes and inversely transforms the generated quantized data and generates a decoded prediction residual. The addition unit 113 generates a decoded primary prediction residual from the decoded prediction residual and the predicted prediction residual, and the addition unit 114 adds the decoded primary prediction residual and the primary predicted image to generate a decoded image. The entropy encoding unit 115 entropy-encodes the quantized data, generates code data, and outputs it.
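The local decoding path through units 111 to 114 can be sketched as follows. To keep the example short, the transform is omitted and a uniform scalar quantizer with a hypothetical `step` parameter stands in for the transform/quantization unit; both simplifications are assumptions for illustration, as are the function names.

```python
def quantize(residual, step):
    """Uniform scalar quantizer, rounding half away from zero (assumption)."""
    return [(abs(r) + step // 2) // step * (1 if r >= 0 else -1)
            for r in residual]


def dequantize(levels, step):
    """Inverse of quantize up to the quantization error."""
    return [level * step for level in levels]


def local_decode(levels, step, predicted_residual, primary_prediction):
    """Mirror of units 112 to 114: rebuild the residuals and the decoded image."""
    decoded_residual = dequantize(levels, step)                  # unit 112
    decoded_primary_residual = [d + p for d, p in
                                zip(decoded_residual, predicted_residual)]  # unit 113
    decoded_image = [r + q for r, q in
                     zip(decoded_primary_residual, primary_prediction)]     # unit 114
    return decoded_primary_residual, decoded_image


# Illustrative values only.
levels = quantize([5, -3, 0, 7], step=4)
print(levels)  # [1, -1, 0, 2]
primary_residual, image = local_decode(levels, 4,
                                       predicted_residual=[3, 2, -1, 1],
                                       primary_prediction=[96, 100, 99, 95])
print(primary_residual)  # [7, -2, -1, 9]
print(image)             # [103, 98, 98, 104]
```

Because the encoder rebuilds the image from the quantized data rather than from the original residual, it holds the same reference data that a decoder would reconstruct.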

Next, the processing operation of the video encoding device 100 shown in FIG. 1 will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the processing operation of the video encoding device 100 shown in FIG. 1. The encoding target video is one video of a multi-view video, and the multi-view video is structured so that, for each frame, the videos of all viewpoints are encoded and decoded one viewpoint at a time. The process of encoding one frame of the encoding target video is described here. Repeating this process for each frame realizes encoding of the video.

First, the encoding target video input unit 101 inputs the encoding target frame and stores it in the input image memory 102 (step S101). It is assumed that some frames of the encoding target video have already been encoded and that their decoded frames are stored in the reference picture memory 103. It is also assumed that the referable videos of other viewpoints, up to the same frame as the encoding target frame, have already been encoded and decoded and are stored in the input image memory 102, and that their prediction residuals at the time of decoding are stored in the reference prediction residual picture memory 104. The stored prediction residual may be the decoded prediction residual value before residual prediction is applied, or, when the decoded frame was encoded and decoded using residual prediction, the reconstructed prediction residual after residual prediction is applied.

Next, after the video is input, the encoding target frame is divided into encoding target blocks, and the video signal of the encoding target frame is encoded block by block. The processing of steps S103 to S111 is repeated for all blocks of the frame (steps S102 to S112).
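The block-by-block loop can be sketched as a generator over block positions. The raster scan order, the fixed square block size, and the handling of partial blocks at the right and bottom edges are assumptions for illustration; this specification does not prescribe them, and `iter_blocks` is a hypothetical name.

```python
def iter_blocks(width, height, block_size):
    """Yield (x, y, block_width, block_height) in raster order.

    Blocks at the right and bottom edges are shortened so that they stay
    inside the frame (an assumption about edge handling).
    """
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            yield (x, y,
                   min(block_size, width - x),
                   min(block_size, height - y))


# The per-block processing (steps S103 to S111) would run once per position.
for block in iter_blocks(width=20, height=16, block_size=16):
    print(block)
# (0, 0, 16, 16)
# (16, 0, 4, 16)
```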

In the processing repeated for each encoding target block, the prediction unit 105 first performs inter prediction for the encoding target block and generates prediction information. The predicted image generation unit 106 then generates a primary predicted image based on the prediction information (step S103). Any method may be used for the prediction and for generating the prediction information, and any information may be set as the prediction information. A common approach is to perform a search such as block matching for the encoding target block on all referable reference pictures and to use, as the prediction information, the index that specifies the reference picture giving the highest prediction accuracy and the motion vector or disparity vector that indicates the reference destination on that reference picture. Alternatively, the prediction information may be determined from the prediction information of neighboring blocks that have already been encoded and decoded.
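The block matching search mentioned above can be sketched as an exhaustive search that returns a reference index and a vector. The sum of absolute differences (SAD) as the accuracy measure, the full search within a hypothetical `search_range`, and the representation of pictures as lists of rows are assumptions for illustration; the text above leaves the search method open.

```python
def sad(target, reference, x, y):
    """Sum of absolute differences between target and the block at (x, y)."""
    return sum(abs(t - reference[y + j][x + i])
               for j, row in enumerate(target)
               for i, t in enumerate(row))


def block_matching(target, block_x, block_y, reference_pictures, search_range):
    """Return (cost, ref_idx, (dx, dy)) of the best match, or None.

    Every reference picture is searched within +/- search_range samples of
    the block position; candidates leaving the picture are skipped.
    """
    h, w = len(target), len(target[0])
    best = None
    for ref_idx, ref in enumerate(reference_pictures):
        ref_h, ref_w = len(ref), len(ref[0])
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                x, y = block_x + dx, block_y + dy
                if x < 0 or y < 0 or x + w > ref_w or y + h > ref_h:
                    continue
                cost = sad(target, ref, x, y)
                if best is None or cost < best[0]:
                    best = (cost, ref_idx, (dx, dy))
    return best


# Illustrative data: picture 1 contains the target block shifted right by one.
flat = [[0] * 4 for _ in range(4)]
ramp = [[r * 4 + c for c in range(4)] for r in range(4)]
target = [[6, 7], [10, 11]]
print(block_matching(target, 1, 1, [flat, ramp], search_range=1))
# (0, 1, (1, 0))
```

The same search applies whether a reference picture lies in the temporal or the disparity direction; the returned vector is then interpreted as a motion vector or a disparity vector accordingly.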

As yet another method, consider, for example, the combination in which prediction of the encoding target block refers to a reference picture in the temporal direction and residual prediction of the primary prediction residual refers to a reference prediction residual picture in the disparity direction. In that case, a region corresponding to the encoding target block may be determined on already decoded video of another viewpoint, and the frame number of the temporal reference picture may be set equal to the reference frame number used when that corresponding region was encoded/decoded. This raises the correlation between the primary prediction residual and the prediction residual from encoding/decoding the other viewpoint's video, and thereby improves the performance of inter-view residual prediction.

The corresponding region on the video of the other viewpoint may be the same as the region referred to in the residual prediction described later, or it may be a different region. If it is a different region, it may be determined by a disparity compensation search, by using the disparity vector used when an already decoded neighboring block was encoded, or by any other method.

The combination of prediction method and residual prediction method may also be reversed, or any other combination may be used. Alternatively, prediction information may be generated in the same way for every combination of prediction method and residual prediction method, and the best one selected. The prediction information may be encoded and multiplexed with the video code data, or it need not be encoded when it can be derived, as described above, from neighboring prediction information, the block's own residual prediction information, or the like. The prediction information may also itself be predicted and its residual encoded.

Next, the subtraction unit 107 obtains the difference between the encoding target block and the primary predicted image and generates the primary prediction residual (step S104). The prediction residual prediction unit 108 then performs inter prediction on the primary prediction residual, using a stored reference prediction residual picture as the reference picture, and generates residual prediction information. The predicted prediction residual generation unit 109 generates the predicted prediction residual based on the residual prediction information (step S105).

Any method may be used for the residual prediction and for generating the residual prediction information, and any information may be set as the residual prediction information. For example, a search such as block matching for the primary prediction residual may be run over all referable reference prediction residual pictures, and the index identifying the reference prediction residual picture that gives the highest prediction accuracy, together with the motion vector indicating the reference position on that picture, may be used as the residual prediction information. Alternatively, the residual prediction information may be determined from the residual prediction information of neighboring blocks that have already been encoded and decoded, from their prediction information instead, or from both.

The methods given above as examples for the prediction unit 105 may also be used. In that case, the reference residual index and the motion vector may be determined from the inter-view or inter-frame correspondence assumed in the prediction unit 105, or by a separate search or the like. The residual prediction information may be encoded and multiplexed with the video code data, or it need not be encoded when it can be derived, as described above, from neighboring residual prediction information, the block's own prediction information, or the like. The residual prediction information may also itself be predicted and its residual encoded. In that case, the prediction information and residual prediction information used when already decoded neighboring blocks were encoded may be used as described above, and additional information such as a depth map may be used if it exists.

As described above, the prediction in the prediction unit 105 and the residual prediction in the prediction residual prediction unit 108 may be performed independently of each other, one may be performed first and then fixed, or the two may be repeated alternately for optimization. Alternatively, only the combination of prediction directions may be fixed in advance, and the two predictions then performed either independently or in sequence on that basis. Information specifying the combination may also be encoded and multiplexed with the video code data.

Both the prediction information and the residual prediction information may be encoded if necessary, but they need not be encoded if they can be determined by a predetermined rule. For example, the residual prediction information in the prediction residual prediction unit 108 may be determined from the residual prediction information used when the reference destination of the prediction unit 105 was encoded, so that only the reference picture index and motion vector of the prediction unit 105 are encoded and the residual prediction information is not. Conversely, only the residual prediction information may be encoded, and any other rule may be defined. For example, when prediction is performed in the temporal direction, it may be fixed in advance that the reference prediction residual picture referred to in residual prediction is the prediction residual from the encoding of an image of a different viewpoint in the same frame as the encoding target frame, so that only the vector is encoded as the residual prediction information; the reverse arrangement is also possible.

When the prediction information and the residual prediction information are encoded as additional information for the video, they may be encoded and attached in any manner. For example, reference picture lists may be used as in ordinary bidirectional prediction. Ordinary prediction may be predefined as L0 prediction and residual prediction as L1 prediction, with a reference picture index, motion vector, and so on assigned to each, or the roles may be reversed. As another method, prediction and residual prediction may be assigned to the reference picture lists without distinction, and whether an entry performs ordinary prediction or residual prediction may be determined from information such as the reference picture index and the motion vector. For example, it may be fixed in advance that ordinary prediction is performed in the temporal direction and residual prediction in the disparity direction, so that the entry carrying information that indicates the disparity direction is the residual prediction.
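The last rule above (ordinary prediction in the temporal direction, residual prediction in the disparity direction) can be illustrated with the following sketch. The `RefEntry` structure, its `view` and `time` fields, and the function names are hypothetical and are not taken from the specification; the sketch assumes each reference entry can be labelled with a viewpoint and a time instant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefEntry:
    """Hypothetical reference-list entry: referenced view, time, and vector."""
    view: int
    time: int
    vector: tuple


def is_disparity_direction(entry, cur_view):
    """An entry that points at another viewpoint is a disparity-direction reference."""
    return entry.view != cur_view


def split_predictions(entries, cur_view):
    """Classify entries under the assumed rule: temporal references perform
    ordinary prediction, disparity references perform residual prediction."""
    normal = [e for e in entries if not is_disparity_direction(e, cur_view)]
    residual = [e for e in entries if is_disparity_direction(e, cur_view)]
    return normal, residual
```

With this rule no explicit flag is needed: the decoder can apply the same classification to the decoded reference information.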

The index indicating the reference prediction residual picture in residual prediction may be one that indicates the reference picture corresponding to that reference prediction residual picture, or it may be an index distinguishable from those of ordinary reference pictures. For example, the reference picture list may hold indices of ordinary reference pictures for both ordinary prediction and residual prediction; the two are then distinguished by a determination method such as the one described above, and the residual prediction refers to the reference prediction residual picture, that is, the prediction residual from the encoding of the reference picture indicated by the index.

Alternatively, a reference prediction residual picture may be treated as a kind of reference picture on a par with ordinary reference pictures and distinguishable by its index, so that the entry whose index indicates a reference prediction residual picture is the residual prediction. Any other method may also be used.

Next, once the predicted prediction residual has been generated, the subtraction unit 110 obtains the difference between the primary prediction residual and the predicted prediction residual and generates the prediction residual (step S106). Here the prediction residual is generated by updating the primary prediction residual; alternatively, a predicted image may be generated by updating the primary predicted image based on the predicted prediction residual, and the prediction residual determined by taking the difference from the primary prediction residual.
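Steps S104 and S106 amount to two successive subtractions, which may be sketched as below. The function names are illustrative only. The second function reflects one reading of the alternative form, assumed here to mean that the updated predicted image is subtracted from the encoding target block; under that assumption both forms give the same residual.

```python
def subtract(a, b):
    """Element-wise difference of two equally sized 2-D blocks."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def add(a, b):
    """Element-wise sum of two equally sized 2-D blocks."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def residual_by_updating_residual(target, primary_prediction, predicted_residual):
    """Step S104 then step S106: (target - prediction) - predicted residual."""
    primary_residual = subtract(target, primary_prediction)   # step S104
    return subtract(primary_residual, predicted_residual)     # step S106


def residual_by_updating_prediction(target, primary_prediction, predicted_residual):
    """Assumed alternative: target - (prediction + predicted residual)."""
    updated_prediction = add(primary_prediction, predicted_residual)
    return subtract(target, updated_prediction)
```

For integer samples the two functions return identical results, since a - b - c equals a - (b + c).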

Next, once the prediction residual has been generated, the transform/quantization unit 111 transforms and quantizes the prediction residual to generate quantized data (step S107). Any transform and quantization method may be used as long as the decoding side can correctly perform the inverse quantization and inverse transform. When transform and quantization are complete, the inverse quantization/inverse transform unit 112 inversely quantizes and inversely transforms the quantized data to generate the decoded prediction residual (step S108).

Next, once the decoded prediction residual has been generated, the addition unit 113 adds the decoded prediction residual and the predicted prediction residual to generate the decoded primary prediction residual, which is stored in the reference prediction residual picture memory 104 as a reference prediction residual picture (step S109). Once the decoded primary prediction residual has been generated, the addition unit 114 adds the decoded primary prediction residual and the primary predicted image to generate the decoded image, which is stored in the reference picture memory 103 (step S110). Here the prediction residual for the predicted image is generated by updating the decoded prediction residual based on the predicted prediction residual; alternatively, a predicted image may be generated by updating the primary predicted image based on the predicted prediction residual, and the decoded prediction residual added to it as its prediction residual. A loop filter may be applied to the decoded image if necessary. In ordinary video coding, a deblocking filter or other filters are used to remove coding noise.
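The local decoding loop of steps S107 to S110 can be sketched as follows. The specification leaves the transform and quantization method open, so this sketch substitutes a plain uniform scalar quantizer with no transform; that substitution, the `step` parameter, and all function names are assumptions made for illustration.

```python
def add(a, b):
    """Element-wise sum of two equally sized 2-D blocks."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def quantize(block, step):
    """Stand-in for transform/quantization: uniform scalar quantization only."""
    return [[round(v / step) for v in row] for row in block]


def dequantize(levels, step):
    """Stand-in for inverse quantization/inverse transform."""
    return [[level * step for level in row] for row in levels]


def encode_and_reconstruct(residual, predicted_residual, primary_prediction, step):
    levels = quantize(residual, step)                                      # step S107
    decoded_residual = dequantize(levels, step)                            # step S108
    decoded_primary_residual = add(decoded_residual, predicted_residual)   # step S109
    decoded_image = add(decoded_primary_residual, primary_prediction)      # step S110
    return levels, decoded_primary_residual, decoded_image
```

In this sketch `decoded_primary_residual` is what would be kept as the reference prediction residual picture and `decoded_image` is what would be kept as the reference picture, so that later blocks are predicted from the same data the decoder will have.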

Next, the entropy encoding unit 115 entropy-encodes the quantized data to generate code data and, if necessary, also encodes the prediction information, the residual prediction information, and other additional information and multiplexes them with the code data. When all blocks have been processed, the code data is output (step S112).

Next, the video decoding device will be described. FIG. 3 is a block diagram showing the configuration of a video decoding device according to an embodiment of the present invention. As shown in FIG. 3, the video decoding device 200 includes a code data input unit 201, a code data memory 202, a reference picture memory 203, a reference prediction residual picture memory 204, an entropy decoding unit 205, an inverse quantization/inverse transform unit 206, a predicted prediction residual generation unit 207, an addition unit 208, a predicted image generation unit 209, and an addition unit 210.

The code data input unit 201 inputs the video code data to be decoded. The video code data to be decoded is called the decoding target video code data, and the frame being processed is called the decoding target frame or decoding target image. The code data memory 202 stores the input decoding target video. The reference picture memory 203 stores images that have already been decoded. The reference prediction residual picture memory 204 stores the prediction residuals obtained when the already decoded images were decoded.

The entropy decoding unit 205 entropy-decodes the code data of the decoding target frame to generate quantized data, and the inverse quantization/inverse transform unit 206 applies inverse quantization and inverse transform to the quantized data to generate the decoded prediction residual. The predicted prediction residual generation unit 207 generates the predicted prediction residual. The addition unit 208 adds the predicted prediction residual and the decoded prediction residual to generate the decoded primary prediction residual. The predicted image generation unit 209 generates the primary predicted image, and the addition unit 210 adds the decoded primary prediction residual and the primary predicted image to generate the decoded image.

Next, the processing operation of the video decoding device shown in FIG. 3 will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the processing operation of the video decoding device 200 shown in FIG. 3. The decoding target video is assumed to be one video of a multi-view video, and the multi-view video is assumed to be structured so that the video of all viewpoints is decoded one viewpoint at a time for each frame. The processing for decoding one frame of the code data is described here. Repeating this processing for every frame decodes the video.

First, the code data input unit 201 inputs the code data and stores it in the code data memory 202 (step S201). It is assumed that some frames of the decoding target video have already been decoded and that their decoded frames are stored in the reference picture memory 203. It is also assumed that referable video of other viewpoints, up to the same frame as the decoding target frame, has already been decoded and is stored in the reference picture memory 203. It is further assumed that the prediction residuals obtained when those frames were decoded are stored in the reference prediction residual picture memory 204. The stored prediction residual may be the decoded prediction residual value before residual prediction is applied or, when the decoded frame was decoded using residual prediction, the prediction residual reconstructed after residual prediction.

Next, after the code data is input, the decoding target frame is divided into decoding target blocks, and the video signal of the decoding target frame is decoded block by block. The processing of steps S203 to S208 is repeated for every block in the frame (steps S202 to S209).

In the processing repeated for each decoding target block, the entropy decoding unit 205 first entropy-decodes the code data (step S203). The inverse quantization/inverse transform unit 206 then performs inverse quantization and inverse transform to generate the decoded prediction residual (step S204). If the code data contains prediction information or other additional information, these may also be decoded to generate the necessary information as appropriate.

Next, the predicted prediction residual generation unit 207 generates the predicted prediction residual by inter prediction of the prediction residual, using a stored reference prediction residual picture as the reference picture. If the residual prediction information has been encoded and multiplexed with the video code data, that information may be used to generate the predicted prediction residual; it need not be present when it can be derived, as described above, from neighboring residual prediction information, the block's own prediction information, or the like. If a prediction residual of the residual prediction information has been encoded, the residual prediction information may be predicted. The details of the residual prediction are the same as in the encoding device.

Next, once the predicted prediction residual has been generated, the addition unit 208 adds the predicted prediction residual and the decoded prediction residual to generate the decoded primary prediction residual, which is stored in the reference prediction residual picture memory (step S206). The predicted image generation unit 209 then generates the primary predicted image by inter prediction (step S207). If the prediction information has been encoded and multiplexed with the video code data, that information may be used to generate the predicted image; it need not be present when it can be derived, as described above, from neighboring prediction information, the block's own residual prediction information, or the like. If a prediction residual of the prediction information has been encoded, the prediction information may be predicted. The details of the prediction are the same as in the encoding device.

Next, once the primary predicted image has been generated, the addition unit 210 adds the decoded primary prediction residual and the primary predicted image to generate the decoded image, which is stored in the reference picture memory (step S208). A loop filter may be applied to the decoded image if necessary. In ordinary video decoding, a deblocking filter or other filters are used to remove coding noise. When all blocks have been processed, the result is output as the decoded frame (step S209).
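The per-block reconstruction on the decoding side (steps S204, S206, and S208) can be sketched as below. As an assumption made purely for illustration, inverse quantization/inverse transform is replaced by multiplication with a uniform quantization step; entropy decoding and the derivation of the two predictions are taken as already done, and the function names are hypothetical.

```python
def add(a, b):
    """Element-wise sum of two equally sized 2-D blocks."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def dequantize(levels, step):
    """Stand-in for inverse quantization/inverse transform (assumed uniform step)."""
    return [[level * step for level in row] for row in levels]


def decode_block(levels, predicted_residual, primary_prediction, step):
    decoded_residual = dequantize(levels, step)                            # step S204
    decoded_primary_residual = add(decoded_residual, predicted_residual)   # step S206
    decoded_image = add(decoded_primary_residual, primary_prediction)      # step S208
    return decoded_primary_residual, decoded_image
```

The first return value corresponds to the block stored in the reference prediction residual picture memory and the second to the block stored in the reference picture memory.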

The order of some of the processing operations described above may be changed. The description above covers a video coding method in which one of the two predictions in temporal/disparity bidirectional prediction is a residual prediction, but the method may be used alongside ordinary bidirectional prediction, in which both predictions are ordinary predictions. The two methods may be provided as separate prediction modes, selected by additional information, or distinguished from the prediction information. For example, information indicating whether residual prediction is performed may be encoded as additional information and multiplexed with the video code data. Alternatively, when the reference prediction residual picture itself is included in the reference picture list and has an index distinguishable from those of ordinary reference pictures, whether residual prediction is performed may be determined by whether one of the two reference pictures of the bidirectional prediction is a reference prediction residual picture.

Next, the operation for obtaining the prediction residual will be described with reference to FIG. 5. FIG. 5 is an explanatory diagram showing the operation for obtaining the prediction residual. First, prediction is performed on the encoding target block a in the encoding target image A to obtain the primary predicted image b. A plurality of primary predicted images b constitute the reference picture B. The primary prediction residual d is then generated from the difference between the encoding target block a and the primary predicted image b. Next, residual prediction is performed on the primary prediction residual d to generate the predicted prediction residual c. A plurality of predicted prediction residuals c constitute the reference prediction residual picture C. Finally, the prediction residual e is generated from the difference between the primary prediction residual d and the predicted prediction residual c.
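Using the symbols of FIG. 5, the relations between the signals can be written compactly as follows. The hatted symbols are notation introduced here, not in the specification; they denote the values reconstructed after quantization and inverse quantization, and the second pair of equations restates the additions performed in steps S109 and S110 (and S206 and S208 on the decoding side).

```latex
\begin{aligned}
d &= a - b, \\
e &= d - c = a - b - c, \\
\hat{d} &= \hat{e} + c, \\
\hat{a} &= \hat{d} + b = \hat{e} + c + b .
\end{aligned}
```

Only e (after transform and quantization) and the information needed to locate b and c are carried in the code data; a is recovered as the sum of the three terms.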

The processing of the video encoding device and the video decoding device described above can also be implemented by a computer and a software program. The program can be provided recorded on a computer-readable recording medium or provided over a network.

FIG. 6 shows the hardware configuration when the video encoding device 100 described above is implemented by a computer and a software program. The system comprises the following, connected by a bus: a CPU 30 that executes the program; a memory 31 such as a RAM that stores the program and data accessed by the CPU 30; an encoding target video input unit 32 that inputs the video signal to be encoded from a camera or the like (this may be a storage unit, such as a disk device, that stores the video signal); a program storage device 33 storing a video encoding program 331, a software program that causes the CPU 30 to execute the processing operation shown in FIG. 2; and a code data output unit 34 that outputs, for example over a network, the code data generated when the CPU 30 executes the video encoding program loaded into the memory 31 (this may be a storage unit, such as a disk device, that stores the code data). Although not illustrated, other hardware such as a code data storage unit and a reference frame storage unit is also provided and used in carrying out the method. A video signal code data storage unit, a prediction information code data storage unit, and the like may also be used.

FIG. 7 shows the hardware configuration when the video decoding device 200 described above is implemented by a computer and a software program. The system comprises the following, connected by a bus: a CPU 40 that executes the program; a memory 41 such as a RAM that stores the program and data accessed by the CPU 40; a code data input unit 42 that inputs the code data encoded by the video encoding device using the present method (this may be a storage unit, such as a disk device, that stores the code data); a program storage device 43 storing a video decoding program 431, a software program that causes the CPU 40 to execute the processing operation shown in FIG. 4; and a decoded video output unit 44 that outputs, to a playback device or the like, the decoded video generated when the CPU 40 executes the video decoding program loaded into the memory 41. Although not illustrated, other hardware such as a reference frame storage unit is also provided and used in carrying out the method. A video signal code data storage unit, a prediction information code data storage unit, and the like may also be used.

As described above, in a picture for which bidirectional prediction combining inter-frame prediction and inter-view prediction is available in multi-view video coding, the first prediction predicts the encoding target image using an already decoded image in the temporal or disparity direction as the reference picture. The second prediction is a residual prediction that predicts the prediction residual of the first prediction, using as the reference picture the prediction residual from the encoding of an already decoded image in a direction different from that of the first prediction. Instead of the one prediction residual and two pieces of prediction information that are encoded in ordinary bidirectional prediction, one residual prediction error, one piece of prediction information, and one piece of residual prediction information are encoded, which reduces the code amount required for prediction residual coding.

The video encoding device shown in FIG. 1 and the video decoding device shown in FIG. 3 may be implemented by a computer. In that case, a program for realizing their functions may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into and executed by a computer system. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period, such as the volatile memory inside a computer system acting as the server or client in that case. The program may realize only part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using hardware such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array).

Although embodiments of the present invention have been described above with reference to the drawings, these embodiments are merely examples of the present invention, and it is clear that the present invention is not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention is applicable to uses in which it is essential to reduce the code amount by predictively encoding the prediction residual, in cases where ordinary bidirectional prediction in the temporal direction and the parallax direction is unsuitable, unidirectional prediction is used instead, and the code amount of the prediction residual therefore increases.

Description of reference signs
101: encoding target video input unit
102: input image memory
103: reference picture memory
104: reference prediction residual picture memory
105: prediction unit
106: predicted image generation unit
107, 110: subtractor
108: prediction residual prediction unit
109: predicted prediction residual generation unit
111: transform/quantization unit
112: inverse quantization/inverse transform unit
113, 114: adder
115: entropy encoding unit
201: code data input unit
202: code data memory
203: reference picture memory
204: reference prediction residual picture memory
205: entropy decoding unit
206: inverse quantization/inverse transform unit
207: predicted prediction residual generation unit
208, 210: adder
209: predicted image generation unit

Claims

A video encoding method performed by a video encoding device that, using bidirectional prediction in which prediction is performed in both the temporal direction and the parallax direction, predicts an encoding target image as one prediction, predicts the prediction residual as the other prediction, and encodes the prediction residual of the residual prediction, the method comprising:
a primary prediction step of predicting the encoding target image using an already decoded image in the temporal direction or the parallax direction as a reference picture, and generating a primary predicted image;
a primary prediction residual generation step of generating a primary prediction residual from the primary predicted image and the encoding target image;
a residual prediction step of predicting the primary prediction residual using, as a reference picture, a prediction residual obtained at the time of encoding an already decoded image in a direction different from the direction referred to in the primary prediction step, and generating a predicted prediction residual; and
a prediction residual generation step of generating the prediction residual from the predicted prediction residual and the primary prediction residual,
further comprising a predicted image update step of updating a predicted image from the predicted prediction residual and the primary predicted image,
wherein, in the prediction residual generation step, the prediction residual is generated from the predicted image and the encoding target image,
further comprising a residual prediction information generation step of generating residual prediction information, which is information specifying the prediction reference destination in the residual prediction and is obtained by performing prediction on the primary prediction residual using, as a reference candidate, a reference prediction residual picture that is the prediction residual of an already encoded image,
and further comprising a residual prediction information encoding step of encoding the residual prediction information.
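As an illustration only, and not as part of the claims, the two descriptions of the prediction residual in the claim above (generated from the primary prediction residual and the predicted prediction residual, or from the updated predicted image and the encoding target image) can be related as follows. The symbols are assumptions introduced here and do not appear in the original: I is the encoding target image, P_1 the primary predicted image, R_1 the primary prediction residual, \hat{R} the predicted prediction residual, P the updated predicted image, and E the prediction residual that is encoded.

```latex
\begin{align}
R_1 &= I - P_1 \\
P &= P_1 + \hat{R} \\
E &= R_1 - \hat{R} = (I - P_1) - \hat{R} = I - P
\end{align}
```

Under these definitions, both ways of generating E give the same result.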

The video encoding method according to claim 1, wherein, in the residual prediction information generation step,
the prediction that uses already encoded pictures as reference candidates for the primary prediction residual is performed by block matching on the primary prediction residual with all the already encoded pictures as reference candidates.
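As an illustration only, and not as part of the claims, block matching of a primary prediction residual against every candidate residual picture might look like the sketch below. The cost measure (sum of absolute differences), the exhaustive search, and the use of an absolute position instead of a displacement vector are all simplifying assumptions; the function names are hypothetical.

```python
def sad(block, picture, top, left):
    """Sum of absolute differences between block and the same-sized
    region of picture whose top-left corner is (top, left)."""
    return sum(
        abs(block[r][c] - picture[top + r][left + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )

def search_residual_prediction(primary_residual, ref_residual_pictures):
    """Exhaustive search over all candidate pictures and positions.
    Returns (ref_index, (top, left), cost) for the lowest-cost match."""
    bh, bw = len(primary_residual), len(primary_residual[0])
    best = None
    for idx, pic in enumerate(ref_residual_pictures):
        for top in range(len(pic) - bh + 1):
            for left in range(len(pic[0]) - bw + 1):
                cost = sad(primary_residual, pic, top, left)
                if best is None or cost < best[2]:
                    best = (idx, (top, left), cost)
    return best

# Hypothetical 3x3 residual pictures and a 2x2 primary prediction residual.
pics = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 2, 4], [0, 5, 2]],
]
block = [[2, 4], [5, 2]]
result = search_residual_prediction(block, pics)
```

Here the picture index and position in `result` play the role of the residual prediction information: they identify which residual picture, and which region of it, serves as the predicted prediction residual.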

The video encoding method according to claim 1, wherein the video encoding method divides the encoding target image into blocks and performs encoding for each block, and,
in the residual prediction information generation step,
the residual prediction information or prediction information of an already encoded block neighboring the encoding target block is used as the residual prediction information of the encoding target block.

The video encoding method according to any one of claims 1 to 3, further comprising a reference picture list update step of including, in a reference picture list, the prediction residual obtained at the time of encoding an already decoded image,
wherein the residual prediction information is a reference index that specifies the reference prediction residual picture in the reference picture list and a vector that specifies a region on that picture.

5. The video encoding method according to claim 4, further comprising a residual prediction determination step of performing residual prediction when the reference index in one prediction indicates the reference prediction residual picture.
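As an illustration only, and not as part of the claims, the idea of placing residual pictures in the reference picture list and deciding on residual prediction from the reference index can be sketched as follows. The `RefEntry` type, the entry names, and the choice to append residual pictures at the end of the list are assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass
class RefEntry:
    name: str
    is_residual: bool  # True if the entry is a stored prediction residual picture

def update_reference_picture_list(ref_list, residual_picture_names):
    """Return a new list with residual pictures appended after ordinary entries."""
    return ref_list + [RefEntry(n, True) for n in residual_picture_names]

def use_residual_prediction(ref_list, ref_index):
    """Residual prediction is chosen when the index points at a residual picture."""
    return ref_list[ref_index].is_residual

# Hypothetical list: one temporal reference, one inter-view reference.
ref_list = [RefEntry("temporal_t-1", False), RefEntry("view_0", False)]
ref_list = update_reference_picture_list(ref_list, ["residual_view_0"])
```

With this arrangement no separate flag is needed to signal residual prediction: the reference index itself tells the codec whether the referenced entry is an image or a residual.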

The video encoding method according to any one of claims 1 to 3, wherein the residual prediction information is an index that specifies the reference picture and a vector that specifies a region on that picture, and,
in the residual prediction step, the predicted prediction residual is generated with reference to the prediction residual obtained at the time of encoding that region.

The video encoding method according to any one of claims 1 to 6, further comprising a residual prediction information prediction step of predicting the residual prediction information and generating predicted residual prediction information,
wherein, in the residual prediction information encoding step, a residual prediction information difference, which is the difference between the residual prediction information and the predicted residual prediction information, is encoded.
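As an illustration only, and not as part of the claims, differential coding of the vector part of the residual prediction information might be sketched as below. The claim does not say how the predicted residual prediction information is derived; the component-wise median of neighboring blocks' vectors used here is an assumption, as are the function names.

```python
def median_predictor(neighbor_vectors):
    """Component-wise middle value of the neighbors' vectors
    (the upper median when the number of neighbors is even)."""
    xs = sorted(v[0] for v in neighbor_vectors)
    ys = sorted(v[1] for v in neighbor_vectors)
    mid = len(neighbor_vectors) // 2
    return (xs[mid], ys[mid])

def encode_vector(vector, neighbor_vectors):
    """Encoder side: only the difference from the predictor is coded."""
    px, py = median_predictor(neighbor_vectors)
    return (vector[0] - px, vector[1] - py)

def decode_vector(diff, neighbor_vectors):
    """Decoder side: the same predictor plus the decoded difference."""
    px, py = median_predictor(neighbor_vectors)
    return (diff[0] + px, diff[1] + py)

# Hypothetical vectors of three already coded neighboring blocks.
neighbors = [(4, 0), (5, -1), (3, 1)]
diff = encode_vector((5, 0), neighbors)
restored = decode_vector(diff, neighbors)
```

Because encoder and decoder derive the same predictor from already coded neighbors, only the small difference needs to be transmitted.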

A video decoding method performed by a video decoding device that generates a decoded image from code data obtained by encoding the prediction residual of residual prediction, in which, using bidirectional prediction that performs prediction in both the temporal direction and the parallax direction, an encoding target image is predicted as one prediction and the prediction residual is predicted as the other prediction, the method comprising:
a primary prediction step of predicting a decoding target image using an already decoded image in the temporal direction or the parallax direction as a reference picture, and generating a primary predicted image;
a residual prediction step of predicting a primary prediction residual, which is the difference between the primary predicted image and the decoded image, using as a reference picture a prediction residual obtained at the time of encoding an already decoded image in a direction different from the direction referred to in the primary prediction step, and generating a predicted prediction residual; and
a decoded image generation step of generating the decoded image from the primary prediction residual and the primary predicted image,
further comprising a predicted image generation step of generating a predicted image from the primary predicted image and the predicted prediction residual,
wherein, in the decoded image generation step, the decoded image is generated from the prediction residual and the predicted image,
further comprising a residual prediction information generation step of generating residual prediction information, which is information specifying the reference destination in the residual prediction and is obtained by performing prediction on the primary prediction residual using, as a reference candidate, a reference prediction residual picture that is the prediction residual of an already encoded image,
wherein, in the residual prediction step, the predicted prediction residual is generated based on the residual prediction information,
further comprising a residual prediction information decoding step of decoding residual prediction information that is information specifying the reference destination in the residual prediction,
wherein, in the residual prediction step, the predicted prediction residual is generated based on the residual prediction information.
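As an illustration only, and not as part of the claims, the decoder-side flow in which decoded residual prediction information selects a region of a reference prediction residual picture can be sketched as follows. Transform and quantization are omitted, the region is addressed by an absolute position rather than a displacement vector, and all names and sample values are assumptions.

```python
def fetch_region(picture, top, left, height, width):
    """Copy the height x width region whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in picture[top:top + height]]

def add(a, b):
    """Element-wise a + b for two equally sized 2-D blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def decode_block(coded_residual, primary_pred, ref_residual_pictures,
                 ref_index, position):
    """Build the predicted image from the primary predicted image and the
    predicted prediction residual, then add the decoded prediction residual."""
    h, w = len(primary_pred), len(primary_pred[0])
    top, left = position
    predicted_residual = fetch_region(
        ref_residual_pictures[ref_index], top, left, h, w)
    predicted_image = add(primary_pred, predicted_residual)
    return add(predicted_image, coded_residual)

# Hypothetical inputs: one 3x3 residual picture, a 2x2 block.
ref_residual_pictures = [[[0, 0, 0], [0, 2, 3], [0, 4, 2]]]
primary_pred = [[118, 120], [125, 126]]
coded_residual = [[0, 1], [1, 0]]
decoded = decode_block(coded_residual, primary_pred,
                       ref_residual_pictures, 0, (1, 1))
```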

The video decoding method according to claim 8, further comprising a reference picture list update step of including the prediction residual of an already decoded picture in a reference picture list,
wherein the residual prediction information is a reference index that specifies a reference prediction residual picture in the reference picture list and a vector that specifies a region on that picture.

The video decoding method according to claim 9, further comprising a residual prediction determination step of performing residual prediction when a reference index in one prediction indicates a reference prediction residual picture.

The video decoding method according to claim 8, wherein the residual prediction information is an index that specifies a reference picture and a vector that specifies a region on that picture, and,
in the residual prediction step, the predicted prediction residual is generated with reference to the prediction residual obtained at the time of encoding that region.

The video decoding method according to claim 8, further comprising a residual prediction information prediction step of predicting the residual prediction information and generating predicted residual prediction information,
wherein, in the residual prediction information decoding step, a residual prediction information difference, which is the difference between the residual prediction information and the predicted residual prediction information, is decoded, and the residual prediction information is generated from it together with the predicted residual prediction information.

A video encoding device that, using bidirectional prediction in which prediction is performed in both the temporal direction and the parallax direction, predicts an encoding target image as one prediction, predicts the prediction residual as the other prediction, and encodes the prediction residual of the residual prediction, the device comprising:
primary prediction means for predicting the encoding target image using an already decoded image in the temporal direction or the parallax direction as a reference picture, and generating a primary predicted image;
primary prediction residual generation means for generating a primary prediction residual from the primary predicted image and the encoding target image;
residual prediction means for predicting the primary prediction residual using, as a reference picture, a prediction residual obtained at the time of encoding an already decoded image in a direction different from the direction referred to by the primary prediction means, and generating a predicted prediction residual; and
prediction residual generation means for generating the prediction residual from the primary prediction residual and the predicted prediction residual,
further comprising predicted image update means for updating a predicted image from the predicted prediction residual and the primary predicted image,
wherein the prediction residual generation means generates the prediction residual from the predicted image and the encoding target image,
further comprising residual prediction information generation means for generating residual prediction information, which is information specifying the prediction reference destination in the residual prediction and is obtained by performing prediction on the primary prediction residual using, as a reference candidate, a reference prediction residual picture that is the prediction residual of an already encoded image,
and further comprising residual prediction information encoding means for encoding the residual prediction information.

A video decoding device that generates a decoded image from code data obtained by encoding the prediction residual of residual prediction, in which, using bidirectional prediction that performs prediction in both the temporal direction and the parallax direction, an encoding target image is predicted as one prediction and the prediction residual is predicted as the other prediction, the device comprising:
primary prediction means for predicting a decoding target image using an already decoded picture in the temporal direction or the parallax direction as a reference picture, and generating a primary predicted image;
residual prediction means for predicting a primary prediction residual, which is the difference between the primary predicted image and the decoded image, using as a reference picture a prediction residual obtained at the time of encoding an already decoded image in a direction different from the direction referred to by the primary prediction means, and generating a predicted prediction residual;
primary prediction residual generation means for generating the primary prediction residual from the prediction residual obtained by decoding the code data and the predicted prediction residual; and
decoded image generation means for generating the decoded image from the primary prediction residual and the primary predicted image,
further comprising predicted image generation means for generating a predicted image from the primary predicted image and the predicted prediction residual,
wherein the decoded image generation means generates the decoded image from the prediction residual and the predicted image,
further comprising residual prediction information generation means for generating residual prediction information, which is information specifying the reference destination in the residual prediction,
wherein the residual prediction means generates the predicted prediction residual based on the residual prediction information,
further comprising residual prediction information decoding means for decoding residual prediction information, which is information specifying the reference destination in the residual prediction and is obtained by performing prediction on the primary prediction residual using, as a reference candidate, a reference prediction residual picture that is the prediction residual of an already encoded image,
wherein the residual prediction means generates the predicted prediction residual based on the residual prediction information.

A video encoding program for causing a computer to execute the video encoding method according to any one of claims 1 to 7.

A video decoding program for causing a computer to execute the video decoding method according to any one of claims 8 to 12.

A computer-readable recording medium on which the video encoding program according to claim 15 is recorded.

A computer-readable recording medium on which the video decoding program according to claim 16 is recorded.