JP6306883B2 - Video encoding method, video decoding method, video encoding device, video decoding device, video encoding program, video decoding program, and recording medium

Info

Publication number: JP6306883B2
Application number: JP2013273293A
Authority: JP
Inventors: 志織杉本; 信哉志水; 明小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-12-27
Filing date: 2013-12-27
Publication date: 2018-04-04
Anticipated expiration: 2033-12-27
Also published as: JP2015128250A

Description

The present invention relates to a video encoding method, a video decoding method, a video encoding device, a video decoding device, a video encoding program, a video decoding program, and a recording medium.

In general video coding, each frame of the video is divided into blocks that serve as processing units, exploiting the spatial and temporal continuity of the subject. The video signal of each block is predicted spatially or temporally, and the prediction information indicating the prediction method is encoded together with the prediction residual signal. This achieves far higher coding efficiency than encoding the video signal itself. General two-dimensional video coding uses intra prediction, which predicts the signal to be encoded by referring to already-encoded blocks in the same frame, and inter-frame prediction, which predicts the signal to be encoded by referring to other already-encoded frames on the basis of motion compensation or the like.

Multi-view video coding is described here. Multi-view video coding encodes, with high efficiency, a plurality of videos of the same scene captured by a plurality of cameras by exploiting the redundancy between those videos. It is described in detail in Non-Patent Document 1. In addition to the prediction methods used in general video coding, multi-view video coding uses two further kinds of method. One is inter-view prediction, which predicts the signal to be encoded by referring to already-encoded video of another view on the basis of disparity compensation. The other is inter-view residual prediction, which predicts the signal to be encoded by inter-frame prediction and then predicts the resulting residual signal by referring to the residual signal produced when already-encoded video of another view was encoded. In multi-view video coding schemes such as MVC (Multiview Video Coding), inter-view prediction is handled together with inter-frame prediction as inter prediction, and in B pictures it can also be used for bi-directional prediction, in which two or more predicted images are interpolated to form the predicted image. Thus, in multi-view video coding, a picture for which both inter-frame prediction and inter-view prediction are available can use bi-directional prediction that combines the two.

Inter prediction requires reference information, such as a reference picture index and a motion vector, that indicates the reference destination. The reference information is generally encoded as prediction information and multiplexed with the video, but it may also be predicted in some way to reduce its code amount. Common methods include direct mode and merge mode. In direct mode, the prediction information that already-encoded blocks neighbouring the encoding target image used when they were encoded is acquired and used as the reference information for predicting the encoding target image. In merge mode, the prediction information of the neighbouring blocks is arranged into a candidate list, and an identifier that identifies the block in the list from which the prediction information is taken is encoded.
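The merge mode described above can be illustrated with a minimal sketch. The function names, the tuple layout of the motion information, the duplicate pruning, and the toy cost function are all assumptions made for illustration; they are not taken from this specification or from any particular codec.

```python
def build_candidate_list(neighbour_motion, max_candidates=5):
    """Collect distinct motion information from already-encoded neighbours."""
    candidates = []
    for motion in neighbour_motion:
        # Skip neighbours without usable motion information and duplicates.
        if motion is None or motion in candidates:
            continue
        candidates.append(motion)
        if len(candidates) == max_candidates:
            break
    return candidates


def choose_merge_index(candidates, cost):
    """Return the index of the cheapest candidate; only this index is encoded."""
    return min(range(len(candidates)), key=lambda i: cost(candidates[i]))


# Each entry is (reference picture index, (mv_x, mv_y)); None marks an
# unusable neighbour (hypothetical representation).
neighbours = [(0, (1, 0)), None, (0, (1, 0)), (1, (-2, 3))]
candidates = build_candidate_list(neighbours)

# Toy cost: distance of the candidate vector from an assumed true motion (-2, 3).
merge_index = choose_merge_index(
    candidates, lambda m: abs(m[1][0] + 2) + abs(m[1][1] - 3)
)
```

In this sketch the encoder would signal only `merge_index`, and a decoder that rebuilds the same list from the same neighbours recovers the full motion information.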

Residual prediction is another method. It reduces the code amount of the prediction residual by exploiting the fact that, when two highly correlated images are each predictively encoded, their prediction residuals are also correlated with each other. Residual prediction is described in detail in Non-Patent Document 2. In the inter-view residual prediction used in multi-view video coding, the prediction residual signal produced when the region corresponding to the encoding target image in the video of a different view was encoded is subtracted from the prediction residual signal of the encoding target. This lowers the energy of the residual signal and can improve coding efficiency. The correspondence between views is obtained, for example, as follows: when an already-encoded neighbouring block was encoded by disparity-compensated prediction, its disparity vector is used to set the region of another view that corresponds to the encoding target block. A disparity vector obtained in this way is called a disparity vector from neighbouring blocks (NBDV). When inter-frame prediction is used in a B picture, inter-view residual prediction is applied as a further process on the residual, separately from that prediction.
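The energy reduction described above can be seen in a small numeric sketch. The block values below are invented for illustration, and the helper names are hypothetical; the only point shown is that subtracting a correlated residual from another view leaves a residual with less energy.

```python
def residual_energy(block):
    """Sum of squared sample values of a residual block."""
    return sum(v * v for row in block for v in row)


def subtract_blocks(a, b):
    """Element-wise difference a - b of two equally sized blocks."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Prediction residual of the encoding target block (made-up values).
current_residual = [[4, -3], [2, 5]]
# Residual of the corresponding region in another view, assumed to be correlated.
reference_view_residual = [[3, -2], [2, 4]]

# Only this remaining residual would be transformed, quantized and encoded.
remaining_residual = subtract_blocks(current_residual, reference_view_residual)
```

If the two residuals were uncorrelated, or misaligned by an inaccurate disparity vector, the subtraction could increase the energy instead.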

When the original prediction method is inter prediction using a reference picture of a view different from the encoding target view, residual prediction can be performed in the same way using the prediction residual signal produced when the region corresponding to the encoding target image in a frame different from the encoding target frame was encoded. In either case, instead of referring to the prediction residual produced when the corresponding region was encoded, a predicted value of the prediction residual can be generated by generating a predicted image for the corresponding region with the same motion information as that used for the encoding target image and taking the difference from the image of the corresponding region.

In this specification, an image is one frame of a moving image or a still image, and a collection of a plurality of frames (images), that is, a moving image, is called a video.

Non-Patent Document 1: M. Flierl and B. Girod, "Multiview video compression," IEEE Signal Processing Magazine, pp. 66-76, November 2007.
Non-Patent Document 2: X. Wang and J. Ridge, "Improved video coding with residual prediction for extended spatial scalability," 3rd International Symposium on Communications, Control and Signal Processing (ISCCSP 2008), pp. 1041-1046, March 2008.

Residual prediction is an effective way to reduce the code amount when coding multi-view video. However, its prediction accuracy depends heavily on the accuracy of the inter-view correspondence. When that accuracy is insufficient, a misalignment arises between the predicted prediction residual and the predicted image, so the prediction residual of the encoding target cannot be reduced sufficiently or noise appears in the decoded image, and a sufficient effect is not obtained. The same problem also occurs when the signal characteristics differ greatly between views, when the video to be encoded contains noise, or when each reference picture carries distortion caused by encoding.

The present invention has been made in view of these circumstances. Its object is to provide a video encoding method, a video decoding method, a video encoding device, a video decoding device, a video encoding program, a video decoding program, and a recording medium that can improve coding efficiency or decoded video quality.

The present invention is a video encoding method that, when performing prediction on a prediction target region to generate a predicted image, further performs residual prediction on an image generated from a reference picture by inter-picture prediction. The method comprises a residual prediction step of generating a predicted prediction residual from the reference picture by the inter-picture prediction, and a predicted prediction residual update step of updating the predicted prediction residual to obtain a new predicted prediction residual.

The present invention further comprises a primary predicted image generation step of generating a primary predicted image from the reference picture by the inter-picture prediction, and a predicted image generation step of generating the predicted image from the primary predicted image and the predicted prediction residual.

The present invention further comprises a predicted image generation step of generating the predicted image from the reference picture by the inter-picture prediction, and a prediction residual generation step of generating an encoding target prediction residual from the predicted prediction residual and the prediction residual.

In the present invention, the predicted prediction residual update step updates the predicted image by applying a filter to the predicted prediction residual.

The present invention is an encoding method that, when performing prediction on a prediction target region to generate a predicted image, further performs residual prediction on an image generated from a reference picture by inter-picture prediction. The method comprises a residual prediction step of generating a predicted prediction residual from the reference picture by inter-picture prediction using a filter.

The present invention further comprises a filter generation step of generating the filter.

The present invention further comprises a filter determination step of determining the filter, in which the filter is determined on the basis of information on the predicted prediction residual or the primary predicted image.

The present invention is an encoding method that, when performing prediction on a prediction target region to generate a predicted image, further performs residual prediction on an image generated from a reference picture by inter-picture prediction. The method comprises a filter determination step of determining a filter, and a residual prediction step of generating a predicted prediction residual from the reference picture by inter-picture prediction using the filter. In the filter determination step, the filter is determined on the basis of information on a primary predicted image.

The present invention is a video decoding method that, when performing prediction on a prediction target region to generate a predicted image, further performs residual prediction on an image generated from a reference picture by inter-picture prediction. The method comprises a residual prediction step of generating a predicted prediction residual from the reference picture by the inter-picture prediction, and a predicted prediction residual update step of updating the predicted prediction residual to obtain a new predicted prediction residual.

The present invention further comprises a primary predicted image generation step of generating a primary predicted image from the reference picture by the inter-picture prediction, and a predicted image generation step of generating the predicted image from the primary predicted image and the predicted prediction residual.

In the present invention, the predicted prediction residual update step updates the predicted image by applying a filter to the predicted prediction residual.

The present invention is a decoding method that, when performing prediction on a prediction target region to generate a predicted image, further performs residual prediction on an image generated from a reference picture by inter-picture prediction. The method comprises a residual prediction step of generating a predicted prediction residual from the reference picture by the inter-picture prediction using a filter.

The present invention is a video encoding device that, when performing prediction on a prediction target region to generate a predicted image, further performs residual prediction on an image generated from a reference picture by inter-picture prediction. The device comprises residual prediction means for generating a predicted prediction residual from the reference picture by inter-picture prediction, and predicted prediction residual update means for updating the predicted prediction residual to obtain a new predicted prediction residual.

The present invention is a video decoding device that, when performing prediction on a prediction target region to generate a predicted image, further performs residual prediction on an image generated from a reference picture by inter-picture prediction. The device comprises residual prediction means for generating a predicted prediction residual from the reference picture by the inter-picture prediction, and predicted prediction residual update means for updating the predicted prediction residual to obtain a new predicted prediction residual.

The present invention is a video encoding program for causing a computer to execute the video encoding method.

The present invention is a video decoding program for causing a computer to execute the video decoding method.

The present invention is a computer-readable recording medium on which the video encoding program is recorded.

The present invention is a computer-readable recording medium on which the video decoding program is recorded.

According to the present invention, the residual prediction value obtained by residual prediction is updated or filtered. This prevents noise that arises mainly from the limited accuracy of residual prediction, and thereby improves coding efficiency or decoded video quality.

FIG. 1 is a block diagram showing the configuration of the video encoding device according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing the processing operation of the video encoding device 100 shown in FIG. 1.
FIG. 3 is a block diagram showing the configuration of the video decoding device according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing the processing operation of the video decoding device 200 shown in FIG. 3.
FIG. 5 is a block diagram showing the configuration of the video encoding device according to the second embodiment of the present invention.
FIG. 6 is a flowchart showing the processing operation of the video encoding device 100a shown in FIG. 5.
FIG. 7 is a block diagram showing the configuration of the video decoding device according to the second embodiment of the present invention.
FIG. 8 is a flowchart showing the processing operation of the video decoding device 200a shown in FIG. 7.

<First Embodiment>
A video encoding device according to the first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the video encoding device 100 according to the first embodiment. As shown in FIG. 1, the video encoding device 100 includes an encoding target video input unit 101, an input video memory 102, a reference picture memory 103, a primary predicted image generation unit 104, a predicted prediction residual generation unit 105, a predicted prediction residual update unit 106, a predicted image generation unit 107, a subtraction unit 108, a transform/quantization unit 109, an inverse quantization/inverse transform unit 110, an addition unit 111, and an entropy encoding unit 112.

The encoding target video input unit 101 inputs the video to be encoded. In the following description, this video is called the encoding target video, and the frame being processed is called the encoding target frame or the encoding target picture. The input video memory 102 stores the input encoding target video. The reference picture memory 103 stores the images that have been encoded and decoded so far. A frame stored there is called a reference frame or a reference picture below.

The primary predicted image generation unit 104 performs prediction on the encoding target region using a reference picture stored in the reference picture memory 103 and generates a primary predicted image. The predicted prediction residual generation unit 105 generates a predicted prediction residual using a reference picture stored in the reference picture memory 103 and the prediction information used when the primary predicted image was generated. The predicted prediction residual update unit 106 updates the generated predicted prediction residual to obtain a new predicted prediction residual. The predicted image generation unit 107 generates a predicted image from the predicted prediction residual and the primary predicted image. The subtraction unit 108 takes the difference between the encoding target image and the predicted image and generates a prediction residual.

The transform/quantization unit 109 transforms and quantizes the generated prediction residual to generate quantized data. The inverse quantization/inverse transform unit 110 inverse-quantizes and inverse-transforms the quantized data to generate a decoded prediction residual. The addition unit 111 adds the decoded prediction residual and the predicted image to generate a decoded image. The entropy encoding unit 112 entropy-encodes the quantized data to generate code data.

Next, the processing operation of the video encoding device 100 shown in FIG. 1 is described with reference to FIG. 2. FIG. 2 is a flowchart showing that processing operation. Here, the encoding target video is assumed to be one of the videos of a multi-view video, and the multi-view video is assumed to be structured so that the videos of all views are encoded and decoded one view at a time for each frame. The process of encoding one frame of the encoding target video is described. Repeating this process for every frame encodes the video.

First, the encoding target video input unit 101 inputs the encoding target picture and stores it in the input video memory 102 (step S101). It is assumed that some frames of the encoding target video have already been encoded and that their decoded results are stored in the reference picture memory 103. It is also assumed that the videos of the other views that can be referred to, up to the same frame as the encoding target picture, have already been encoded and decoded and are stored in the reference picture memory 103.

After the video is input, the encoding target picture is divided into encoding target blocks, and the video signal of the encoding target picture is encoded block by block (loop over steps S102 to S112). In the following, the image of a block to be encoded is called the encoding target block or the encoding target image. The processing of steps S103 to S110 below is repeated for every block of the picture.

In the processing repeated for each encoding target block, the primary predicted image generation unit 104 first performs, on the encoding target block, inter prediction that refers to a reference picture in the reference picture memory, determines the motion information, which is the information indicating the reference destination, and generates a primary predicted image from the image at the reference destination (step S103). Any prediction method and any form of motion information may be used. Typical reference information is, for example, a combination of reference picture index information that identifies the reference picture and a motion vector that indicates the reference position on that picture.

Common prediction methods include determining the reference destination by matching on candidate reference pictures, and inheriting the motion information that already-encoded neighbouring blocks used for prediction when they were encoded, as in the methods called direct mode and merge mode. Any other prediction method or motion information may also be used. The motion information may be encoded and multiplexed with the code data of the video, or information from which the motion information can be identified may be encoded and multiplexed separately. When the motion information can be derived from neighbouring motion information or a candidate list as described above, it need not be encoded. Alternatively, the motion information may be predicted and its residual encoded.

Next, the predicted prediction residual generation unit 105 performs residual prediction on the encoding target region and generates a predicted prediction residual (step S104). Any method of residual prediction may be used. In one common method, another region on a reference picture in the reference picture memory that corresponds to the encoding target region is taken as the reference region, the prediction residual produced when the reference region was encoded is acquired, and it is used as the predicted prediction residual of the encoding target region. In another method, a predicted image for the reference region is generated using the same motion information as that used for the encoding target image, and the predicted prediction residual is generated by taking the difference from the image of the reference region.
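The second method above (step S104) can be sketched as follows. This is a minimal illustration under stated assumptions: pictures are plain 2-D lists, vectors are integer (row, column) offsets, no bounds checking or sub-pixel interpolation is done, and every name (`fetch_block`, `predict_residual`, the picture variables) is hypothetical.

```python
def fetch_block(picture, top, left, size):
    """Return the size x size block whose top-left sample is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def subtract_blocks(a, b):
    """Element-wise difference a - b of two equally sized blocks."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def predict_residual(ref_view_picture, ref_view_reference, top, left, size,
                     disparity, motion):
    """Predicted prediction residual for the target block at (top, left).

    The disparity vector locates the reference region in the other view; the
    target block's own motion vector is then reused to predict that region.
    """
    dy, dx = disparity
    my, mx = motion
    reference_region = fetch_block(ref_view_picture, top + dy, left + dx, size)
    reference_prediction = fetch_block(
        ref_view_reference, top + dy + my, left + dx + mx, size
    )
    return subtract_blocks(reference_region, reference_prediction)


# Toy 4x4 pictures of the other view at the current and at the reference time.
ref_view_picture = [[10 * r + c for c in range(4)] for r in range(4)]
ref_view_reference = [[10 * r + c - 1 for c in range(4)] for r in range(4)]

predicted_residual = predict_residual(
    ref_view_picture, ref_view_reference,
    top=0, left=0, size=2, disparity=(0, 1), motion=(1, 0),
)
```

The first method in the paragraph would instead read a stored residual for `reference_region` directly, which avoids the second motion-compensated fetch but requires the encoder and decoder to keep residuals in memory.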

The reference region may also be determined by any method. A common method applies when the region corresponding to the encoding target image in the video of a view different from that of the encoding target region is used as the reference region: the reference region is determined from the motion vector that an already-encoded neighbouring block used in disparity-compensated prediction.

Next, the predicted prediction residual update unit 106 updates the generated predicted prediction residual to obtain a new predicted prediction residual (step S105). Any update method may be used. As a first example, consider improving the accuracy of residual prediction when an error is expected in the correspondence between the reference region and the encoding target region. When that correspondence is expected to be off at the integer-pixel or fractional-pixel level, the predicted prediction residual generated by residual prediction is expected to be off by a similar amount, and this misalignment greatly degrades the performance of residual prediction. In such a case, the predicted prediction residual may be updated by applying a smoothing filter or another low-pass filter, for example to average out the error caused by the misalignment.
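One possible form of the smoothing update in step S105 is a 3x3 box filter, sketched below. The choice of a box filter, the kernel size, and the edge handling by clamping are assumptions for illustration; the text only requires some smoothing or low-pass filter.

```python
def smooth_residual(residual):
    """Apply a 3x3 box filter to a residual block, clamping at the block edges."""
    height, width = len(residual), len(residual[0])
    smoothed = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so edge samples are repeated outward.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += residual[yy][xx]
            smoothed[y][x] = total / 9.0
    return smoothed


# A single isolated spike, as a misaligned residual edge might produce.
spiky_residual = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
updated_residual = smooth_residual(spiky_residual)
```

Smoothing spreads the spike over its neighbourhood, so a one-pixel misalignment between the predicted prediction residual and the true residual costs less than it would with the unfiltered spike.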

As another example, consider the case where the encoding target video contains some kind of noise, so that the same kind of noise remains in the predicted prediction residual. In this case, the predicted prediction residual may be updated by applying a noise removal filter to reduce the error. A noise model may be estimated using the reference region, its predicted image, or the like, and used for the update. Alternatively, when the reference picture carries coding distortion, the noise caused by that distortion may be estimated from, for example, the quantization parameter used when the reference region was encoded.
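As a rough sketch of estimating coding noise from the quantization parameter, the code below assumes the H.264/HEVC-style relation in which the quantization step roughly doubles for every increase of 6 in QP, models quantization error as uniform noise with standard deviation step/sqrt(12), and removes it by soft thresholding. The noise model, the thresholding rule, and all function names are assumptions for illustration, not the method of this specification.

```python
import math


def quantization_step(qp):
    """Approximate step size: 1.0 at QP 4, doubling every 6 QP (assumed model)."""
    return 2.0 ** ((qp - 4) / 6.0)


def estimated_noise_std(qp):
    """Standard deviation of uniform quantization error with the given step."""
    return quantization_step(qp) / math.sqrt(12.0)


def denoise_residual(residual, qp):
    """Soft-threshold each sample by the estimated noise level."""
    threshold = estimated_noise_std(qp)

    def shrink(value):
        if value > threshold:
            return value - threshold
        if value < -threshold:
            return value + threshold
        return 0.0

    return [[shrink(v) for v in row] for row in residual]


# Small values are treated as noise; the large value is kept, slightly shrunk.
cleaned = denoise_residual([[0.1, -0.1, 5.0]], qp=4)
```

A higher QP for the reference region yields a larger threshold, so the update trusts a coarsely quantized reference less.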

As yet another example, consider improving the accuracy of residual prediction in two further cases. In the first case, because of constraints such as differences in the accessible region, the predicted image for the reference region is generated not with the same motion information and a reference picture of the same frame as those used for the encoding target image, but with different motion information or a reference picture of a different frame, and that predicted image is used to generate the predicted prediction residual. In the second case, the prediction residual produced when the reference region was encoded is acquired and used as the predicted prediction residual of the encoding target region. In these cases, a misalignment is also expected in the predicted prediction residual because of the difference or offset in the motion information. The information needed for the update may then be determined from, for example, the motion information used for the encoding target region and for the reference region. For example, the kernel width or strength of the filter may be estimated from the difference between the motion vectors.
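One simple way to derive a kernel width from the motion vector difference is sketched below. The rule used here (radius equal to the largest component difference rounded up, width `2 * radius + 1`, capped at a maximum) is an assumption chosen for illustration; the text leaves the estimation method open.

```python
import math


def kernel_width_from_motion(target_mv, reference_mv, max_width=7):
    """Pick an odd filter width that grows with the motion vector difference."""
    diff_x = abs(target_mv[0] - reference_mv[0])
    diff_y = abs(target_mv[1] - reference_mv[1])
    # Cover the largest expected misalignment, rounded up to whole pixels.
    radius = math.ceil(max(diff_x, diff_y))
    return min(2 * radius + 1, max_width)


# Identical vectors: width 1, which leaves the residual unfiltered.
width_same = kernel_width_from_motion((2.0, 0.0), (2.0, 0.0))
# A 1.5-pixel horizontal difference: radius 2, width 5.
width_offset = kernel_width_from_motion((2.0, 0.0), (0.5, 0.0))
# Very different vectors: the width is capped at max_width.
width_capped = kernel_width_from_motion((10.0, 0.0), (0.0, 0.0))
```

Because both vectors are available at the decoder, a width derived this way would not need to be signalled in the bitstream.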

Alternatively, an appropriate filter may be selected from a predetermined set of filters. In any of the above examples, the parameters and filter coefficients used for the update, or information that identifies them, may be encoded as additional information and multiplexed with the video.
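
Selecting a filter from a predetermined set and signalling only its index could look like the sketch below. The filter table `FILTERS`, the helper names, and the selection criterion (smallest sum of absolute differences against the residual actually observed at the encoder) are assumptions for illustration, not details given in the text.

```python
# Hypothetical predetermined filter set shared by encoder and decoder.
FILTERS = [
    [1.0],                # index 0: identity (no update)
    [0.25, 0.5, 0.25],    # index 1: mild low-pass
    [1 / 3, 1 / 3, 1 / 3],  # index 2: stronger low-pass
]


def apply_fir(samples, taps):
    """Apply a symmetric FIR filter with clamped borders."""
    half = len(taps) // 2
    n = len(samples)
    return [
        sum(t * samples[min(max(i + k - half, 0), n - 1)] for k, t in enumerate(taps))
        for i in range(n)
    ]


def choose_filter(predicted_residual, actual_residual):
    """Return the index of the filter whose output is closest (SAD) to the actual residual."""
    def sad(idx):
        filtered = apply_fir(predicted_residual, FILTERS[idx])
        return sum(abs(a - f) for a, f in zip(actual_residual, filtered))
    return min(range(len(FILTERS)), key=sad)


# The chosen index is what would be encoded as additional information.
index = choose_filter([0, 4, 0, 4, 0], [1, 2, 2, 2, 1])
```

Because only the index is transmitted, the decoder can apply the same filter as long as it holds the same table.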

Once the prediction prediction residual has been updated, the predicted image generation unit 107 generates a predicted image from the primary prediction image and the prediction prediction residual (step S106). The predicted image may be generated by any method; a typical method is to add the primary prediction image and the prediction prediction residual. Next, the subtraction unit 108 takes the difference between the predicted image and the encoding target block to generate a prediction residual (step S107). The transform/quantization unit 109 then transforms and quantizes the prediction residual to generate quantized data (step S108). Any transform and quantization method may be used as long as the decoding side can correctly perform the inverse quantization and inverse transform. When transform and quantization are complete, the inverse quantization/inverse transform unit 110 applies inverse quantization and an inverse transform to the quantized data to generate a decoded prediction residual (step S109).

When the decoded prediction residual has been generated, the addition unit 111 adds the decoded prediction residual and the predicted image to generate a decoded image, which is stored in the reference picture memory 103 (step S110). A loop filter may be applied to the decoded image if necessary. In ordinary video encoding, coding noise is removed with a deblocking filter or other filters.
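
Steps S106 to S110 can be sketched for a single block as follows. This is a simplified illustration under stated assumptions: the block is a flat list of integer samples, the transform is omitted, quantization is a plain uniform scalar quantizer with step `qstep`, and the function name `encode_block` is hypothetical.

```python
def encode_block(target, primary_pred, pred_pred_residual, qstep):
    """Sketch of steps S106 to S110 for one block of integer samples."""
    # S106: predicted image = primary prediction image + prediction prediction residual
    predicted = [p + r for p, r in zip(primary_pred, pred_pred_residual)]
    # S107: prediction residual = encoding target block - predicted image
    residual = [t - p for t, p in zip(target, predicted)]
    # S108: transform omitted (assumption); uniform scalar quantization only
    quantized = [round(e / qstep) for e in residual]
    # S109: inverse quantization gives the decoded prediction residual
    decoded_residual = [q * qstep for q in quantized]
    # S110: decoded image, which would be stored in the reference picture memory
    decoded = [d + p for d, p in zip(decoded_residual, predicted)]
    return quantized, decoded


quantized, decoded = encode_block(
    target=[100, 106, 97, 87],
    primary_pred=[96, 100, 97, 95],
    pred_pred_residual=[2, 2, 0, -4],
    qstep=2,
)
```

In this example every residual sample is a multiple of the quantization step, so the decoded block equals the target block; in general the quantizer makes the reconstruction lossy.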

Next, the entropy encoding unit 112 entropy-encodes the quantized data to generate code data and, if necessary, also encodes the prediction information, residual prediction information, and other additional information and multiplexes them with the code data (step S111). When all blocks have been processed, the code data is output (step S112).

Next, the video decoding device according to the first embodiment of the present invention will be described. FIG. 3 is a block diagram showing the configuration of the video decoding device 200 according to the first embodiment. As shown in FIG. 3, the video decoding device 200 includes a code data input unit 201, a code data memory 202, a reference picture memory 203, an entropy decoding unit 204, an inverse quantization/inverse transform unit 205, a primary prediction image generation unit 206, a prediction prediction residual generation unit 207, a prediction prediction residual update unit 208, a predicted image generation unit 209, and an addition unit 210.

The code data input unit 201 inputs the video code data to be decoded. This video code data is referred to as the decoding target video code data, and the frame being processed is referred to as the decoding target frame or decoding target picture. The code data memory 202 stores the input decoding target video. The reference picture memory 203 stores images that have already been decoded. The entropy decoding unit 204 entropy-decodes the code data of the decoding target picture to generate quantized data, and the inverse quantization/inverse transform unit 205 applies inverse quantization and an inverse transform to the quantized data to generate a decoded prediction residual.

The primary prediction image generation unit 206 performs prediction on the decoding target region using a reference picture stored in the reference picture memory 203 and generates a primary prediction image. The prediction prediction residual generation unit 207 generates a prediction prediction residual using a reference picture stored in the reference picture memory 203 and the prediction information used to generate the primary prediction image. The prediction prediction residual update unit 208 updates the generated prediction prediction residual to obtain a new prediction prediction residual. The predicted image generation unit 209 generates a predicted image from the prediction prediction residual and the primary prediction image. The addition unit 210 adds the decoded prediction residual and the predicted image to generate a decoded image.

Next, the processing operation of the video decoding device 200 shown in FIG. 3 will be described with reference to FIG. 4, which is a flowchart of that operation. The decoding target video is assumed to be one of the videos of a multi-view video, and the multi-view video is assumed to be structured so that the videos of all viewpoints are decoded one viewpoint at a time for each frame. The process of decoding one frame of the code data is described here. Repeating this process for each frame decodes the video.

First, the code data input unit 201 inputs the code data and stores it in the code data memory 202 (step S201). It is assumed that some frames of the decoding target video have already been decoded and that the decoding results are stored in the reference picture memory 203. It is also assumed that the referable videos of other viewpoints, up to the same frame as the decoding target picture, have already been decoded and stored in the reference picture memory 203.

After the code data is input, the decoding target picture is divided into decoding target blocks, and the video signal of the decoding target picture is decoded block by block (the loop of steps S202 to S210). In the following, the image of a block to be decoded is referred to as a decoding target block or decoding target image. Steps S203 to S209 are repeated for every block of the frame.

In the process repeated for each decoding target block, the entropy decoding unit 204 first entropy-decodes the code data (step S203). The inverse quantization/inverse transform unit 205 performs inverse quantization and an inverse transform to generate a decoded prediction residual (step S204). When the code data contains prediction information or other additional information, that information may also be decoded to generate whatever information is needed.

Steps S205 to S208 are the same as steps S103 to S106 of the video encoding device 100 and are therefore described only briefly. The primary prediction image generation unit 206 performs, for the decoding target block, inter prediction that refers to a reference picture in the reference picture memory, determines the motion information that indicates the reference destination, and generates a primary prediction image from the image at the reference destination (step S205). Next, the prediction prediction residual generation unit 207 performs residual prediction for the decoding target region and generates a prediction prediction residual (step S206). The prediction prediction residual update unit 208 then updates the generated prediction prediction residual to obtain a new prediction prediction residual (step S207). Once the prediction prediction residual has been updated, the predicted image generation unit 209 generates a predicted image from the primary prediction image and the prediction prediction residual (step S208).

When the predicted image has been generated, the addition unit 210 adds the decoded prediction residual and the predicted image to generate a decoded image, which is stored in the reference picture memory (step S209). A loop filter may additionally be applied to the decoded image if necessary. In ordinary video decoding, coding noise is removed with a deblocking filter or other filters. When all blocks have been processed, the result is output as a decoded frame (step S210).
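
The per-block decoding path (steps S204, S208, and S209) can be sketched as below. As a simplifying assumption, the inverse transform is omitted and inverse quantization is a multiplication by a uniform step `qstep`; the primary prediction image and the prediction prediction residual are taken as already available, and the function name `decode_block` is hypothetical.

```python
def decode_block(quantized, primary_pred, pred_pred_residual, qstep):
    """Sketch of the decoder-side reconstruction of one block of integer samples."""
    # S204: inverse quantization (inverse transform omitted, by assumption)
    decoded_residual = [q * qstep for q in quantized]
    # S208: predicted image = primary prediction image + prediction prediction residual
    predicted = [p + r for p, r in zip(primary_pred, pred_pred_residual)]
    # S209: decoded image = decoded prediction residual + predicted image
    return [d + p for d, p in zip(decoded_residual, predicted)]


decoded = decode_block(
    quantized=[1, 2, 0, -2],
    primary_pred=[96, 100, 97, 95],
    pred_pred_residual=[2, 2, 0, -4],
    qstep=2,
)
```

The decoder must derive the same primary prediction image and prediction prediction residual as the encoder, which is why any filter parameters chosen at the encoder have to be either signalled or derivable from already decoded data.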

＜Second Embodiment＞
Next, a video encoding device according to the second embodiment of the present invention will be described. FIG. 5 is a block diagram showing the configuration of the video encoding device 100a according to the second embodiment. In this figure, parts identical to those of the device shown in FIG. 1 are given the same reference numerals, and their description is omitted. The device shown in this figure differs from the device shown in FIG. 1 in that the prediction prediction residual update unit 106 is omitted and a filter determination unit 113 is newly provided. The filter determination unit 113 determines the filter to be used for residual prediction. The prediction prediction residual generation unit 105 performs residual prediction using the determined filter and generates a prediction prediction residual.

Next, the processing operation of the video encoding device 100a shown in FIG. 5 will be described with reference to FIG. 6, which is a flowchart of that operation. In FIG. 6, steps identical to those shown in FIG. 2 are given the same reference numerals and are described only briefly.

First, the encoding target video input unit 101 inputs the encoding target picture and stores it in the input video memory 102 (step S101). After the video is input, the encoding target picture is divided into encoding target blocks, and the video signal of the encoding target picture is encoded block by block (the loop of steps S102 to S112). In the following, the image of a block to be encoded is referred to as an encoding target block or encoding target image. Steps S103 to S111 below are repeated for every block of the picture.

In the process repeated for each encoding target block, the primary prediction image generation unit 104 first performs, for the encoding target block, inter prediction that refers to a reference picture in the reference picture memory, determines the motion information that indicates the reference destination, and generates a primary prediction image from the image at the reference destination (step S103).

Next, the filter determination unit 113 determines the filter to be used for residual prediction (step S104a). Once the filter has been determined, the prediction prediction residual generation unit 105 generates a prediction prediction residual using the determined filter (step S105a). In residual prediction, the image of the reference region and its predicted image are generated by ordinary inter-screen prediction, which generally uses an interpolation filter to obtain images at fractional-pixel positions. As explained for the first embodiment, the prediction prediction residual generated by ordinary residual prediction is expected to contain various kinds of noise and misalignment. The reference region image and its predicted image for residual prediction may therefore be generated by performing inter-screen prediction with a filter that corrects or disperses such noise and misalignment, and then used to generate the prediction prediction residual. Alternatively, the reference region image and its predicted image generated by ordinary inter-screen prediction may be filtered to produce new images, which are then used to generate the prediction prediction residual.

As a further alternative, inter-screen prediction generally uses a high-accuracy interpolation filter with, for example, 6 to 8 taps, but when some error is expected in the information indicating the reference region or in the information indicating its prediction destination in residual prediction, the amount of computation may be reduced by adaptively lowering the accuracy of the interpolation filter. The filter may be generated by any method: the method described in the first embodiment may be used, or a different method.
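
The trade-off between a long and a short interpolation filter can be illustrated with half-sample interpolation in one dimension. The 6-tap coefficients below are the half-sample luma taps used in H.264; the 2-tap bilinear fallback, the threshold rule in `choose_taps`, and the function names are assumptions for illustration only.

```python
SIX_TAP = [1, -5, 20, 20, -5, 1]  # H.264-style half-sample luma taps, sum = 32
TWO_TAP = [1, 1]                  # bilinear fallback, sum = 2


def half_pel(samples, i, taps):
    """Interpolate the half-sample position between samples[i] and samples[i + 1]."""
    n = len(samples)
    half = len(taps) // 2
    acc = 0
    for k, t in enumerate(taps):
        j = min(max(i - half + 1 + k, 0), n - 1)  # clamp at the block border
        acc += t * samples[j]
    total = sum(taps)
    return (acc + total // 2) // total  # integer division with rounding offset


def choose_taps(expected_error_px, threshold=1.0):
    """Hypothetical rule: use the cheap filter when the reference position is unreliable."""
    return TWO_TAP if expected_error_px >= threshold else SIX_TAP


row = [0, 0, 0, 0, 32, 32, 32, 32]
precise = half_pel(row, 3, SIX_TAP)
cheap = half_pel(row, 3, choose_taps(2.0))
```

The 2-tap filter needs two multiply-accumulate operations per sample instead of six, which is the kind of saving the paragraph above refers to; the price is less accurate interpolation near edges and fine texture.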

Alternatively, an appropriate filter may be selected from a predetermined set of filters. In any of the above examples, the parameters and filter coefficients used, or information that identifies them, may be encoded as additional information and multiplexed with the video. The filter determined here may also be used to generate the primary prediction image.

The prediction prediction residual is generated as described in the first embodiment. The image of the reference region, its predicted image, or the prediction residual from the encoding of the reference region may be generated using the determined filter and then used to generate the prediction prediction residual; alternatively, the versions of these generated by ordinary inter-screen prediction may be filtered and then used to generate the prediction prediction residual.

Once the prediction prediction residual has been generated, the predicted image generation unit 107 generates a predicted image from the primary prediction image and the prediction prediction residual (step S106). Next, the subtraction unit 108 takes the difference between the predicted image and the encoding target block to generate a prediction residual (step S107). The transform/quantization unit 109 then transforms and quantizes the prediction residual to generate quantized data (step S108). When transform and quantization are complete, the inverse quantization/inverse transform unit 110 applies inverse quantization and an inverse transform to the quantized data to generate a decoded prediction residual (step S109).

Next, a video decoding device according to the second embodiment of the present invention will be described. FIG. 7 is a block diagram showing the configuration of the video decoding device 200a according to the second embodiment. In this figure, parts identical to those of the device shown in FIG. 3 are given the same reference numerals, and their description is omitted. The device shown in this figure differs from the device shown in FIG. 3 in that the prediction prediction residual update unit 208 is omitted and a filter determination unit 211 is newly provided. The filter determination unit 211 determines the filter to be used for residual prediction. The prediction prediction residual generation unit 207 performs residual prediction using the determined filter and generates a prediction prediction residual.

Next, the processing operation of the video decoding device 200a shown in FIG. 7 will be described with reference to FIG. 8, which is a flowchart of that operation. In FIG. 8, steps identical to those shown in FIG. 4 are given the same reference numerals and are described only briefly. First, the code data input unit 201 inputs the code data and stores it in the code data memory 202 (step S201). It is assumed that some frames of the decoding target video have already been decoded and that the decoding results are stored in the reference picture memory 203. It is also assumed that the referable videos of other viewpoints, up to the same frame as the decoding target picture, have already been decoded and stored in the reference picture memory 203.

Steps S205 to S208 are the same as steps S103 to S106 of the video encoding device 100 and are therefore described only briefly. The primary prediction image generation unit 206 performs, for the decoding target block, inter prediction that refers to a reference picture in the reference picture memory, determines the motion information that indicates the reference destination, and generates a primary prediction image from the image at the reference destination (step S205).

Next, the filter determination unit 211 determines the filter to be used for residual prediction (step S206a). Once the filter has been determined, the prediction prediction residual generation unit 207 generates a prediction prediction residual using the determined filter (step S207a). The details are the same as for the video encoding device 100a.

Once the prediction prediction residual has been generated, the predicted image generation unit 209 generates a predicted image from the primary prediction image and the prediction prediction residual (step S208).

In the embodiments described above, the generated prediction prediction residual is used to update the predicted image into a new predicted image. Alternatively, the primary prediction image may be used as the predicted image as it is, and, after the prediction residual has been determined, the prediction residual may be updated using the prediction prediction residual to obtain the encoding target prediction residual. Also, although the embodiments describe the case where the encoding target video is one of the videos of a multi-view video, the same approach may be used to reduce residual prediction noise for any video to which residual prediction can be applied in the same way, such as one of the videos of a scalable video, where mutually correlated videos are encoded together and multiplexed.
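
The two orderings described above (correcting the predicted image first, or correcting the prediction residual afterwards) give the same signal to encode when the correction is a simple addition on integer samples. The sketch below checks this; the function names are hypothetical, and the equivalence is only claimed under the stated assumption that no clipping or rounding occurs between the steps.

```python
def residual_via_predicted_image(target, primary_pred, pred_pred_residual):
    """Update the predicted image first, then subtract it from the target."""
    predicted = [p + r for p, r in zip(primary_pred, pred_pred_residual)]
    return [t - p for t, p in zip(target, predicted)]


def residual_via_residual_update(target, primary_pred, pred_pred_residual):
    """Subtract the primary prediction image first, then update the prediction residual."""
    residual = [t - p for t, p in zip(target, primary_pred)]
    return [e - r for e, r in zip(residual, pred_pred_residual)]


target = [100, 106, 97, 87]
primary_pred = [96, 100, 97, 95]
pred_pred_residual = [2, 2, 0, -4]

a = residual_via_predicted_image(target, primary_pred, pred_pred_residual)
b = residual_via_residual_update(target, primary_pred, pred_pred_residual)
```

If the predicted image is clipped to the valid sample range before subtraction, the two results can differ, so an implementation has to fix one ordering for both encoder and decoder.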

The order of some of the processes in the first and second embodiments described above may be changed.

As described above, updating or filtering the residual prediction value obtained by residual prediction, and thereby preventing noise that stems mainly from the accuracy of residual prediction, can improve coding efficiency or decoded video quality.

The video encoding device and video decoding device of the embodiments described above may be implemented by a computer. In that case, a program for realizing their functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, magneto-optical disk, ROM, or CD-ROM, or to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" may also include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period, such as the volatile memory inside the computer system that acts as the server or client in that case. The program may realize only part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using hardware such as a PLD (Programmable Logic Device) or FPGA (Field Programmable Gate Array).

Embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely examples of the present invention, and it is clear that the present invention is not limited to them. Components may therefore be added, omitted, replaced, or otherwise modified without departing from the technical idea and scope of the present invention.

The present invention is applicable to uses in which it is essential to improve coding efficiency or decoded video quality by updating or filtering the residual prediction value obtained by residual prediction and thereby preventing noise that stems mainly from the accuracy of residual prediction.

Description of symbols: 100: video encoding device, 101: encoding target video input unit, 102: input video memory, 103: reference picture memory, 104: primary prediction image generation unit, 105: prediction prediction residual generation unit, 106: prediction prediction residual update unit, 107: predicted image generation unit, 108: subtraction unit, 109: transform/quantization unit, 110: inverse quantization/inverse transform unit, 111: addition unit, 112: entropy encoding unit, 113: filter determination unit, 200: video decoding device, 201: code data input unit, 202: code data memory, 203: reference picture memory, 204: entropy decoding unit, 205: inverse quantization/inverse transform unit, 206: primary prediction image generation unit, 207: prediction prediction residual generation unit, 208: prediction prediction residual update unit, 209: predicted image generation unit, 210: addition unit, 211: filter determination unit

Claims

A video encoding method that, when predicting a prediction target region and generating a predicted image, further performs residual prediction on an image generated by inter-screen prediction from a reference picture, the method comprising:
a residual prediction step of generating a prediction prediction residual by the inter-screen prediction from the reference picture;
a prediction prediction residual update step of updating the prediction prediction residual to obtain a new prediction prediction residual;
a primary prediction image generation step of generating a primary prediction image by the inter-screen prediction from the reference picture;
a predicted image generation step of generating the predicted image from the primary prediction image and the prediction prediction residual; and
a prediction residual generation step of generating an encoding target prediction residual from the prediction prediction residual and the prediction residual,
wherein, in the prediction prediction residual update step, the prediction prediction residual is updated by applying a filter to the prediction prediction residual.

The video encoding method according to claim 1, further comprising a filter determination step of determining the filter,
wherein, in the filter determination step, the filter is determined based on information of the prediction prediction residual or the primary prediction image.

An encoding method that, when predicting a prediction target region and generating a predicted image, further performs residual prediction on an image generated by inter-screen prediction from a reference picture, the method comprising:
a filter determination step of determining a filter; and
a residual prediction step of generating a prediction prediction residual by inter-screen prediction using the filter from the reference picture,
wherein, in the filter determination step, the filter is determined based on information of a primary prediction image.

A video decoding method that, when predicting a prediction target region and generating a predicted image, further performs residual prediction on an image generated by inter-screen prediction from a reference picture, the method comprising:
a residual prediction step of generating a prediction prediction residual by the inter-screen prediction from the reference picture;
a prediction prediction residual update step of updating the prediction prediction residual to obtain a new prediction prediction residual;
a primary prediction image generation step of generating a primary prediction image by the inter-screen prediction from the reference picture;
a predicted image generation step of generating the predicted image from the primary prediction image and the prediction prediction residual; and
a prediction residual generation step of generating an encoding target prediction residual from the prediction prediction residual and the prediction residual,
wherein, in the prediction prediction residual update step, the predicted image is updated by applying a filter to the prediction prediction residual.

The video decoding method according to claim 4, further comprising a filter determination step of determining the filter,
wherein, in the filter determination step, the filter is determined based on information of the prediction prediction residual or the primary prediction image.

A video encoding device that, when predicting a prediction target region and generating a predicted image, further performs residual prediction on an image generated by inter-screen prediction from a reference picture, the device comprising:
residual prediction means for generating a predicted prediction residual from the reference picture by the inter-screen prediction;
a predicted prediction residual update unit that updates the predicted prediction residual to obtain a new predicted prediction residual;
primary predicted image generation means for generating a primary predicted image from the reference picture by the inter-screen prediction;
predicted image generation means for generating the predicted image from the primary predicted image and the predicted prediction residual; and
prediction residual generation means for generating an encoding target prediction residual from the predicted prediction residual and the prediction residual,
wherein the predicted prediction residual update unit updates the predicted prediction residual by applying a filter to the predicted prediction residual.
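
How the encoding target prediction residual might be formed is sketched below, assuming plain sample-wise subtraction. The claim itself says only that it is generated from the predicted prediction residual and the prediction residual. The helper names, the subtraction, and the omission of transform and quantization are assumptions.

```python
# Illustrative sketch only. Sample-wise subtraction and addition are assumed;
# transform, quantization and entropy coding are left out.

def sub(a, b):
    """Sample-wise difference of two equally sized blocks."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Sample-wise sum of two equally sized blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode_block(original, primary_pred, updated_pred_residual):
    """Return the encoding target prediction residual (assumed form)."""
    # Prediction residual: original minus primary predicted image (assumption).
    prediction_residual = sub(original, primary_pred)
    # Encoding target: what the predicted prediction residual fails to explain.
    return sub(prediction_residual, updated_pred_residual)


def reconstruct_block(primary_pred, updated_pred_residual, coded_residual):
    """Invert encode_block when the coded residual is transmitted losslessly."""
    predicted_image = add(primary_pred, updated_pred_residual)
    return add(predicted_image, coded_residual)
```

In this sketch, the closer the updated predicted prediction residual is to the actual prediction residual, the smaller the values left to encode.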

A video decoding device that, when predicting a prediction target region and generating a predicted image, further performs residual prediction on an image generated by inter-screen prediction from a reference picture, the device comprising:
residual prediction means for generating a predicted prediction residual from the reference picture by the inter-screen prediction;
a predicted prediction residual update unit that updates the predicted prediction residual to obtain a new predicted prediction residual;
primary predicted image generation means for generating a primary predicted image from the reference picture by the inter-screen prediction;
predicted image generation means for generating the predicted image from the primary predicted image and the predicted prediction residual; and
prediction residual generation means for generating an encoding target prediction residual from the predicted prediction residual and the prediction residual,
wherein the predicted prediction residual update unit updates the predicted image by applying a filter to the predicted prediction residual.

A video encoding program for causing a computer to execute the video encoding method according to any one of claims 1 to 3.

A video decoding program for causing a computer to execute the video decoding method according to any one of claims 4 to 6.

A computer-readable recording medium on which the video encoding program according to claim 9 is recorded.

A computer-readable recording medium on which the video decoding program according to claim 10 is recorded.