JP2007096541A

JP2007096541A - Coding method

Info

Publication number: JP2007096541A
Application number: JP2005280882A
Authority: JP
Inventors: Mitsuru Suzuki; 満鈴木; Shinichiro Okada; 伸一郎岡田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2005-09-27
Filing date: 2005-09-27
Publication date: 2007-04-12

Abstract

PROBLEM TO BE SOLVED: To provide a moving image coding technique that reduces the amount of code attributable to motion vector information. SOLUTION: In a coding method that obtains multiple layers with different frame rates by recursively applying motion compensated temporal filtering to a moving image, the precision of the motion vectors used for motion compensated prediction can be varied layer by layer. The layers and the motion vector precisions may be associated with each other in advance, or associated for every prescribed number of pictures. The association information thus established is included in the coded data of the moving image. Preferably, the motion vector precision is made coarser as the frame rate of the layer decreases. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an encoding method for encoding a moving image.

Broadband networks are developing rapidly, and expectations are high for services that use high-quality moving images. Large-capacity recording media such as DVDs are also in widespread use, and the number of users who enjoy high-quality images is growing. Compression coding is an indispensable technique for transmitting moving images over communication lines and storing them on recording media. International standards for moving image compression coding include the MPEG-4 standard and the H.264/AVC standard. There are also next-generation image compression techniques such as SVC (Scalable Video Coding), in which a single stream contains both a high-quality stream and a low-quality stream.

In the H.264/AVC standard, to allow finer prediction in motion compensation, the block size for motion compensation can be made variable and the pixel accuracy of motion compensation can be refined down to 1/4-pixel accuracy; as a result, the amount of code related to motion vectors increases. In SVC, a next-generation image compression technique, MCTF (Motion Compensated Temporal Filtering) is being studied as a way to improve temporal scalability. MCTF combines subband decomposition along the time axis with motion compensation, and because it performs hierarchical motion compensation, the amount of motion vector information becomes very large. Thus, in recent video compression coding techniques the data amount of the entire video stream tends to grow as the amount of motion vector information increases, and techniques for reducing the amount of code attributable to motion vector information are increasingly in demand.

Patent Document 1 discloses an encoding method that switches the coding accuracy of motion vectors for each block. This reduces the amount of motion vector code in low-rate coding.
Patent Document 1: JP 2004-48522 A

In a frame that has many high-frequency components and is strongly correlated with the reference frame, the prediction error can be reduced by raising the motion vector accuracy and performing high-precision motion compensation. However, in a frame that is weakly correlated with the reference frame because objects in it move quickly, or in a frame with few high-frequency components, raising the accuracy of motion compensation does not help reduce the prediction error, and the high-precision motion vector information is wasted.

The present invention has been made in view of these circumstances, and its object is to provide a moving image encoding technique capable of reducing the amount of code attributable to motion vector information.

To solve the above problem, one aspect of the present invention is an encoding method for generating, from a moving image, encoded data having a plurality of scalable layers, characterized in that the accuracy of the motion vectors used for motion compensated prediction within each layer can be changed layer by layer.

According to this aspect, using the motion vector accuracy best suited to each layer suppresses the code amount of wasteful motion vectors that do not lead to a reduction in prediction error, which improves the compression efficiency of the moving image. Possible types of scalability include temporal scalability and spatial scalability.

A plurality of layers with different frame rates may be obtained by recursively applying motion compensated temporal filtering to the moving image. The above method can also be applied to an encoding method that, in accordance with the MCTF technique, applies motion compensated temporal filtering to a moving image to obtain a plurality of layers with different frame rates. Since MCTF requires motion vector information for every layer, this reduces the code amount of the motion vector information and improves the compression efficiency of the moving image.

The layers and the motion vector accuracies may be associated with each other in advance, and the association information may be included in the encoded data of the moving image. This allows the accuracy of the motion vectors used for motion compensated prediction in each layer to be set for each encoded data stream.

The layers and the motion vector accuracies may be associated with each other for every fixed number of pictures, and the association information may be included in the encoded data of the moving image. This allows the accuracy of the motion vectors used for motion compensated prediction in each layer to be set for each fixed unit of pictures, such as a GOP.
Here, a "picture" is a unit of encoding that includes a frame, a field, a VOP (Video Object Plane), and the like.

The layers and the motion vector accuracies may be associated with each other in advance, and the motion vector accuracy in each layer may be determined according to this association. This eliminates the need to include the association information between layers and motion vector accuracies in the encoded data.

The motion vector accuracy may be changed stepwise from one layer to the next. The motion vector accuracy may also be made coarser as the frame rate of the layer decreases. In general, when the frame rate falls and the correlation between frames weakens, coarsening the motion vector accuracy is considered to have little effect on the prediction error, so this reduces the code amount of the motion vector information and improves the compression efficiency of the moving image.

Any combination of the above constituent elements, and any conversion of the expression of the present invention between a method, an apparatus, a system, a recording medium, a computer program, and the like, are also effective as aspects of the present invention.

According to the present invention, the amount of code attributable to motion vector information can be reduced in the encoding of a moving image.

FIG. 1 is a configuration diagram of an encoding apparatus 100 according to an embodiment. In hardware terms, this configuration can be realized by the CPU, memory, and other LSIs of any computer; in software terms, it is realized by, for example, a program with an image encoding function loaded into memory. What is depicted here are the functional blocks realized by their cooperation. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware only, software only, or a combination of the two.

The encoding apparatus 100 of the present embodiment encodes moving images in conformity with H.264/AVC, the latest moving image compression coding standard, jointly standardized by ISO (International Organization for Standardization) / IEC (International Electrotechnical Commission), which are international standardization organizations, and ITU-T (International Telecommunication Union - Telecommunication Standardization Sector), an international standards organization for telecommunications. The official names of the standard in the two bodies are MPEG-4 Part 10: Advanced Video Coding and H.264, respectively.

The image acquisition unit 10 of the encoding apparatus 100 receives a GOP (Group of Pictures) of the input image and stores each frame in a dedicated area of the image holding unit 60. The image acquisition unit 10 may divide each frame into macroblocks as necessary.

The MCTF processing unit 20 performs motion compensated temporal filtering in accordance with the MCTF technique. It obtains motion vectors from the frames stored in the image holding unit 60 and performs temporal filtering using those motion vectors. The temporal filtering is carried out with the Haar wavelet transform, and as a result the input is decomposed into a plurality of layers with different frame rates, each layer containing high-frequency frames H and low-frequency frames L. The decomposed high-frequency and low-frequency frames are stored, layer by layer, in dedicated areas of the image holding unit 60, and the motion vectors are likewise stored, layer by layer, in dedicated areas of the motion vector holding unit 70. Details of the MCTF processing unit 20 are described later.

When processing in the MCTF processing unit 20 is complete, the high-frequency frames of all layers and the low-frequency frame of the final layer held in the image holding unit 60 are sent to the image encoding unit 80. The motion vectors of all layers held in the motion vector holding unit 70 are sent to the motion vector encoding unit 90.

The image encoding unit 80 applies spatial filtering using a wavelet transform to the frames supplied from the image holding unit 60 and then encodes them. The encoded frames are sent to the multiplexing unit 92. The motion vector encoding unit 90 encodes the motion vectors supplied from the motion vector holding unit 70 and passes them to the multiplexing unit 92. Since the encoding methods are known, a detailed description is omitted.

The multiplexing unit 92 multiplexes the encoded frame information supplied from the image encoding unit 80 with the encoded motion vector information supplied from the motion vector encoding unit 90 to generate an encoded stream.

Next, the temporal filtering process according to the MCTF technique is described with reference to FIGS. 2 and 3.
The MCTF processing unit 20 sequentially takes pairs of consecutive frames within one GOP and generates a high-frequency frame and a low-frequency frame from each pair. The two frames are referred to, in time order, as "frame A" and "frame B".

The MCTF processing unit 20 detects a motion vector MV from frame A and frame B. In FIGS. 2 and 3, motion vectors are detected per frame to simplify the description, but they may instead be detected per macroblock or per block (8 × 8 pixels or 4 × 4 pixels).
Next, an image in which frame A is motion-compensated with the motion vector MV (hereinafter "frame A'") is generated.
As shown in FIG. 2, the low-frequency frame L is defined as the average of frame A' and frame B.
L = 1/2 · (A' + B)   (1)

Next, an image in which frame B is motion-compensated with −MV, the inverse of the motion vector MV (hereinafter "frame B'"), is generated.
As shown in FIG. 3, the high-frequency frame H is defined as the difference between frame A and frame B'.
H = A − B'   (2)

Rearranging equation (2) gives:
A = B' + H   (3)
If both sides are motion-compensated by the motion vector MV, the following holds, because compensating B' with MV undoes the earlier compensation with −MV and restores B. Here "H'" denotes the image obtained by motion-compensating the high-frequency frame H with the motion vector MV.
A' = B + H'   (4)
Substituting equation (4) into equation (1) gives:
L = 1/2 · (A' + B)
  = 1/2 · (B + H' + B)
  = B + 1/2 · H'   (5)
That is, the low-frequency frame L can be generated by adding, pixel by pixel, frame B and the frame H' with its pixel values halved.
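Equations (2) and (5) can be illustrated with a minimal sketch. It assumes one-dimensional frames held as Python lists and uses a circular integer shift as a stand-in for motion compensation; the names `shift` and `mctf_step` are hypothetical and do not appear in the specification.

```python
from typing import List, Tuple

Frame = List[float]


def shift(frame: Frame, mv: int) -> Frame:
    """Toy motion compensation (assumption): circularly move samples by mv positions."""
    n = len(frame)
    return [frame[(i - mv) % n] for i in range(n)]


def mctf_step(a: Frame, b: Frame, mv: int) -> Tuple[Frame, Frame]:
    """One Haar temporal filtering step on a frame pair (A, B)."""
    # Equation (2): H = A - B', where B' is B compensated with -MV.
    b_comp = shift(b, -mv)
    h = [x - y for x, y in zip(a, b_comp)]
    # Equation (5): L = B + 1/2 * H', where H' is H compensated with +MV.
    h_comp = shift(h, mv)
    low = [y + 0.5 * x for y, x in zip(b, h_comp)]
    return h, low


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [2.0, 4.0, 6.0, 8.0]
    h, low = mctf_step(a, b, mv=1)
    print(h, low)
```

With the circular shift, compensating by −MV and then by MV is exactly the identity, so the result of equation (5) coincides with the average in equation (1). Real motion compensation with block boundaries and sub-pixel interpolation would only approximate this.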

By treating the generated low-frequency frames L as the new frames A and B and repeating the same operation, the high-frequency frames, low-frequency frames, and motion vectors of the next layer are generated. This operation is repeated recursively until only one low-frequency frame remains. The number of layers obtained is therefore determined by the number of frames in the GOP. For example, when a GOP contains eight frames, the first pass generates four high-frequency frames and four low-frequency frames (layer 2), the second pass generates two high-frequency frames and two low-frequency frames (layer 1), and the third pass generates one high-frequency frame and one low-frequency frame (layer 0).
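The layer structure produced by this recursion can be tabulated with a short sketch. It assumes a power-of-two GOP size, which the text implies by halving until one low-frequency frame remains; the function name `layer_plan` is hypothetical.

```python
from typing import List, Tuple


def layer_plan(gop_size: int) -> List[Tuple[int, int, int]]:
    """Return (layer, high_frames, low_frames) for each filtering pass, first pass first."""
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("this sketch assumes a power-of-two GOP size of at least 2")
    plan = []
    layer = gop_size.bit_length() - 2  # 8 frames -> the first pass is layer 2
    frames = gop_size
    while frames > 1:
        frames //= 2  # each pass pairs up frames: one H and one L per pair
        plan.append((layer, frames, frames))
        layer -= 1
    return plan


if __name__ == "__main__":
    for layer, n_high, n_low in layer_plan(8):
        print(f"layer {layer}: {n_high} high-frequency, {n_low} low-frequency")
```

For the eight-frame GOP in the text this lists layers 2, 1, and 0 with four, two, and one frame pair outputs respectively.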

FIG. 4 shows the configuration of the MCTF processing unit 20. Frame A and frame B stored in the image holding unit 60 are input to the motion vector detection unit 21. As described above, in layer 2 frames A and B are frames that make up the GOP, but note that from layer 1 onward the low-frequency frames L generated in the immediately preceding layer serve as frames A and B.

The motion vector accuracy determination unit 28 determines the accuracy of the motion vectors used for motion compensated prediction, that is, the minimum pixel unit, and supplies it to the motion vector detection unit 21. As described later, in the present embodiment the motion vector accuracy can be set for each layer. The motion vector accuracy determination unit 28 therefore identifies which layer's frames are currently being motion-compensated and determines the motion vector accuracy accordingly.

For each macroblock in frame B, the motion vector detection unit 21 searches frame A for the prediction region with the smallest error and obtains a motion vector MV indicating the displacement from the macroblock to the prediction region. The motion vector MV is obtained at the accuracy supplied by the motion vector accuracy determination unit 28. It is stored in the motion vector holding unit 70 and also supplied to the motion compensation units 22 and 24.
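The search described here can be sketched as a full-search block match. The sketch below is an illustration under stated assumptions, not the detection unit's actual method: it works on one-dimensional frames, searches integer-pixel positions only, and uses the sum of absolute differences (SAD) as the error measure, which the text does not specify. The name `find_mv` is hypothetical.

```python
from typing import List, Tuple


def find_mv(a: List[int], b: List[int], start: int, size: int, search: int) -> Tuple[int, int]:
    """Find the displacement from the block b[start:start+size] to its best match in a.

    Returns (mv, sad). Only integer displacements within +/-search are tried.
    """
    block = b[start:start + size]
    best_mv, best_sad = 0, float("inf")
    for mv in range(-search, search + 1):
        pos = start + mv
        if pos < 0 or pos + size > len(a):
            continue  # candidate region falls outside frame A
        sad = sum(abs(x - y) for x, y in zip(block, a[pos:pos + size]))
        # Prefer the smaller error; on ties prefer the shorter vector.
        if sad < best_sad or (sad == best_sad and abs(mv) < abs(best_mv)):
            best_mv, best_sad = mv, sad
    return best_mv, int(best_sad)


if __name__ == "__main__":
    frame_a = [0, 0, 9, 7, 5, 0, 0, 0]
    frame_b = [0, 0, 0, 0, 9, 7, 5, 0]
    print(find_mv(frame_a, frame_b, start=4, size=3, search=3))
```

A search at 1/2- or 1/4-pixel accuracy would additionally interpolate frame A between integer positions; the coarser the accuracy supplied by the determination unit, the fewer candidate positions need to be examined and encoded.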

The motion compensation unit 22 performs motion compensation on frame B for each macroblock using −MV, the inverse of the motion vector MV output from the motion vector detection unit 21, and generates frame B'.

The image composition unit 23 combines frame A and frame B' output from the motion compensation unit 22 pixel by pixel, taking their difference in accordance with equation (2), to generate the high-frequency frame H. The high-frequency frame H is stored in the image holding unit 60 and also supplied to the motion compensation unit 24. The motion compensation unit 24 performs motion compensation on the high-frequency frame H for each macroblock using the motion vector MV to obtain frame H'. The frame H' is multiplied by 1/2 in processing block 25 and supplied to the image composition unit 26.

The image composition unit 26 adds, pixel by pixel, frame B and the halved frame H' to generate the low-frequency frame L. The generated low-frequency frame L is stored in the image holding unit 60.

FIG. 5 shows the images and motion vectors output in each layer when a GOP consists of eight frames. FIG. 6 is a flowchart showing the encoding method according to the MCTF technique. A specific example is described with reference to both FIG. 5 and FIG. 6.

In the following, the high-frequency frame of layer n is denoted H_n, the low-frequency frame L_n, and the motion vector MV_n. In the example of FIG. 5, among frames 101 to 108 in the GOP, frames 101, 103, 105, and 107 serve as frame A, and frames 102, 104, 106, and 108 serve as frame B.

First, the image acquisition unit 10 receives frames A and B and stores them in the image holding unit 60 (S10). At this point the image acquisition unit 10 may divide the frames into macroblocks. Next, the MCTF processing unit 20 reads frame A and frame B from the image holding unit 60 and executes the first temporal filtering pass (S12). The generated high-frequency frames H_2 and low-frequency frames L_2 are stored in the image holding unit 60, and the motion vectors MV_2 are stored in the motion vector holding unit 70 (S14). When frames 101 to 108 have been processed, the MCTF processing unit 20 reads the low-frequency frames L_2 from the image holding unit 60 and executes the second temporal filtering pass (S16). The generated high-frequency frames H_1 and low-frequency frames L_1 are stored in the image holding unit 60, and the motion vectors MV_1 are stored in the motion vector holding unit 70 (S18). Next, the MCTF processing unit 20 reads the two low-frequency frames L_1 from the image holding unit 60 and executes the third temporal filtering pass (S20). The generated high-frequency frame H_0 and low-frequency frame L_0 are stored in the image holding unit 60, and the motion vector MV_0 is stored in the motion vector holding unit 70 (S22).

The high-frequency frames H_0 to H_2 and the low-frequency frame L_0 are encoded by the image encoding unit 80 (S24), and the motion vectors MV_0 to MV_2 are encoded by the motion vector encoding unit 90 (S26). The encoded frames and motion vectors are multiplexed by the multiplexing unit 92 and output as an encoded stream (S28).

Since a high-frequency frame H is a difference between frames, its data amount when encoded is small. Also, as FIG. 5 shows, the number of low-frequency frames L is halved with each pass of temporal filtering; but because each low-frequency frame L is an average of frames in the layer above, the resulting frame sequence suffers no loss of image quality or resolution. As a specific example, if the original moving image is 60 fps, the frame rate falls with each layer: 30 fps in layer 2, 15 fps in layer 1, and 7.5 fps in layer 0. Moving images with different frame rates can therefore be transmitted in a single bitstream.

A decoding apparatus that receives the encoded stream executes the decoding process in order from the lowest layer. Decoding only the lower layers yields a moving image with a low frame rate, and the further up the layers decoding proceeds, the higher the frame rate of the resulting moving image. In this way, temporal filtering according to the MCTF technique realizes temporal scalability.

In the present embodiment, the motion vector accuracy determination unit 28 makes it possible to change, for each layer, the accuracy of the motion vectors used for motion compensated prediction within that layer. The association between layers and motion vector accuracies may be fixed in advance as part of the coding standard, or it may be freely configurable. For example, when the motion vector accuracy is set for each layer, the accuracy data of the motion vectors is stored in the header of each layer in the encoded stream, as shown in FIG. 7. When the association between layers and motion vector accuracies is fixed by the coding standard, the association information need not be included in the encoded stream.
The association between layers and motion vector accuracies may also be determined for each encoded stream. In that case, the association information is stored in the header of the entire encoded stream. Furthermore, the association may be determined for every fixed number of pictures, such as a GOP. In that case, the association information is stored in the GOP header or a similar location.
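As a rough illustration of carrying the association in a header, the sketch below serializes a layer-to-accuracy table into a few bytes and reads it back. The byte layout is entirely hypothetical (a count byte followed by pairs of layer number and shift count s, meaning an accuracy of 1/2^s pixel); the specification does not define a header syntax, and the function names are likewise assumptions.

```python
from fractions import Fraction
from typing import Dict


def pack_precision_header(precisions: Dict[int, Fraction]) -> bytes:
    """Serialize {layer: accuracy in pixels} using a hypothetical byte layout."""
    layers = sorted(precisions)
    out = bytearray([len(layers)])
    for layer in layers:
        p = precisions[layer]
        if p.numerator != 1 or p.denominator & (p.denominator - 1):
            raise ValueError("accuracy must be 1/2**s pixel in this sketch")
        shift_count = p.denominator.bit_length() - 1  # 1 -> 0, 1/2 -> 1, 1/4 -> 2
        out += bytes([layer, shift_count])
    return bytes(out)


def unpack_precision_header(data: bytes) -> Dict[int, Fraction]:
    """Inverse of pack_precision_header."""
    count = data[0]
    return {data[1 + 2 * i]: Fraction(1, 1 << data[2 + 2 * i]) for i in range(count)}


if __name__ == "__main__":
    table = {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1)}
    header = pack_precision_header(table)
    print(header.hex(), unpack_precision_header(header))
```

The same table could be written once per stream, once per GOP, or split across per-layer headers, matching the three placements described above; when the association is fixed by the standard, no such header is needed at all.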

FIG. 8 shows an example of the association between the frame rate of a layer and the motion vector accuracy. In this example, the accuracy is 1/4 pixel when the frame rate of the layer is 30 to 60 fps, 1/2 pixel when it is 15 to 30 fps, and 1 pixel when it is 15 fps or less. The motion vector accuracy determination unit 28 described above supplies the motion vector detection unit 21 with the motion vector accuracy given by the table of FIG. 8 for the frame rate of the layer currently undergoing motion compensation. Instead of determining the motion vector accuracy of each layer from a predetermined table such as FIG. 8, the motion vector accuracy determination unit 28 may determine it so that the code amount of the difference image is minimized. Alternatively, the motion vector accuracy determination unit 28 may receive an instruction specifying the motion vector accuracy of each layer from outside when encoding is executed.
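The table lookup of FIG. 8 can be written as a small function. The text leaves the handling of the boundary values open; the sketch assumes each range includes its upper bound (so exactly 30 fps maps to 1/2 pixel, consistent with "15 fps or less" mapping to 1 pixel) and that rates above 60 fps, which the table does not cover, also use 1/4 pixel. The function name is hypothetical.

```python
from fractions import Fraction


def precision_for_frame_rate(fps: float) -> Fraction:
    """Map a layer's frame rate to a motion vector accuracy in pixels (after FIG. 8)."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    if fps <= 15:
        return Fraction(1)       # 1-pixel accuracy
    if fps <= 30:                # assumption: 30 fps belongs to the 15-30 range
        return Fraction(1, 2)    # 1/2-pixel accuracy
    return Fraction(1, 4)        # 1/4-pixel accuracy (assumed above 60 fps as well)


if __name__ == "__main__":
    for fps in (60, 30, 15, 7.5):
        print(fps, precision_for_frame_rate(fps))
```

A different boundary convention would only change the results at exactly 15 and 30 fps.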

As shown in FIG. 8, it is preferable to make the motion vector accuracy coarser as the frame rate of the layer decreases, for the following reason. In general, when the frame rate is lower, that is, when the temporal distance between frames is longer, the correlation between frames is weaker, so searching at a higher motion vector accuracy does not necessarily make the difference values of the difference image smaller. Conversely, if a more precise search could further reduce the difference values of the difference image, the overall code amount would fall even though more bits are needed to encode the motion vectors. Therefore, the lower the frame rate of the layer, the more the coding efficiency of the moving image improves when the motion vector accuracy is lowered to cut the motion vector code amount. Note that, where it leads to a reduction in the code amount of the moving image, the motion vector accuracy may instead be made coarser as the frame rate of the layer becomes higher.
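The effect of accuracy on motion vector code amount can be made concrete with a small calculation. The sketch assumes a motion vector component is stored as an integer in units of the layer's accuracy and coded with a signed Exp-Golomb code, the scheme H.264/AVC uses for motion vector differences; the specification itself only says the coding method is known, so this choice and the function names are assumptions for illustration.

```python
from fractions import Fraction


def quantize_mv(displacement: Fraction, precision: Fraction) -> int:
    """Express a displacement (in pixels) as an integer number of precision units."""
    return round(displacement / precision)


def signed_exp_golomb_bits(value: int) -> int:
    """Length in bits of the signed Exp-Golomb codeword for value."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


if __name__ == "__main__":
    displacement = Fraction(11, 4)  # 2.75 pixels
    for precision in (Fraction(1, 4), Fraction(1, 2), Fraction(1)):
        units = quantize_mv(displacement, precision)
        print(precision, units, signed_exp_golomb_bits(units))
```

For a displacement of 2.75 pixels the component costs 9 bits at 1/4-pixel accuracy, 7 bits at 1/2-pixel accuracy, and 5 bits at 1-pixel accuracy under these assumptions. Whether the saving is worthwhile depends on how much the coarser vector increases the residual, which is the trade-off the paragraph describes.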

FIG. 9 is a configuration diagram of a decoding apparatus 300 according to the embodiment. The encoded stream is input to the stream analysis unit 310 of the decoding apparatus 300. The stream analysis unit 310 extracts the data portion corresponding to the required layers and further separates the coded frame data from the coded motion vector data. The frame data is supplied to the image decoding unit 320, and the motion vector data is supplied to the motion vector decoding unit 330. When the encoded stream contains motion vector accuracy data, the accuracy data is also separated and supplied to the motion vector decoding unit 330.

The image decoding unit 320 applies entropy decoding and an inverse wavelet transform to generate the low-frequency frame L_0 of the lowest layer and all the high-frequency frames H_0 to H_2. The frames decoded by the image decoding unit 320 are stored in dedicated areas of the image holding unit 350.

The motion vector decoding unit 330 decodes the motion vector information using the motion vector accuracy data and then calculates the motion vector MV_0 of the lowest layer and the motion vectors MV_1 and MV_2 of the higher layers. The motion vectors decoded by the motion vector decoding unit 330 are stored in dedicated areas of the motion vector holding unit 360.

The image synthesis unit 370 synthesizes frames by reversing the procedure of the MCTF processing described above. The synthesized frames are output to the outside, and when frames of a still higher layer are required, the synthesized frames are also stored in the image holding unit 350 for later processing.

Each time the image synthesis unit performs the synthesis processing, a moving image with a higher frame rate can be reproduced, and finally a moving image with the same frame rate as the input image is obtained.
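The recursive synthesis described above can be sketched as follows. This is a minimal illustration assuming zero motion (no motion compensation) and a lifting form of the Haar transform in which h = b - a and l = a + h / 2. The function names and the normalization are assumptions and are not taken from the embodiment.

```python
def haar_analyze(frames):
    """One temporal Haar step: each frame pair yields a low and a high frame."""
    lows, highs = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        high = [pb - pa for pa, pb in zip(a, b)]        # prediction residual
        low = [pa + ph / 2 for pa, ph in zip(a, high)]  # average of the pair
        lows.append(low)
        highs.append(high)
    return lows, highs


def haar_synthesize(lows, highs):
    """Inverse step: each (low, high) pair yields two frames, doubling the rate."""
    frames = []
    for low, high in zip(lows, highs):
        a = [pl - ph / 2 for pl, ph in zip(low, high)]
        b = [pa + ph for pa, ph in zip(a, high)]
        frames.extend([a, b])
    return frames


# Four two-pixel frames, decomposed over two layers and then rebuilt.
original = [[10, 20], [12, 18], [14, 16], [16, 14]]
lows1, highs1 = haar_analyze(original)
lows2, highs2 = haar_analyze(lows1)
half_rate = haar_synthesize(lows2, highs2)     # first synthesis: 2 frames
full_rate = haar_synthesize(half_rate, highs1)  # second synthesis: 4 frames
print(len(half_rate), len(full_rate), full_rate == original)
```

Each call to `haar_synthesize` doubles the number of frames, mirroring how every synthesis step makes a higher frame rate available.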

As described above, according to the encoding device 100 of the present embodiment, the code amount of the motion vector information can be reduced by using, when encoding motion vectors, motion vectors of an accuracy appropriate to each temporal scalability layer. In hierarchical coding of moving images, the motion vector code amount itself becomes large, so the motion vectors need to be encoded efficiently. According to the present embodiment, the code amount of the entire moving image stream can be reduced and the compression efficiency increased.
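The effect of accuracy on the motion vector code amount can be illustrated with a rough bit count. The sketch below assumes that each motion vector component is coded with a signed Exp-Golomb code, one of the variable-length codes used for syntax elements in H.264/AVC. Whether the embodiment codes motion vectors this way is an assumption, and a real encoder would code the difference from a predicted vector rather than the raw vector. The function names are hypothetical.

```python
def signed_exp_golomb_bits(value):
    """Length in bits of the signed Exp-Golomb code for an integer."""
    # Positive values map to odd code numbers, zero and negatives to even ones.
    code_num = 2 * abs(value) - (1 if value > 0 else 0)
    # Code length is 2 * floor(log2(code_num + 1)) + 1.
    return 2 * (code_num + 1).bit_length() - 1


def mv_bits(displacement, steps_per_pixel):
    """Bits needed for a motion vector (dx, dy) given in whole pixels.

    steps_per_pixel is 1 for integer-pixel, 2 for half-pixel and 4 for
    quarter-pixel accuracy (hypothetical parameterization).
    """
    return sum(signed_exp_golomb_bits(c * steps_per_pixel) for c in displacement)


for name, steps in (("quarter", 4), ("half", 2), ("integer", 1)):
    print(name, mv_bits((3, -2), steps))
```

Under these assumptions, the displacement (3, -2) costs 18 bits at quarter-pixel, 14 bits at half-pixel, and 10 bits at integer-pixel accuracy, so a coarser accuracy pays off whenever the finer search does not reduce the residual by at least as much.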

The present embodiment focuses on the correlation between the MCTF layer and the motion vector accuracy. In general, for a frame that has many high-frequency components and a strong correlation with its reference frame, the prediction error can be reduced by performing high-accuracy motion compensation with more accurate motion vectors. However, for a frame whose correlation with the reference frame is weak because objects in the frame move quickly, or for a frame with few high-frequency components, increasing the accuracy of motion compensation does not help reduce the prediction error, and the high-accuracy motion vector information is wasted. In contrast, the present embodiment encodes each layer using the motion vector accuracy best suited to it and suppresses the code amount of wasteful motion vectors that do not lead to a reduction in prediction error, thereby improving the compression efficiency of the moving image.

In addition, the motion vector accuracy is the same within each layer. If, for example, the motion vector were changed for each macroblock, the motion vector code amount would decrease but the amount of computation required for encoding would increase. In the present embodiment, by contrast, the motion vector code amount can be reduced without increasing the amount of computation.

The present embodiment is particularly effective for encoding moving images by temporal filtering according to the MCTF technique, because motion vectors must be encoded in each layer and the code amount of the motion vector information therefore increases.

The present invention has been described above based on an embodiment. The embodiment is an example, and those skilled in the art will understand that various modifications are possible in the combinations of its constituent elements and processing steps, and that such modifications are also within the scope of the present invention.

The above description used, as an example, motion vectors for MCTF processing based on the Haar wavelet transform, which generates one low-frequency frame from two consecutive frames. The present invention can also be applied to motion vectors for MCTF processing based on the 5/3 wavelet transform, which generates one low-frequency frame from five consecutive frames and one high-frequency frame from three consecutive frames.
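The frame support of the 5/3 transform can be seen in a lifting sketch. The following illustration assumes zero motion, treats each frame as a single number for brevity, and uses symmetric extension at the sequence boundaries. The function name and these simplifications are assumptions and are not part of the embodiment.

```python
def lift_53(frames):
    """One temporal 5/3 lifting step on an even-length list of frame values."""
    even, odd = frames[0::2], frames[1::2]
    n = len(odd)
    # Predict: each high-frequency value uses one odd frame and its two even
    # neighbours, i.e. three consecutive frames.
    highs = [
        odd[k] - (even[k] + even[min(k + 1, n - 1)]) / 2
        for k in range(n)
    ]
    # Update: each low-frequency value uses one even frame and the two
    # adjacent high-frequency values, i.e. five consecutive frames in total.
    lows = [
        even[k] + (highs[max(k - 1, 0)] + highs[k]) / 4
        for k in range(n)
    ]
    return lows, highs


lows, highs = lift_53([0, 1, 2, 3, 4, 5, 6, 7])
print(lows, highs)
```

For a linear ramp such as this input, the interior high-frequency values are zero, and only the value at the mirrored boundary is non-zero.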

In the above description, the encoding device 100 and the decoding device 300 encode and decode moving images in conformity with H.264/AVC, but the present invention can also be applied to other schemes that perform hierarchical encoding and decoding of moving images with temporal scalability.

Furthermore, although the above description concerned the encoding of moving images with temporal scalability, the motion vector encoding according to the present invention can also be applied to hierarchical encoding and decoding of moving images with spatial scalability.

FIG. 1 is a block diagram of the encoding device according to the embodiment.
FIG. 2 is a diagram showing a method of generating a low-frequency frame.
FIG. 3 is a diagram showing a method of generating a high-frequency frame.
FIG. 4 is a block diagram of the MCTF processing unit.
FIG. 5 is a diagram showing the images and motion vectors output in each layer.
FIG. 6 is a flowchart showing an encoding method according to the MCTF technique.
FIG. 7 is a diagram showing how motion vector accuracy data is stored for each layer.
FIG. 8 is a table showing an example of the association between the frame rate of a layer and the motion vector accuracy.
FIG. 9 is a block diagram of the decoding device according to the embodiment.

Explanation of symbols

10 image acquisition unit, 20 MCTF processing unit, 21 motion vector detection unit, 28 motion vector accuracy determination unit, 60 image holding unit, 70 motion vector holding unit, 80 image encoding unit, 90 motion vector encoding unit, 92 multiplexing unit, 100 encoding device, 300 decoding device, 310 stream analysis unit, 320 image decoding unit, 330 motion vector decoding unit, 350 image holding unit, 360 motion vector holding unit, 370 image synthesis unit.

Claims

1. An encoding method for generating, from a moving image, encoded data having a plurality of scalable layers, characterized in that the accuracy of the motion vector used for motion compensation prediction in each layer can be changed for each layer.

2. An encoding method for recursively performing motion-compensated temporal filtering on a moving image to obtain a plurality of layers having different frame rates, characterized in that the accuracy of the motion vector used for motion compensation prediction in each layer can be changed for each layer.

3. The encoding method according to claim 1, wherein the layers and the motion vector accuracy are associated in advance, and the correspondence information is included in the encoded data of the moving image.

4. The encoding method according to claim 1 or 2, wherein the layers and the motion vector accuracy are associated with each other for each fixed number of pictures, and the correspondence information is included in the encoded data of the moving image.

5. The encoding method according to claim 1 or 2, wherein the layers and the motion vector accuracy are associated in advance, and the motion vector accuracy in each layer is determined according to the association.

6. The encoding method according to claim 1, wherein the motion vector accuracy changes stepwise each time a layer is passed.

7. The encoding method according to claim 2, wherein the motion vector accuracy is made coarser as the frame rate of the layer decreases.