JP2008512023A

JP2008512023A - Method and apparatus for motion prediction

Info

Publication number: JP2008512023A
Application number: JP2007529081A
Authority: JP
Inventors: ワン，ジン
Original assignee: Koninklijke Philips NV; Koninklijke Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2004-08-31
Filing date: 2005-08-23
Publication date: 2008-04-17
Also published as: KR20070051294A; WO2006024988A3; WO2006024988A2; EP1790166A2

Abstract

A spatially layered compression method and apparatus for video streams are disclosed. A reference motion vector is introduced into the compression scheme. From this reference motion vector, the base layer and the enhancement layer each obtain the motion vector of the corresponding frame of the video stream, and the base layer and the enhancement layer are generated accordingly. The reference motion vector links the motion prediction for the base layer to the motion prediction for the enhancement layer, which reduces the total amount of motion prediction computation across the two layers. Furthermore, because the reference frame used to obtain the reference motion vector is taken from the original video sequence, to which no further degrading operation has been applied, the reference motion vector reflects the actual motion within the video sequence well.

Description

The present invention relates to a method and apparatus for compressing a video stream, and more particularly to a method and apparatus for compressing a video stream using a spatial layered compression scheme.

Because digital video contains a large amount of data, transmitting high-resolution video signals is a major problem when producing high-definition television programs. Specifically, each frame of digital video is a still picture (also called an image) composed of a group of picture elements (also called pixels). The number of pixels depends on the display definition of the particular system. The amount of raw digital information in high-definition video is therefore very large. Many video compression standards, such as MPEG-2, MPEG-4, and H.263, have been created to reduce the amount of data that must be transmitted.

All of the standards mentioned above support layering techniques, including spatial layers, temporal layers, and SNR layers. In layered coding, the bitstream is divided into two or more bitstreams, or layers, for coding. During decoding, the layers are combined as desired to form a high-resolution signal. For example, the base layer provides a low-resolution video stream, and the enhancement layer provides additional information that enhances the base layer image.

In current spatially layered compression schemes, in addition to the layered coding itself, motion prediction is used to obtain a predicted image from the relationship between earlier and later frames. Before compression, the input video stream is processed to form I, P, and B frames, arranged in a sequence according to the parameter settings. An I frame is encoded using only its own information, a P frame is predictively encoded from the nearest preceding I or P frame, and a B frame is predictively encoded from itself or from the frames before and after it.

FIG. 1 is a block diagram of a video encoder 100 that supports MPEG-2/MPEG-4 spatial layered compression. The video encoder 100 includes a base encoder 112 and an enhancement encoder 114. The base encoder 112 includes a downsampler 120, motion prediction (ME) means 122, a motion compensator (MC) 124, an orthogonal transform circuit (for example, a discrete cosine transform (DCT) circuit) 130, a quantizer (Q) 132, a variable-length coder (VLC) 134, a bit rate control circuit 135, an inverse quantizer (IQ) 138, an inverse transform circuit (IDCT) 140, switches 128 and 144, and an upsampler 150. The enhancement encoder 114 includes motion prediction means 154, a motion compensator 155, an orthogonal transform (for example, DCT) circuit 158, a quantizer 160, a variable-length coder 162, a bit rate control circuit 164, an inverse quantizer 166, an inverse transform circuit (IDCT) 168, and switches 170 and 172. The functions of all of these components are well known in the art and are not described in detail here.

It is known that motion prediction is one of the most time-consuming parts of a video compression system; that is, the greater the computational load of motion prediction, the lower the coding efficiency of the system. In the layered compression scheme described above, motion prediction for a given video frame is performed for both the base layer and the enhancement layer, with no link between the two. Because both predictions are performed on the same image frame, a relatively large part of the search process is repeated, which increases the computation required for motion prediction and lowers the coding efficiency of the compression scheme. A spatially layered video compression scheme with good coding efficiency is therefore needed.

The present invention is directed to a highly efficient spatially layered compression method that overcomes the above problem by introducing a reference motion vector. The reference motion vector allows the base-layer motion prediction to be linked to the enhancement-layer motion prediction, so that the search that would otherwise be repeated is performed only once, followed by only a small amount of additional searching. This reduces the computational complexity of motion prediction and improves the efficiency of compression coding.

One embodiment of the present invention discloses a method and apparatus for spatially layered compression of a video stream. First, the original video stream is processed to obtain a reference motion vector for each frame of the video stream; the reference motion vector and the video stream are then each downsampled. Second, the motion vector of the corresponding frame of the downsampled video stream is obtained according to the downsampled reference motion vector, and the downsampled video stream is processed using that motion vector to generate the base layer. Finally, when the enhancement layer is generated, the motion vector of the corresponding frame of the video stream is obtained according to the reference motion vector, and the video stream is processed using that motion vector and the base layer to generate the enhancement layer.

An alternative embodiment of the present invention discloses another method and apparatus for spatially layered compression of a video stream. First, the video stream is downsampled, and a reference motion vector is obtained for each frame of the downsampled video stream. Second, the motion vector of the corresponding frame of the downsampled video stream is obtained according to the reference motion vector, and the downsampled video stream is processed using that motion vector to generate the base layer. Finally, when the enhancement layer is generated, the reference motion vector is upsampled, the motion vector of the corresponding frame of the video stream is obtained according to the upsampled reference motion vector, and the video stream is processed using that motion vector and the base layer to generate the enhancement layer.

A further embodiment of the present invention discloses yet another method and apparatus for spatially layered compression of a video stream. First, the video stream is processed to generate the base layer. Next, the motion vector of each frame of the base layer is upsampled to obtain the reference motion vector of the corresponding frame. Finally, the motion vector of the corresponding frame of the video stream is obtained according to the reference motion vector, and the video stream is processed using that motion vector and the base layer to generate the enhancement layer.

Other objects and advantages, together with a fuller understanding of the present invention, will become apparent from the following detailed description, the accompanying drawings, and the claims. The present invention is described in more detail below with reference to embodiments and the accompanying drawings. Throughout the drawings, the same reference numerals denote similar or corresponding features or functions.

FIG. 2 is a conceptual diagram of an encoding system that uses a reference motion vector according to one embodiment of the present invention. The encoding system 200 is used for layered compression: the base layer portion provides low-resolution base information of the video stream, the enhancement layer carries edge enhancement information, and the two kinds of information are recombined at the receiving terminal to form high-resolution picture information.

The encoding system 200 includes acquisition means 216, base layer acquisition means 212, and enhancement layer acquisition means 214.

The acquisition means 216 processes the original video stream to obtain the reference motion vector of each frame of the video stream. It includes motion prediction means 276 and a frame memory 282. The frame memory 282 stores the original video sequence. The motion prediction means 276 obtains a reference frame (for example, an I or P frame) from the frame memory 282 and performs motion prediction on the current frame (for example, a P frame) with respect to that reference frame, thereby computing the reference motion vector of the current frame.

The base layer acquisition means 212 processes the video stream using the reference motion vector to generate the base layer. It includes downsamplers 120 and 286. The downsampler 120 downsamples the original video stream, and the downsampler 286 downsamples the reference motion vector. Those skilled in the art will of course recognize that a single downsampler could downsample both the original video stream and the reference motion vector.
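The text does not say how the downsampler 286 reduces the reference motion vectors. The following is a minimal sketch under stated assumptions: the resolution is halved in each dimension, so one base-layer macroblock covers a 2 × 2 group of full-resolution macroblocks, and the downsampled vector is taken as the mean of that group scaled by the same factor. The function name `downsample_mv_field` and the averaging rule are illustrative, not taken from the patent.

```python
def downsample_mv_field(field, factor=2):
    """Reduce a per-macroblock motion vector field by `factor` in each dimension.

    field[r][c] is the (dx, dy) reference vector of one full-resolution
    macroblock. Assumption: each output vector is the mean of a
    factor x factor group of input vectors, divided by `factor` because
    displacements shrink with the picture.
    """
    rows = len(field) // factor
    cols = len(field[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            group = [field[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            mean_dx = sum(v[0] for v in group) / len(group)
            mean_dy = sum(v[1] for v in group) / len(group)
            # round() yields integer-pel vectors (Python rounds halves to even).
            out_row.append((round(mean_dx / factor), round(mean_dy / factor)))
        out.append(out_row)
    return out
```

Other reductions, such as taking the median vector of each group, would fit the description equally well; the subsequent small-window search in the base layer is what corrects any inaccuracy introduced here.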

The base layer acquisition means 212 further includes motion vector acquisition means 222, which obtains the motion vector of the corresponding frame of the downsampled video stream based on the downsampled reference motion vector. The process by which the motion vector acquisition means 222 obtains the motion vector is described below.

The base layer acquisition means 212 further includes base layer generation means 213, which processes the downsampled video stream using the motion vector to generate the base layer. Apart from the downsamplers 120 and 286 and the motion vector acquisition means 222, all other components of the base layer acquisition means 212 are essentially the same as those of the base encoder of FIG. 1 and belong to the base layer generation means 213: the motion compensator 124, DCT circuit 130, quantizer 132, variable-length coder 134, bit rate control circuit 135, inverse quantizer 138, inverse transform circuit 140, arithmetic units 125 and 148, switches 128 and 144, and upsampler 150. The process by which the base layer generation means 213 generates the base layer from the motion vector output by the motion vector acquisition means 222 is substantially the same as in the prior art and is described in detail below.

In the base layer acquisition means 212 described above, components with the same reference numerals as in FIG. 1 have the same or similar functions or features. The motion prediction means 122 and the motion vector acquisition means 222 differ in how they obtain the motion vector. The motion prediction means 122 of FIG. 1 searches a large search window directly in the reference frame held in a frame memory (not shown) to obtain the motion vector of the corresponding frame of the video stream. The motion vector acquisition means 222 of FIG. 2 instead searches a small search window positioned according to the reference motion vector.
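The saving from replacing two large-window searches with one large-window search plus two small refinements can be estimated by counting candidate positions per macroblock. The sketch below is only back-of-the-envelope arithmetic: the window radii (±16 and ±2 pixels) are illustrative values not given in the patent, and the count ignores that the base layer has fewer macroblocks than the enhancement layer.

```python
def candidates(radius):
    """Number of integer-pel positions in a square window of +/- radius."""
    return (2 * radius + 1) ** 2


def total_cost(full_radius, refine_radius):
    """Candidate positions per macroblock for both layers together.

    conventional: each layer runs its own full search (as in FIG. 1).
    shared: one full search for the reference vector, then a small
            refinement search in each layer (as in FIG. 2).
    """
    conventional = 2 * candidates(full_radius)
    shared = candidates(full_radius) + 2 * candidates(refine_radius)
    return conventional, shared


conventional, shared = total_cost(full_radius=16, refine_radius=2)
print(conventional, shared)  # 2178 1139
```

Under these assumed radii the shared scheme evaluates roughly half as many positions, which is the kind of reduction the description attributes to the reference motion vector.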

The enhancement layer acquisition means 214 processes the video stream using the reference motion vector and the base layer to generate the enhancement layer. It includes motion vector acquisition means 254 and enhancement layer generation means 215.

The motion vector acquisition means 254 obtains the motion vector of the corresponding frame of the video stream based on the reference motion vector.

The enhancement layer generation means 215 processes the video stream using the motion vector and the base layer to generate the enhancement layer. Apart from the motion vector acquisition means 254, the components of the enhancement layer acquisition means 214 are substantially the same as those of the enhancement encoder 114 of FIG. 1, and all of them belong to the enhancement layer generation means 215: the motion compensator 155, DCT circuit 158, quantizer 160, variable-length coder 162, bit rate control circuit 164, inverse quantizer 166, inverse DCT circuit 168, and switches 170 and 172. These components function in the same way as the corresponding components of the base layer acquisition means 212. The process by which the enhancement layer generation means 215 generates the enhancement layer from the motion vector output by the motion vector acquisition means 254 is essentially the same as in the prior art and is described in detail below.

In the enhancement layer acquisition means 214 described above, components with the same reference numerals as in FIG. 1 have the same or similar features and functions. The motion prediction means 154 and the motion vector acquisition means 254 differ in how they obtain the motion vector. The motion prediction means 154 of FIG. 1 searches a large search window directly in the reference frame held in a frame memory (not shown) to obtain the motion vector of the corresponding frame of the video stream. The motion vector acquisition means 254 of FIG. 2 instead performs a further search within a small search window positioned according to the reference motion vector.

With reference to FIG. 2, the process by which the base layer acquisition means 212 and the enhancement layer acquisition means 214 each obtain their motion vectors from the reference motion vector output by the acquisition means 216, and thereby generate the base layer and the enhancement layer, is described in more detail below.

The original video stream is input to the acquisition means 216 and supplied to both the motion prediction means 276 and the frame memory 282. Before being supplied to the acquisition means 216, the video stream has been processed to form I, P, and B frames arranged, according to the parameter settings, in a sequence such as I, B, P, B, P, ..., B, P. The input video sequence is stored in the frame memory 282. The motion prediction means 276 obtains a reference frame (for example, an I frame) from the frame memory 282 and performs motion prediction on the current frame (for example, a P frame) with respect to that reference frame, thereby computing the reference motion vector of each macroblock of the current frame. A macroblock is a 16 × 16 pixel sub-block of the frame currently being encoded. Block matching between the current macroblock and the reference frame yields the reference motion vector of the current macroblock, and thereby the reference motion vector of the current frame.

MPEG uses four methods for picture prediction: intra-frame coding, forward predictive coding, backward predictive coding, and bidirectional predictive coding. An I frame is an intra-frame coded image; a P frame is an image coded by intra-frame coding, forward predictive coding, or backward predictive coding; and a B frame is an image coded by intra-frame coding, forward predictive coding, or bidirectional predictive coding.

The motion prediction means 276 performs forward prediction on P frames and computes their reference motion vectors. It also performs forward or bidirectional prediction on B frames and computes their reference motion vectors. No motion prediction is required for intra-frame coding.

Taking forward prediction of a P frame as an example, the reference motion vector is computed as follows. The motion prediction means 276 reads the previous reference frame from the frame memory 282 and searches a search window in that reference frame for the macroblock that best matches the pixel block of the current frame. The prior art offers several match-search algorithms. The quality of a match is generally judged by the mean absolute difference (MAD) or the mean square error (MSE) between the pixels of the current input block and the pixels of the candidate block in the reference frame. The candidate block in the reference frame with the minimum MAD or MSE is the best-matching block, and the position of that block relative to the position of the current block is the reference motion vector.
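The search just described can be sketched as an exhaustive block match using MAD as the criterion. This is a minimal illustration rather than the patented implementation: frames are plain lists of rows of luminance values, and the names `mad` and `full_search` as well as the window radius are assumptions made for the example.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced from it by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def full_search(cur, ref, bx, by, n, radius):
    """Return (dx, dy, cost) of the best-matching block within +/- radius.

    (dx, dy) is the position of the best match relative to the current
    block, i.e. the reference motion vector of that block.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = mad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Replacing the absolute difference with a squared difference turns the same loop into an MSE search. The cost of this exhaustive scan grows with the square of the window radius, which is why the description goes on to share its result between the two layers.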

Through the processing described above, the motion prediction means 276 in the acquisition means 216 obtains the reference motion vector of each frame of the video stream. After being downsampled by the downsampler 286, the reference motion vector is supplied to the motion vector acquisition means 222 of the base layer acquisition means 212, so that the means 222 can perform further motion prediction on the same frame in the base layer. The reference motion vector is also supplied to the motion vector acquisition means 254 of the enhancement layer acquisition means 214, so that the means 254 can perform further motion prediction on the same frame in the enhancement layer.

While the acquisition means 216 performs motion prediction on the input video stream, the base layer acquisition means 212 and the enhancement layer acquisition means 214 predictively encode the input video stream. This predictive coding is slightly delayed in time, because the base layer and the enhancement layer must perform their further motion prediction on the basis of the reference motion vector.

The process by which the base layer performs further motion prediction based on the reference motion vector is described below.

The original input video signal is split by a splitter and supplied to both the base layer acquisition means 212 and the enhancement layer acquisition means 214. In the base layer acquisition means, the input video stream is supplied to the downsampler 120, a low-pass filter used to reduce the resolution of the input video stream. The downsampled video stream is supplied to the motion vector acquisition means 222. The motion vector acquisition means 222 obtains the image of the previous reference frame of the video sequence stored in the frame memory and, based on the downsampled reference motion vector of the current frame output by the downsampler 286, searches a small search window in the previous reference frame for the macroblock that best matches the current frame. This yields the motion vector of the corresponding frame of the downsampled video stream.
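The small-window search can be sketched as a refinement around the position indicated by the downsampled reference vector. The sketch below assumes integer-pel vectors, a sum-of-absolute-differences criterion, and a window of ±1 pixel; the names `sad` and `refine` and the returned tuple layout are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for the n x n block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def refine(cur, ref, bx, by, n, pred, radius=1):
    """Search only +/- radius around the predicted vector pred = (px, py).

    Returns (refinement, total_vector, cost), where `refinement` is the
    small correction found by this search and `total_vector` is the
    prediction plus that correction. Returns None if no candidate lies
    inside the reference frame.
    """
    height, width = len(ref), len(ref[0])
    px, py = pred
    best = None
    for ry in range(-radius, radius + 1):
        for rx in range(-radius, radius + 1):
            dx, dy = px + rx, py + ry
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, rx, ry)
    if best is None:
        return None
    cost, rx, ry = best
    return (rx, ry), (px + rx, py + ry), cost
```

With a radius of 1 this evaluates at most nine candidates per block, compared with the hundreds examined by a full search over a large window.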

After receiving the prediction mode, the reference motion vector, and the motion vector from the motion vector acquisition means 222, the motion compensator 124 reads, according to the prediction mode, the image data of the previous reference frame, which has been encoded and locally decoded and is stored in the frame memory (not shown). It shifts the previous frame according to the reference motion vector, then shifts it again according to the motion vector, and thereby predicts the current frame. Alternatively, the previous frame can be shifted only once, by the sum of the reference motion vector and the motion vector; in that case the sum of the two is used as the motion vector of the frame.
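The equivalence between two successive shifts and a single shift by the summed vector can be checked with a toy translation of a whole frame. This sketch makes two simplifying assumptions that are not in the patent: one vector is applied to the entire frame rather than per macroblock, and pixels wrap around at the borders so that the equality is exact (a real codec would pad or clamp at the edges).

```python
def shift(frame, dx, dy):
    """Translate `frame` by (dx, dy) pixels with wrap-around at the borders."""
    height, width = len(frame), len(frame[0])
    return [[frame[(y - dy) % height][(x - dx) % width] for x in range(width)]
            for y in range(height)]


def predict_two_step(prev, ref_mv, mv):
    """Shift by the reference motion vector, then by the refinement vector."""
    return shift(shift(prev, ref_mv[0], ref_mv[1]), mv[0], mv[1])


def predict_one_step(prev, ref_mv, mv):
    """Shift once by the sum of the two vectors."""
    return shift(prev, ref_mv[0] + mv[0], ref_mv[1] + mv[1])


prev = [[x + 10 * y for x in range(5)] for y in range(4)]
assert predict_two_step(prev, (2, 1), (-1, 1)) == predict_one_step(prev, (2, 1), (-1, 1))
```

The single-shift form needs only one pass over the reference data, which is why transmitting the summed vector as the motion vector of the frame is attractive.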

The motion compensator 124 then supplies the predicted image to the arithmetic unit 125 and the switch 144. The arithmetic unit 125 also receives the input video stream and computes the difference between the image of the input video stream and the predicted image from the motion compensator 124. The difference is supplied to the DCT circuit 130. If the prediction mode received from the motion vector acquisition means 222 is intra-frame prediction, the motion compensator 124 outputs no predicted image. In that case the arithmetic unit 125 does not perform the subtraction described above and passes the video stream directly to the DCT circuit 130.

The DCT circuit 130 applies the DCT to the signal output by the arithmetic unit 125 to obtain DCT coefficients, which are supplied to the quantizer 132. The quantizer 132 sets the quantization step size (quantization level) according to the amount of data stored in the buffer and quantizes the DCT coefficients supplied by the DCT circuit 130 using that step size. The quantized DCT coefficients and the chosen quantization step size are supplied to the VLC unit 134.
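The transform and quantization stage can be sketched with an orthonormal two-dimensional DCT-II followed by uniform quantization. This is a simplified illustration: a single scalar step size is assumed for all coefficients, whereas MPEG-style coders typically combine a quantization matrix with a rate-controlled scale factor. The helper names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2-D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round to an integer."""
    return [[round(v / step) for v in row] for row in coeffs]


flat = [[10] * 8 for _ in range(8)]
levels = quantize(dct2(flat), 16)
print(levels[0][0])  # 5: a flat block has only a DC coefficient (8 * 10 = 80)
```

A larger step size produces smaller levels and more zeros, which lowers the bit rate at the cost of reconstruction accuracy; this is the lever the bit rate control circuit adjusts according to buffer fullness.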

In accordance with the quantization step size supplied by the quantizer 132, the VLC unit 134 converts the quantized coefficients from the quantizer into a variable-length code, for example a Huffman code, thereby generating the base layer.
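To illustrate why a Huffman-style variable-length code shortens the stream, the sketch below derives code lengths from symbol frequencies so that frequent quantized levels (typically zero) receive the shortest codes. It is an assumption-laden illustration: standard MPEG coders use fixed, predefined VLC tables over run-level pairs rather than building a tree per stream, and `huffman_code_lengths` is a hypothetical helper.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (count, tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freq}
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


levels = [0] * 5 + [1] * 2 + [2] + [3]
print(huffman_code_lengths(levels))  # {0: 1, 1: 2, 2: 3, 3: 3}
```

For these nine symbols the variable-length code needs 5·1 + 2·2 + 1·3 + 1·3 = 15 bits, against 18 bits for a fixed 2-bit code.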

The converted quantized coefficients are also output to a buffer (not shown). The quantized coefficients and the quantization step size are supplied to the inverse quantizer 138, which dequantizes the coefficients according to the step size to recover DCT coefficients. The DCT coefficients are supplied to the inverse DCT unit 140, which applies the inverse DCT to them. The output of the inverse DCT is supplied to the arithmetic unit 148, which also receives data from the motion compensator 124 depending on the position of the switch 144. The arithmetic unit 148 adds the signal supplied by the inverse DCT unit 140 to the predicted image supplied by the motion compensator 124, thereby locally decoding the original image. If the prediction mode is intra-frame coding, however, the output of the inverse DCT unit 140 is sent directly to the frame memory. The decoded image obtained by the arithmetic unit 148 is supplied to and stored in the frame memory for later use as a reference frame for intra-frame coding, forward coding, backward coding, or bidirectional coding.
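The local decoding loop (inverse quantization, inverse DCT, and addition of the prediction) can be sketched as follows. The same assumptions apply as for a simple forward path: an orthonormal DCT, a single scalar step size, and 8-bit pixel values; `reconstruct` is a hypothetical name, and passing `prediction=None` models the intra-frame case in which no predicted image is added.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def idct2(coeffs):
    """Inverse 2-D DCT of a square block: C^T * coeffs * C."""
    n = len(coeffs)
    c = dct_matrix(n)
    tmp = [[sum(c[k][i] * coeffs[k][l] for k in range(n)) for l in range(n)]
           for i in range(n)]
    return [[sum(tmp[i][l] * c[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def reconstruct(levels, step, prediction=None):
    """Dequantize, inverse-transform, add the prediction, clamp to 0..255."""
    residual = idct2([[v * step for v in row] for row in levels])
    n = len(levels)
    out = []
    for y in range(n):
        row = []
        for x in range(n):
            pred = prediction[y][x] if prediction is not None else 0
            row.append(max(0, min(255, round(residual[y][x] + pred))))
        out.append(row)
    return out
```

Because the encoder stores this reconstructed image, rather than the original, as its reference frame, encoder and decoder predict from identical data and quantization errors do not accumulate from frame to frame.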

The output of the arithmetic unit 148 is also supplied to the upsampler 150, which generates a reconstructed stream with substantially the same resolution as the high-resolution input video stream. Because of filtering and the losses introduced by compression and decompression, however, the reconstructed stream contains some error. This difference is determined by subtracting the reconstructed high-resolution video stream from the original, unmodified high-resolution video stream, and it is input to the enhancement layer to be encoded. The enhancement layer therefore encodes and compresses frames that carry this difference information.
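The residual fed to the enhancement layer can be illustrated with a small sketch. It assumes nearest-neighbour upsampling by an integer factor, which is a simplification chosen for illustration; the text does not specify the interpolation filter, and the 1920-to-720 ratio in the later example is not an integer. The function names are hypothetical.

```python
def upsample_nearest(frame, factor):
    """Upsample a 2-D list of pixel values by an integer factor (nearest neighbour)."""
    out = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def enhancement_residual(original, decoded_base, factor):
    """Original frame minus the upsampled base-layer reconstruction."""
    reconstructed = upsample_nearest(decoded_base, factor)
    return [
        [o - r for o, r in zip(orig_row, rec_row)]
        for orig_row, rec_row in zip(original, reconstructed)
    ]


original = [
    [10, 12, 20, 22],
    [11, 13, 21, 23],
    [30, 32, 40, 42],
    [31, 33, 41, 43],
]
decoded_base = [[11, 21], [32, 41]]  # toy 2x2 base-layer reconstruction
residual = enhancement_residual(original, decoded_base, 2)
```

Adding `residual` back to the upsampled reconstruction restores `original`, which is why the enhancement layer only needs to carry the difference.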

The predictive coding process of the enhancement layer is very similar to that of the base layer. After the acquisition means 216 obtains the reference motion vector, the reference motion vector is supplied to the motion prediction means 254 of the enhancement layer acquisition means 214. The motion prediction means 254 then performs a further motion prediction on the same frame in the enhancement layer, based on the reference motion vector, thereby obtaining the motion vector of the corresponding frame of the video stream. Next, according to the prediction mode, the reference motion vector, and the motion vector, the motion compensator 155 shifts the reference frame accordingly to predict the current frame. Because this motion prediction process is similar to that of the base layer, it is not described in further detail here.

FIG. 3 is a flowchart of encoding using a reference motion vector according to one embodiment of the present invention. It shows the operation flow of the means 200.

First, a specific high-resolution video stream is received, for example a video stream with a resolution of 1920*1080i (step S305).

Next, a reference motion vector is obtained for each frame of the video stream (step S310). Suppose the current frame is a P frame. The best match for each macroblock of the current frame is searched for within a search window in the reference frame I; for example, the search uses a window of ±15 pixels, a value recommended for motion prediction. Once the best matching block is found, the displacement between the current block and the matching block is the reference motion vector. Because this reference motion vector is obtained by prediction against a reference frame taken from the original, error-free video stream, it reflects the actual motion in the video well.
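The search for the reference motion vector can be sketched as an exhaustive block-matching search. This is a minimal illustration under stated assumptions: frames are 2-D lists of luma values, the vector is expressed as the displacement from the current block's position to the matching block in the reference frame, and the function names are hypothetical. The toy frames are 8x8 with 2x2 blocks and a ±3 window instead of 16x16 macroblocks and ±15 pixels.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
    return total


def full_search(cur, ref, bx, by, size, radius):
    """Return (dx, dy, sad) of the best match within +/-radius pixels."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy reference frame with distinct texture, and a current frame whose
# content is the reference content displaced by (2, 1).
ref = [[3 * x * x + 5 * y * y + x * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
       for y in range(8)]

best = full_search(cur, ref, bx=2, by=2, size=2, radius=3)
```

With a ±15 window the two loops visit up to 31*31 = 961 candidates per macroblock, which is the figure used in the complexity comparison later in the description.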

The process of obtaining the reference motion vector is expressed by the following equations, where (B_x, B_y) is the motion vector.

In equation (1), arg denotes the motion vector of the current macroblock at which SAD is minimal.

In equation (2), SAD measures the similarity of two macroblocks as the sum of the absolute differences between corresponding pixels; m and n are the horizontal and vertical components of the matching block's displacement, respectively; and P_c(i, j) and R_p(i, j) are a pixel of the current frame and a pixel of the previous reference frame. The subscripts "c" and "p" stand for "current frame" and "previous frame", respectively.
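Equations (1) and (2) are not reproduced in this text. A form consistent with the definitions given for them is sketched below; it is a reconstruction under assumptions, not the published equations. The assumptions are a 16x16 macroblock, the search window S of ±15 pixels from step S310, and the sign convention that (m, n) displaces the block in the reference frame.

```latex
\begin{align}
(B_x, B_y) &= \arg\min_{(m,n) \in S} \mathrm{SAD}(m, n) \tag{1} \\
\mathrm{SAD}(m, n) &= \sum_{i=0}^{15} \sum_{j=0}^{15}
    \left| P_c(i, j) - R_p(i + m,\; j + n) \right|,
    \qquad S = \{ (m, n) : -15 \le m, n \le 15 \} \tag{2}
\end{align}
```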

The reference motion vector is used to re-predict motion in both the base layer and the enhancement layer of the video stream. Starting from this vector, each layer needs only a motion search over a small range, which reduces computational complexity and improves the efficiency of compression coding.

Next, the reference motion vector (B_x, B_y) is downsampled to obtain (B_x', B_y') (step S312).

The video stream is downsampled to reduce its resolution, for example to 720*480i (step S316).
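One plausible way to downsample a motion vector is to scale each component by the ratio of the two resolutions and round to whole pixels. The text does not state the scaling rule, so the sketch below is an assumption, and `scale_motion_vector` is a hypothetical name. It uses the 1920*1080 and 720*480 resolutions from the example.

```python
from fractions import Fraction


def scale_motion_vector(vector, src_size, dst_size):
    """Scale (x, y) by the per-axis resolution ratio and round to whole pixels."""
    sx = Fraction(dst_size[0], src_size[0])  # horizontal ratio, 3/8 for 1920 -> 720
    sy = Fraction(dst_size[1], src_size[1])  # vertical ratio, 4/9 for 1080 -> 480
    return (round(vector[0] * sx), round(vector[1] * sy))


HD = (1920, 1080)
SD = (720, 480)

down = scale_motion_vector((8, -9), HD, SD)     # scales exactly
coarse = scale_motion_vector((15, 15), HD, SD)  # 5.625 and 6.67 are rounded
```

Because of the rounding, the scaled vector is only an approximation of the motion at the lower resolution, which is one reason a small follow-up search around it is still useful.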

According to the downsampled reference motion vector (B_x', B_y'), the motion vector of the corresponding frame of the downsampled video stream is obtained (step 332). The corresponding frame referred to here is the same frame that was the current frame when the reference motion vector was obtained. Because the prediction is performed on the same frame, the motion vector (D_x1, D_y1) can be obtained from the reference motion vector (B_x', B_y') by a further search, within a small search window of the reference frame, for the macroblock that best matches the current block. Experience shows that a new search window of ±2 pixels suffices. The search process is made clearer by equations (3) and (4).
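The refinement search around the downsampled reference vector can be sketched as follows. This is an illustration under assumptions, not the encoder's implementation: frames are 2-D lists of luma values, the result is reported as an offset relative to the predicted vector, and `block_sad` and `refine` are hypothetical names. The toy data uses 2x2 blocks in 12x12 frames.

```python
def block_sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences for the block at (bx, by) displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(size) for i in range(size))


def refine(cur, ref, bx, by, size, predicted, radius=2):
    """Search +/-radius pixels around `predicted`.

    Returns (offset, sad, candidates_evaluated), where offset is relative
    to the predicted vector."""
    px, py = predicted
    best_offset, best_cost, evaluated = None, None, 0
    for n in range(-radius, radius + 1):
        for m in range(-radius, radius + 1):
            x, y = bx + px + m, by + py + n
            if x < 0 or y < 0 or x + size > len(ref[0]) or y + size > len(ref):
                continue  # candidate block would leave the reference frame
            evaluated += 1
            cost = block_sad(cur, ref, bx, by, px + m, py + n, size)
            if best_cost is None or cost < best_cost:
                best_offset, best_cost = (m, n), cost
    return best_offset, best_cost, evaluated


# True displacement is (3, 2); the scaled reference vector (2, 1) is slightly off.
ref = [[3 * x * x + 5 * y * y + x * y for x in range(12)] for y in range(12)]
cur = [[ref[y + 2][x + 3] if y + 2 < 12 and x + 3 < 12 else 0 for x in range(12)]
       for y in range(12)]

result = refine(cur, ref, bx=4, by=4, size=2, predicted=(2, 1))
```

Only 5*5 = 25 candidates are evaluated here, compared with up to 961 for a ±15 window.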

Equation (4) shows that the motion prediction searches starting from the reference motion vector (B_x', B_y'). Because most of the search was already done when the reference motion vector was computed, only a very limited search is needed in this step to find the best matching block. The amount of searching in a ±2-pixel window is far smaller than in a ±15-pixel window.
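Equations (3) and (4) are likewise not reproduced in this text. One form consistent with the description is given below as a hedged reconstruction. It assumes a 16x16 macroblock and a window S' of ±2 pixels, and it reads (D_x1, D_y1) as an offset relative to (B_x', B_y'), which fits the later statement that the reference frame is shifted according to both the reference motion vector and the motion vector. The published equations may differ.

```latex
\begin{align}
(D_{x1}, D_{y1}) &= \arg\min_{(m,n) \in S'} \mathrm{SAD}(m, n) \tag{3} \\
\mathrm{SAD}(m, n) &= \sum_{i=0}^{15} \sum_{j=0}^{15}
    \left| P_c(i, j) - R_p(i + B_x' + m,\; j + B_y' + n) \right|,
    \qquad S' = \{ (m, n) : -2 \le m, n \le 2 \} \tag{4}
\end{align}
```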

The downsampled video stream is processed using the motion vector to generate the base layer (step S326). The predicted frame for the current frame is obtained simply by shifting the reference frame according to the reference motion vector and the motion vector described above; known processing then suffices to generate the base layer.

The motion vector of the corresponding frame of the video stream is obtained according to the reference motion vector (B_x, B_y) (step S332). Here too, the corresponding frame is the same frame that was the current frame when the reference motion vector was obtained. Because the prediction is performed on the same frame, the motion vector (D_x2, D_y2) can be obtained from the reference motion vector (B_x, B_y) by a further search, within a relatively small search window of the reference frame, for the macroblock that best matches the current block. The method is similar to that used for the base layer, so a detailed description is omitted.

The video stream is then processed using the motion vector and the base layer, thereby generating the enhancement layer (step S336).

In this embodiment, therefore, the reference motion vector is used by both the base layer and the enhancement layer to predict motion, which reduces the computational complexity of the search in both layers and improves the efficiency of compression coding.

The computational complexity of the compression scheme of the present invention is analyzed and compared with that of the prior art of FIG. 1 as follows.

Assume that high-definition (HD) and standard-definition (SD) frames have resolutions of 1920×1088i and 720×480i, respectively, and that the search window is ±15 pixels. Let T_SAD denote the cost of computing the error measure SAD between two macroblocks for the Y component.

Considering only the Y component, the total numbers of macroblocks in an HD frame and an SD frame are 8160 and 1350, respectively. When motion prediction is performed for each macroblock with a ±15-pixel search window, obtaining the best motion vector for a macroblock costs at most 31*31*T_SAD = 961*T_SAD. The cost for an HD frame is therefore 8160*961*T_SAD = 7,841,760*T_SAD, and the cost for an SD frame (base layer) is 1350*961*T_SAD = 1,297,350*T_SAD.

For the encoding system of FIG. 1, the maximum total cost of computing motion vectors for each frame is the sum of the HD-frame and SD-frame costs, namely 9,139,110*T_SAD.

For the encoding system of FIG. 2, the cost of computing the reference motion vector is 7,841,760*T_SAD. When motion prediction for each macroblock is then performed with a relatively small search window (±2 pixels), obtaining the best motion vector costs at most 5*5*T_SAD = 25*T_SAD. The cost for an SD frame (base layer) is 1350*25*T_SAD = 33,750*T_SAD, and the cost for an HD frame (enhancement layer) is 8160*25*T_SAD = 204,000*T_SAD.
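The counts in this comparison can be tallied with a few lines of arithmetic. The sketch below assumes 16x16 macroblocks, which is what the stated counts of 8160 and 1350 imply, and expresses every cost in units of T_SAD; the function and variable names are hypothetical.

```python
MACROBLOCK = 16  # assumed macroblock edge, in pixels


def macroblocks(width, height):
    """Number of whole macroblocks in a frame (Y component only)."""
    return (width // MACROBLOCK) * (height // MACROBLOCK)


def candidates(radius):
    """Number of candidate positions in a +/-radius search window."""
    return (2 * radius + 1) ** 2


hd = macroblocks(1920, 1088)
sd = macroblocks(720, 480)

# FIG. 1: independent full searches (+/-15) in both layers.
fig1_total = (hd + sd) * candidates(15)

# FIG. 2: one full search for the reference vector on the HD frame,
# then +/-2 refinements for the SD (base) and HD (enhancement) layers.
reference_cost = hd * candidates(15)
fig2_total = reference_cost + sd * candidates(2) + hd * candidates(2)

reduction = (fig1_total - fig2_total) / fig1_total
```

Summing all three terms for FIG. 2 gives 8,079,510 units of T_SAD against 9,139,110 for FIG. 1, a reduction of roughly 11.6 percent under these assumptions.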

For the encoding system of FIG. 2, the maximum total cost of computing motion vectors for each frame is the sum of the cost of computing the reference motion vector, the cost of searching the SD frame with the small window, and the cost of searching the HD frame with the small window, namely (7,841,760 + 33,750 + 204,000)*T_SAD = 8,079,510*T_SAD.

Compared with the encoding system of FIG. 1, the encoding system of FIG. 2 reduces the amount of computation by the following percentage:
R = |8,079,510 - 9,139,110| / 9,139,110 ≈ 11.6%

FIG. 4 is a conceptual diagram of an encoding system that uses a reference motion vector according to another embodiment of the present invention. The encoding system 400 of this embodiment is similar to the system shown in FIG. 2, so the description here focuses on the differences and omits the common parts. The difference is that the acquisition means 410 comprises a downsampler 120 and a reference motion vector acquisition means 416. The original video stream is first downsampled by the downsampler 120. The downsampled video stream is then supplied to the reference motion vector acquisition means 416, that is, to the motion prediction means 476 and the frame memory 282, which obtain the reference motion vector of each frame of the video stream. The reference motion vector is supplied directly to the motion prediction means 422 of the base layer acquisition means 412. Based on the reference motion vector, the means 422 predicts motion again within a relatively small search window to obtain the motion vector of the corresponding frame of the downsampled video stream. The base layer generation means 413 then processes the downsampled video stream using that motion vector, thereby generating the base layer.

Further, in the enhancement layer acquisition means 414, the reference motion vector described above is first upsampled by the upsampler 486. The motion vector acquisition means, that is, the motion vector prediction means 454, then predicts motion again based on the upsampled reference motion vector to obtain the motion vector of the corresponding frame of the video stream. The enhancement layer generation means 415 then processes the video stream using the reference motion vector and the base layer, thereby generating the enhancement layer.

As the description above shows, motion prediction in the base layer and in the enhancement layer are linked: the search that both layers would otherwise repeat when predicting the same frame is performed only once, and each layer then predicts again within a relatively small search window based on the same reference motion vector. Because much of the search is saved, the computational load of the overall encoding system is reduced.

FIG. 5 is a conceptual diagram of an encoding system that uses a reference motion vector according to a further embodiment of the present invention. The encoding system 500 of this embodiment is similar to the system shown in FIG. 2, so the description here focuses on the differences and omits the common parts. The difference is that the motion prediction means 522 of the base layer acquisition means 512 outputs the motion vector of each frame of the base layer, and this motion vector is upsampled by the reference motion vector acquisition means, that is, the upsampler 586, for use as the reference motion vector of the corresponding frame. The reference motion vector is supplied to the motion prediction means 554 of the enhancement layer acquisition means 514. Based on the reference motion vector, motion prediction is performed again within a relatively small search window, thereby obtaining the motion vector of the corresponding frame of the video stream. Then, using the reference motion vector, the motion vector, and the output of the base layer, the enhancement layer generation means 515 generates the enhancement layer in a manner similar to the embodiment shown in FIG. 2.

As described above, in this embodiment the enhancement layer performs its search again only over a relatively small range, based on the motion vector obtained in the base layer. The enhancement layer thus skips the part of the search that duplicates the base layer's, which reduces the total amount of computation in the encoding system.

Although the present invention has been described in conjunction with specific embodiments, many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the foregoing description. Accordingly, the invention is intended to embrace all such alternatives, modifications, and variations that fall within the spirit and scope of the appended claims.

FIG. 1 is a block diagram of a prior-art spatial layered compression video encoder.
FIG. 2 is a conceptual diagram of an encoding system that uses a reference motion vector according to an embodiment of the present invention.
FIG. 3 is a flowchart of encoding using a reference motion vector according to one embodiment of the present invention.
FIG. 4 is a conceptual diagram of an encoding system that uses a reference motion vector according to another embodiment of the present invention.
FIG. 5 is a conceptual diagram of an encoding system that uses a reference motion vector according to a further embodiment of the present invention.

Claims

A spatial layered compression method for a video stream, comprising:
a) processing the video stream to obtain a reference motion vector for each image frame of the video stream;
b) processing the video stream using the reference motion vector to generate a base layer; and
c) processing the video stream using the reference motion vector and the base layer to generate an enhancement layer.

The method of claim 1, wherein step a) comprises:
down-sampling the video stream; and
obtaining the reference motion vector for each image frame of the downsampled video stream.

The method of claim 2, wherein step b) comprises:
obtaining the motion vector of the corresponding image frame of the downsampled video stream according to the reference motion vector; and
processing the downsampled video stream using the motion vector to generate the base layer.

The method of claim 2, wherein step c) comprises:
up-sampling the reference motion vector;
obtaining the motion vector of the corresponding image frame of the video stream according to the upsampled reference motion vector; and
processing the video stream using the motion vector and the base layer to generate the enhancement layer.

The method of claim 1, wherein step b) comprises:
down-sampling the reference motion vector;
down-sampling the video stream;
obtaining the motion vector of the corresponding image frame of the downsampled video stream according to the downsampled reference motion vector; and
processing the downsampled video stream using the motion vector to generate the base layer.

The method of claim 5, wherein step c) comprises:
obtaining a motion vector of the corresponding image frame of the video stream according to the reference motion vector; and
processing the video stream using the motion vector and the base layer to generate the enhancement layer.

A spatial layered compression method for a video stream, comprising:
a) processing the video stream to generate a base layer;
b) up-sampling the motion vector of each image frame of the base layer to obtain a reference motion vector of the corresponding image frame; and
c) processing the video stream using the reference motion vector and the base layer to generate an enhancement layer.

The method of claim 7, wherein step c) comprises:
obtaining the motion vector of the corresponding image frame of the video stream according to the reference motion vector; and
processing the video stream using the motion vector and the base layer to generate the enhancement layer.

A spatial layered compression apparatus for a video stream, comprising:
acquisition means for processing the video stream to obtain a reference motion vector for each image frame of the video stream;
base layer acquisition means for processing the video stream using the reference motion vector to generate a base layer; and
enhancement layer acquisition means for processing the video stream using the reference motion vector and the base layer to generate an enhancement layer.

The apparatus of claim 9, wherein the acquisition means comprises:
a downsampler for down-sampling the video stream; and
reference motion vector acquisition means for obtaining the reference motion vector of each image frame of the downsampled video stream.

The apparatus of claim 10, wherein the base layer acquisition means comprises:
motion vector acquisition means for obtaining the motion vector of the corresponding image frame of the downsampled video stream based on the reference motion vector; and
base layer generating means for processing the downsampled video stream using the motion vector to generate the base layer.

The apparatus of claim 10, wherein the enhancement layer acquisition means comprises:
an upsampler for up-sampling the reference motion vector;
motion vector acquisition means for obtaining the motion vector of the corresponding image frame of the video stream according to the upsampled reference motion vector; and
enhancement layer generating means for processing the video stream using the motion vector and the base layer to generate the enhancement layer.

The apparatus of claim 9, wherein the base layer acquisition means comprises:
a downsampler for down-sampling the reference motion vector and the video stream;
motion vector acquisition means for obtaining the motion vector of the corresponding image frame of the downsampled video stream based on the downsampled reference motion vector; and
base layer generating means for processing the downsampled video stream using the motion vector to generate the base layer.

The apparatus of claim 9, wherein the enhancement layer acquisition means comprises:
motion vector acquisition means for obtaining the motion vector of the corresponding image frame of the video stream according to the reference motion vector; and
enhancement layer generating means for processing the video stream using the motion vector and the base layer to generate the enhancement layer.

A spatial layered compression apparatus for a video stream, comprising:
base layer acquisition means for processing the video stream to generate a base layer;
reference motion vector acquisition means for up-sampling the motion vector of each image frame of the base layer to obtain a reference motion vector of the corresponding image frame; and
enhancement layer acquisition means for processing the video stream using the reference motion vector and the base layer to generate an enhancement layer.

The apparatus of claim 15, wherein the enhancement layer acquisition means comprises:
motion vector acquisition means for obtaining the motion vector of the corresponding image frame of the video stream according to the reference motion vector; and
enhancement layer generating means for processing the video stream using the motion vector and the base layer to generate the enhancement layer.